OPNsense skipping gateway if it shares VLAN on another interface of source

Started by Sebithus, January 28, 2025, 12:54:35 AM

January 28, 2025, 12:54:35 AM Last Edit: January 28, 2025, 08:13:05 PM by Sebithus Reason: Clarity in text and images added
Here, OPNsense is set up for experimentation purposes. It is currently not being used to manage the network.


Say I have VLAN 1 and 2 on OPNsense with IPs 1.1/24 and 2.1/24, respectively.

If a PC on VLAN 1 tries to reach 2.1, it does so via its (my PC's) gateway. OPNsense answers from its 2.1 address, yet it sends the answer on VLAN 1, skipping its (OPNsense's) gateway altogether. I can tell from Wireshark on my PC: the reply's source IP is 2.1, but the source MAC is OPNsense's rather than my current router's, so the reply arrived directly on VLAN 1 instead of through the gateway router.

Ping from my PC works fine doing this, but my PC's browser doesn't seem to like it when I try to access the WebGUI.



My questions are:

Is there a way to control for this? I want OPNsense to answer via the VLAN that it received the connection from.
How is this so?
    Shouldn't it be answering on VLAN 2 since the IP address it's answering from is assigned to it?
    Is this a feature of FreeBSD or would this happen on Linux too?


Edited for clarity and images -- 2025 01 28 13:10 CST

OPNsense Interface Overview


PC is on eth1, PVE/OPNsense is on eth4


PVE Network Setup and OPNsense VM


Wireshark Proofs


VLAN is layer 2, subnet is layer 3. Did you assign different subnets to each VLAN? If so, then OPNsense will only have one route to each host.

OP, if you capture some traffic between your PC and the internet (capture it on the LAN interface), you'll see that the ethernet frames always have the MAC address of your gateway (on the same layer 3 segment), not that of the host somewhere on the internet. Time to re-read about the OSI model? :)
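The rule described above, as seen from a host with a single interface, can be sketched in a few lines of Python. This is only an illustrative model: the PC address `172.27.201.50` and the neighbour `172.27.201.60` are made-up examples on the VLAN 10 subnet, and `frame_destination` is a hypothetical helper, not part of any real tool.

```python
import ipaddress

def frame_destination(host_if: str, gateway: str, dst: str) -> str:
    """Return the IP whose MAC a single-homed host puts in the Ethernet frame."""
    iface = ipaddress.ip_interface(host_if)
    if ipaddress.ip_address(dst) in iface.network:
        return dst      # on-link: ARP for the destination itself
    return gateway      # off-link: ARP for the gateway instead

# Hypothetical PC address on the VLAN 10 subnet (assumption, not from the thread).
pc = "172.27.201.50/24"
print(frame_destination(pc, "172.27.201.1", "10.10.0.3"))      # 172.27.201.1
print(frame_destination(pc, "172.27.201.1", "172.27.201.60"))  # 172.27.201.60
```

The IP header still carries the real destination address in both cases; only the layer 2 destination changes.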

Quote from: bartjsmit on January 28, 2025, 08:04:29 AM
VLAN is layer 2, subnet is layer 3. Did you assign different subnets to each VLAN? If so, then OPNsense will only have one route to each host.

Yes. Please take a look.


Quote from: dseven on January 28, 2025, 09:41:26 AM
OP, if you capture some traffic between your PC and the internet (capture it on the LAN interface), you'll see that the ethernet frames always have the MAC address of your gateway (on the same layer 3 segment), not that of the host somewhere on the internet. Time to re-read about the OSI model? :)

I should have clarified that my PC was using its gateway; I did say that OPNsense wasn't. You'll notice that I said that they're on different VLANs and networks.

Here are the Wireshark proofs.




I don't know if it's just me, but this setup is making my head spin.

So you seem to have a PVE host with a single NIC, connected to eth4 of some switch.
PVE Management IP is 10.10.0.2 in VLAN 100 with GW 10.10.0.1

There does not seem to be a way out of the switch for that VLAN.
The OPNsense VM is getting the entire vmbr0 bridge to the single NIC.
There's a vlan100(opt1) interface for that VLAN 100, yet its static IP is 10.10.0.3 (GW is 10.10.0.1).

Where's 10.10.0.1??
With the default route on that interface (Did you get rid of WAN?)! Acting as LAN?

There's a VLAN 10 (you could make your/our lives easier by following some convention for VLAN to subnet mapping) that maybe makes more sense.
eth1 untagged to a management PC.
eth2 untagged to a GW for that subnet? Ubiquiti device? 172.27.201.1, right?
eth4 tagged connected to PVE/OPN
There's an Admin(opt2) interface for that VLAN in OPN. Acting as WAN?
The device name is vlan01, not vtnet0_vlan10 to be consistent with the others? Is it set up the same way?

And then you're trying to open the GUI from a machine on VLAN 10 by connecting to the IP of VLAN 100??
I give up...

Quote from: EricPerl on January 28, 2025, 10:49:19 PM
I don't know if it's just me, but this setup is making my head spin.


I'm using a Ubiquiti ER-X as my main router, and it's also a managed switch.

OPNsense is for testing right now. It doesn't have direct access to the Internet/WAN.

It happens that VLAN10 was used for management for a WAP and the ERX.

There was a Dell PowerEdge with iDRAC hooked up to eth4 on the ER-X, which used VLAN 100 (PowerEdge) and 101 (iDRAC). I left those VLANs on eth4 after disconnecting that server from the port and plugging in another PC that is now hosting PVE and an OPNsense VM.

Initially, OPNsense used VLAN 100 untagged, as I had the ERX set its eth4 to VLAN100 untagged. For labbing and convention, I added VLAN10, the admin VLAN, tagged on eth4 and gave OPNsense an interface on it. The fun began when I lost access to the VLAN100 interface of OPNsense the moment I saved the setting telling the new interface to pick up an IP via DHCP (after having already created and saved the interface on VLAN10).

The naming you're seeing on OPNsense is a mess that I left after rapidly testing different options such as deleting and adding back the VLAN10 interface.

The ERX includes VLANs 10-50, which use 172.27.xxx.0/24 with the third octet going from 201 to 205. The gateway for each is run by the ERX on the first host of each subnet (i.e. .1). You'll notice that the Ubiquiti part of the MAC in the Wireshark screenshots is what my PC used as its gateway, as shown by the destination of its frames.

As an aside, the unconventional network numbering is deliberate for use in routing over Wireguard if I'm using someone else's network or for other possible integrations to proceed without IP number conflicts. When I first set up my network, I started with 192.168.1.0/24, but when I added Wireguard to the ERX, that would've caused routing issues if I was visiting a cafe or a friend and I tried to access my network remotely from those places.

Story time: I started with VLAN100 & 101 because they were already on that port from another server. Since I generally use VLAN10 for admin access, I decided to give OPNsense an interface there. Well, after I added an interface on VLAN10, and immediately after I told that interface to pick up an IP via DHCP, I lost access to the 10.10.0.3 WebGUI that I was using to set it up. At first, I had lost access (nearly) entirely because I hadn't set up the firewall rules for that new VLAN10 interface, thinking that I'd still have the open WebGUI on 10.10.0.3 that I was already using.

I reverted via the PVE console and added the pass rules for the OPNsense interface on VLAN10, yet I still couldn't reach 10.10.0.3 from the management PC (which is itself on VLAN10). Weird. I did find, though, that the WebGUI worked on OPNsense's VLAN10 IP. I turned off the firewall on OPNsense entirely, and I still couldn't get in to the OPNsense interface on VLAN100.

I used a PC that's on VLAN20, and all of OPNsense's IPs worked from there, but the VLAN10 PC couldn't access any interface of OPNsense except for the one on the VLAN they shared, for as long as they shared it. I found that out by deleting and re-adding the VLAN10 interface on OPNsense. I put VLAN101 up on OPNsense, but, like VLAN100, it only worked when I deleted OPNsense's VLAN10 interface, yet for the PC on VLAN20 they all still worked. Odd, right?

Then I pulled up Wireshark on my PC on VLAN10 and saw that, once OPNsense has an IP on VLAN10, it skips its gateway and answers my PC's MAC address directly on VLAN10, using its VLAN100 IP as the source.

Via PVE, I tried giving OPNsense VLAN10 as a separate interface (vnet1), but no. The moment OPNsense got an IP on VLAN10, my PC on that VLAN would start getting responses from the OPNsense IPs it had messaged on other VLANs directly on the VLAN10 that it shares with OPNsense, instead of via those other VLANs.

Now, VLAN100 is a different VLAN with a different network, and both machines know it is a different network from the network portion of their IPs as described by their subnet masks. Still, OPNsense uses an IP assigned to VLAN100 to answer a message, yet it does so directly on VLAN10 without using its gateway on VLAN100 because... who knows? It even does so if I delete the VLAN10 IP on OPNsense but keep the interface. If I delete the VLAN10 interface, I can resume contact with OPNsense's other IPs from the management PC on VLAN10, and OPNsense goes back to using the respective gateways that the ERX is hosting.



More directly to your questions:
The ER-X has an interface on VLAN100 that is a gateway that is IP 10.10.0.1

OPNsense is another host on the network.
It isn't directly connected to the Internet; the ER-X has WAN on eth0.

As an aside: eth0 isn't on the ER-X's DSA. The ER-X has an ASIC for switching and NAT (hwoffload), and I want its CPU to process WAN<>LAN traffic differently. It still enforces firewall rules between VLANs, but I feel safer keeping eth0 off the DSA so that traffic crossing that boundary is forced through the CPU, which then picks it up for DPI and logging. It doesn't do that so well if I put eth0 on the DSA and let hwoffload handle it. I do rely on hwoffload a good bit within the LAN, because the ERX is not powerful enough to reach 1 Gbps within the LAN without it. I'm testing OPNsense for study/fun, but also in prep to transition to router-on-a-stick with it.

A WAP connects to eth2. eth3 is a VOIP adapter--that's why it's just 100Mb.

172.27.201.1 is an ER-X gateway for VLAN10.
172.27.20[1-5].1 are also ER-X interfaces for other VLANs 10-50; also 10.10.0.1 and 10.10.1.1 for VLAN 100/101.
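For readers trying to keep the numbering straight, the VLAN-to-subnet plan above can be written down as a small lookup table. This is a sketch only: the entries for VLAN 10, 100 and 101 come from this thread, while the exact mapping of VLANs 20-50 onto 172.27.202-205 is inferred from the pattern and is an assumption. The helper names are hypothetical.

```python
import ipaddress

# VLAN ID -> subnet. 10, 100 and 101 are stated in the thread; the order of
# 20-50 is inferred from the "172.27.20[1-5]" pattern (assumption).
VLAN_SUBNETS = {
    10: "172.27.201.0/24",
    20: "172.27.202.0/24",
    30: "172.27.203.0/24",
    40: "172.27.204.0/24",
    50: "172.27.205.0/24",
    100: "10.10.0.0/24",
    101: "10.10.1.0/24",
}

def vlan_for(ip: str):
    """Return the VLAN whose subnet contains ip, or None if there is none."""
    addr = ipaddress.ip_address(ip)
    for vlan, net in VLAN_SUBNETS.items():
        if addr in ipaddress.ip_network(net):
            return vlan
    return None

def gateway_for(vlan: int) -> str:
    """The ER-X gateway sits on the first host address of each subnet."""
    return str(ipaddress.ip_network(VLAN_SUBNETS[vlan]).network_address + 1)

print(vlan_for("10.10.0.3"))   # 100
print(gateway_for(10))         # 172.27.201.1
```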

The OPNsense naming convention got messed up when I deleted and re-added the VLAN10 interface. I left the auto-naming alone, yet it wouldn't automatically name it vnet0_vlan10 as it had before I tried adding and deleting the vnet1 interface that I had set PVE to carry only VLAN10 on. Good catch. I noticed that quirk too after I had remade the interface in OPNsense, but the naming appeared irrelevant to me, especially after I got desperate and focused on rapid testing to see how OPNsense's other interfaces would or would not work for the PC on VLAN10.




Yes, despite the unconventional and experimental setup, I want to know why this happens. OPNsense and my PC have interfaces with IPs on the same network and VLAN. When my PC tries to connect to OPNsense's WebGUI on a different interface, network, and VLAN, I see that OPNsense answers with the IP of that other network/vlan yet with frames that directly use my PC as its destination, and it does so directly on that shared VLAN, even though my PC's traffic to that IP was routed by a gateway connecting those VLANs.

Now, I'm thinking that maybe it's a quirk of the PVE bridge, but OPNsense ought to know from its IP and subnet mask that the destination IP (my PC) is on a different network, and encapsulate those packets in frames destined for its gateway (10.10.0.1). It's not doing that. Even if they were all in the same VLAN/broadcast domain anyway, it should STILL be sending frames to its gateway. Yet here it's skipping the gateway AND the VLAN at the same time, messaging the PC directly at both layer 2 (the PC's MAC instead of the ERX's MAC for 10.10.0.1) and 802.1Q (VLAN10 instead of VLAN100).

Interestingly, I learned that ping works! Ping, the executable, and I suppose the whole OS stack that leads up to it, doesn't care! It takes an echo reply directly from PVE's MAC despite my PC having used the ERX gateway as its frame destination. Firefox, however, sure does care! Very interesting, I think.


Anyway, that's why I simplified my initial post. Please, let me know if you have any more questions.


Quote from: Sebithus on January 29, 2025, 06:04:24 AM
that OPNsense answers with the IP of that other network/vlan yet with frames that directly use my PC as its destination

Because that is how IP routing works. If the destination address (your PC) is on a directly connected network, the packets will be sent out that interface. Every system does this. Routing is based on destination address only, the source address, in that case the one on OPNsense in another network, is irrelevant for routing decisions.

Unless one employs policy routing, but that is not the default.
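The destination-only lookup described above can be sketched as a longest-prefix match over a routing table. This is a simplified model, not OPNsense or FreeBSD code: `10.10.0.3` and the `10.10.0.1` gateway are from the thread, the interface names are placeholders, and the PC address `172.27.201.50` is a hypothetical example on the VLAN 10 subnet.

```python
import ipaddress

# (destination network, outgoing interface, next hop or None if on-link)
# Interface names are placeholders for this sketch.
ROUTES_WITH_VLAN10 = [
    ("10.10.0.0/24", "vlan100", None),
    ("172.27.201.0/24", "vlan10", None),   # appears once OPNsense has a VLAN 10 IP
    ("0.0.0.0/0", "vlan100", "10.10.0.1"),
]
ROUTES_WITHOUT_VLAN10 = [r for r in ROUTES_WITH_VLAN10 if r[1] != "vlan10"]

def lookup(routes, dst: str, src: str = ""):
    """Longest-prefix match on dst. src is accepted but deliberately ignored."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(net), ifname, hop)
        for net, ifname, hop in routes
        if addr in ipaddress.ip_network(net)
    ]
    if not matches:
        return None
    _, ifname, hop = max(matches, key=lambda m: m[0].prefixlen)
    return ifname, hop or dst   # on-link routes deliver straight to dst

pc = "172.27.201.50"  # hypothetical PC address on VLAN 10
print(lookup(ROUTES_WITH_VLAN10, pc, src="10.10.0.3"))     # ('vlan10', '172.27.201.50')
print(lookup(ROUTES_WITHOUT_VLAN10, pc, src="10.10.0.3"))  # ('vlan100', '10.10.0.1')
```

With the connected VLAN 10 route present, the reply leaves on that interface directly to the PC even though its source address is 10.10.0.3. Without it, the reply falls back to the default route via 10.10.0.1.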

What's the point of having multiple connections if you do not use them? Use the OPNsense address in that directly connected network to connect to the UI, perhaps?

So the bridge device shown in the OP is the LAN part of an ERX (which has WAN on eth0?).
And the ERX has an interface for all the VLANs mentioned above and is the GW for all of them.

What kind of routing is expected to happen in OPN when all the VLANs are handled in the ERX?
What are you testing?

Quote from: Patrick M. Hausen on January 29, 2025, 09:25:35 AM
Quote from: Sebithus on January 29, 2025, 06:04:24 AM
that OPNsense answers with the IP of that other network/vlan yet with frames that directly use my PC as its destination

Because that is how IP routing works. If the destination address (your PC) is on a directly connected network, the packets will be sent out that interface. Every system does this. Routing is based on destination address only, the source address, in that case the one on OPNsense in another network, is irrelevant for routing decisions.

Unless one employs policy routing, but that is not the default.

What's the point of having multiple connections if you do not use them? Use the OPNsense address in that directly connected network to connect to the UI, perhaps?

Oh, I thought that an interface checks the destination IP against its own IP/subnet mask to evaluate whether it's on the same network, and then decides whether to forward the frame to its gateway or directly to the host with the destination IP; is that not the case? I feel as though you were clear, but I'd like the confirmation.

It seems as though the source host checks whether any of its IP/masks is on the same network as the destination host, and if one is, it sends out the interface of that directly connected network, but with the source IP of the other interface. Basically, I'm learning that all interfaces share their IPs if the source host recognizes that it is directly connected at layer 2 to the destination host. In this case, it still uses the IP of another interface because that's the one that received the packets and is thus the one it responds from.

I'll look into policy routing, and see if I can use it to get access to all of the interfaces via Firefox. I had done some experimentation with this, but I'll press more into it. Thank you for the direction.
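As a rough picture of what policy routing changes, here is a conceptual Python sketch in which source-based rules are consulted before the main table. It is only a model of the idea, not OPNsense configuration or pf syntax. The table contents, interface names, and the addresses `172.27.201.50` (PC) and `172.27.201.60` (OPNsense on VLAN 10) are assumptions for illustration.

```python
import ipaddress

def lookup(table, dst: str):
    """Plain longest-prefix match on the destination address."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(net), ifname, hop)
        for net, ifname, hop in table
        if addr in ipaddress.ip_network(net)
    ]
    if not matches:
        return None
    _, ifname, hop = max(matches, key=lambda m: m[0].prefixlen)
    return ifname, hop or dst

MAIN_TABLE = [
    ("10.10.0.0/24", "vlan100", None),
    ("172.27.201.0/24", "vlan10", None),
    ("0.0.0.0/0", "vlan100", "10.10.0.1"),
]

# Table used only for traffic sourced from the VLAN 100 address (hypothetical).
VLAN100_TABLE = [
    ("10.10.0.0/24", "vlan100", None),
    ("0.0.0.0/0", "vlan100", "10.10.0.1"),
]

# Policy rules: (source network, table to use). Checked in order.
RULES = [("10.10.0.0/24", VLAN100_TABLE)]

def policy_lookup(src: str, dst: str):
    """Pick a table by source address first, then route by destination."""
    for src_net, table in RULES:
        if ipaddress.ip_address(src) in ipaddress.ip_network(src_net):
            result = lookup(table, dst)
            if result is not None:
                return result
    return lookup(MAIN_TABLE, dst)

print(policy_lookup("10.10.0.3", "172.27.201.50"))      # ('vlan100', '10.10.0.1')
print(policy_lookup("172.27.201.60", "172.27.201.50"))  # ('vlan10', '172.27.201.50')
```

In this model a reply sourced from 10.10.0.3 goes back through the gateway on VLAN 100, while traffic sourced from the VLAN 10 address still uses the directly connected network.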

I understood that I could just go set things up without this complication by using the IP of the directly connected interface, but when I lost access to the other interface IPs of OPNsense, I was confused and took it upon myself to make this a learning opportunity.

Besides learning, the point is that, from my limited understanding, I should be able to access any of the interface IPs so long as there's no firewall rule against it (and the routing is set up). I think that I should not be losing access to all of the other IPs just because one of them is directly connected on the same layer 2 segment.

By the way, I've seen your avatar in plenty of posts while searching before I wrote this OP, so I recognize your expertise. I hope you recognize that I especially appreciate your efforts to teach (or point towards) the finer points of these fundamentals, despite the limited practicality of this particular application of these tools.

Quote from: EricPerl on January 29, 2025, 07:41:23 PM
So the bridge device shown in the OP is the LAN part of an ERX (which has WAN on eth0?).
And the ERX has an interface for all the VLANs mentioned above and is the GW for all of them.

What kind of routing is expected to happen in OPN when all the VLANs are handled in the ERX?
What are you testing?



Yes, and eth0 isn't shown because it's not included in the DSA of the ERX.

At the moment, I'm not expecting OPN to do any routing, and I'm just doing this for learning.

I'm testing why it is and how to fix the loss of the WebGUI on all of OPN's other interfaces when it picks up an IP that is directly connected at layer 2 to my PC.

Story for context: I was preparing to use OPN as the primary router (under PVE) by first messing about on it to learn how to implement configurations. The plan was to learn how to use it, get it ready with all of the rules and services that the ERX currently manages, make the complementary changes to turn the ERX into just a managed switch, commit the changes to OPN first and then the ERX, and see if I could make the transition smoothly. I didn't have any extra ports on the ERX, so I took one that was used by a PowerEdge server that I'm not using right now. That PowerEdge was using VLAN100 untagged and VLAN101 tagged for iDRAC; I left those alone when I first installed OPN. After OPN was installed, I assigned it the static IP 10.10.0.3 and used that IP to access the WebGUI. I lost access to that WebGUI/IP when I set up OPN with an IP on VLAN10, which I normally use for admin connections, and that's when I set out to understand the why and how.

Now, I'm wondering why OPN answers from its 10.10.0.3 IP, the one being contacted and assigned to its VLAN100 interface, on VLAN10 instead of on VLAN100. And if that's normal, why is it that Firefox on Windows 10 doesn't accept it, but ping on Windows 10 does? I'm also wondering how to fix it so that I can use all of OPN's interface IPs, even if one of them is directly connected to the host that is trying to connect to one of the other IPs.

I migrated my main router from ER-605v1 (TP-link) to OPN virtualized a few months back.
I installed PVE+OPN on my LAN and migrated my VLANs over 1 by 1 until I made the switch.
I've kept some notes and intended to do a post about it but never did. I could still do it if there's interest.

You might want a more straightforward setup to learn.
Testing a 1 NIC setup was on my to-do list so I did that earlier today.
Also on my to-do was a network diagram so I just did that.

PVE1 + OPN1 is a reasonably straightforward install.
* PVE vmbr0 for management (on INFRA VLAN via access port on the SWA switch).
* PVE vmbr1 used for WAN (DHCP)
* PVE vmbr2 used as parents for all internal VLANs (includes WAN2, more on that later)

All switches and APs are TP-link Omada devices under a HW controller. TPL VLAN used for management of these.

PVE2 is my spare and I did this earlier today (I already had an OPN setup with WAN + LAN on separate ports, WAN plugged in an access port for the WAN2 VLAN):
* Unplugged WAN & LAN
* Added a VLAN vmbr0.INFRA and moved the static IP from bridge to VLAN.
* Reconfigured the access port on SWA as a trunk => got back into PVE2
* Reset OPN2 to default settings using the console (ends in shutdown)
* Removed vmbr1/vmbr2 (were used as WAN/LAN), added vmbr0 to OPN2 VM
* Restarted the VM and pressed a key for interface assignment
* No LAGG
* Created vtnet0_vlanWAN2 (WAN2 VLAN interface in OPN1 so the WAN side of OPN2 is isolated in my main LAN)
* Created vtnet0_vlanLAN2 (this VLAN is only known to my switches - just VLAN not interface, OPN1 is unaware)
* Assigned WAN to vtnet0_vlanWAN2, LAN to vtnet0_vlanLAN2
* WAN picked up an IP via DHCP (from OPN1)
* LAN got the default static IP
* Configured a port on SWO as an access port for LAN2.
* Connected a test machine that got IP from OPN2.LAN (full Internet connectivity via one hop on OPN1, OPN2 accessible at 192.168.1.1).

Done.
1 NIC virtualized OPN with its own WAN (double NAT but otherwise OK to experiment) and LAN (more VLANs can be added).