[Solved] Sanity check on linux bridge setup for upstream inter-VLAN routing

Started by OPNenthu, October 09, 2025, 09:09:02 AM

I migrated my desktop PC to Linux, along with some VirtualBox VMs.  My goal is to expose two VLANs from OPNsense to the Linux host, on two separate Linux bridges, so that the host and guest VMs can attach to either VLAN as needed while maintaining traffic isolation between them (no routing on the host).  I want OPNsense to handle the inter-VLAN routing and enforce its policies.

The host has a 2.5GbE Intel i226-V NIC connected to a trunk switch port configured as native=30 (CLEAR) and tagged=20 (VPN).  The host itself uses 'br0', which carries the untagged traffic.  Guest VMs can attach to either 'br0' (for clear internet) or 'br20' (for the VPN gateway).  OPNsense policies allow clients on VLAN 20 to reach local services on VLAN 30.

After some experimentation and failures, the best working setup I came up with is this:

$ ip a
...
3: enp10s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br0 state UP group default qlen 1000
    link/ether 24:xx:xx:xx:xx:cd brd ff:ff:ff:ff:ff:ff
4: enp10s0.20@enp10s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br20 state UP group default qlen 1000
    link/ether 24:xx:xx:xx:xx:cd brd ff:ff:ff:ff:ff:ff
5: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 24:xx:xx:xx:xx:cd brd ff:ff:ff:ff:ff:ff
    inet 172.21.30.100/24 brd 172.21.30.255 scope global dynamic noprefixroute br0
       valid_lft 86118sec preferred_lft 86118sec
    inet6 2601:xx:xxxx:6db3:e7f7:39a6:1d2d:bed4/64 scope global temporary dynamic
       valid_lft 86371sec preferred_lft 85760sec
    inet6 2601:xx:xxxx:6db3:xxxx:xxxx:xxxx:9dca/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86371sec preferred_lft 86371sec
    inet6 fe80::xxxx:xxxx:xxxx:fb89/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
6: br20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a2:xx:xx:xx:xx:5a brd ff:ff:ff:ff:ff:ff
7: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:xx:xx:xx:xx:76 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever

(virbr0 is created by VirtualBox for its NAT networking; I don't manage it.)

Using NetworkManager / nmcli, I created br0 with the NIC (enp10s0) as a slave port.  br0 also holds the host's own IP addresses for access to VLAN 30.

I then created a VLAN sub-interface (enp10s0.20) to handle tagging for VLAN 20 and made it a slave port on br20.  I left br20 unaddressed because the host doesn't use it, and any guest VMs attached to it can configure themselves via DHCP / SLAAC.  This bridge should make the tagging transparent to the VMs, so they can just pass untagged frames internally.  The nmcli side looks roughly like the sketch below.
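
This is a from-memory sketch, so treat the connection names (br0-port, br20-port) as illustrative and double-check the property spellings against your NetworkManager version (older releases want ipv6.method ignore instead of disabled):

$ # Host bridge for the untagged (VLAN 30) side; addressed via DHCP/SLAAC by default
$ nmcli connection add type bridge ifname br0 con-name br0
$ nmcli connection add type bridge-slave ifname enp10s0 master br0 con-name br0-port
$ # Address-less bridge for the VPN side, fed by a VLAN 20 sub-interface
$ nmcli connection add type bridge ifname br20 con-name br20 ipv4.method disabled ipv6.method disabled
$ nmcli connection add type vlan ifname enp10s0.20 dev enp10s0 id 20 master br20 slave-type bridge con-name br20-port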

I also disabled IP forwarding globally via sysctl config:

$ cat /etc/sysctl.d/999-disable-ip-forwarding.conf
net.ipv4.ip_forward = 0
net.ipv6.conf.all.forwarding = 0
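
To apply the file without a reboot and confirm the values took effect, something like this works:

$ sudo sysctl --system
$ sysctl net.ipv4.ip_forward net.ipv6.conf.all.forwarding
net.ipv4.ip_forward = 0
net.ipv6.conf.all.forwarding = 0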

... and confirmed that no host route exists for VLAN 20:

$ ip r
default via 172.21.30.1 dev br0 proto dhcp src 172.21.30.100 metric 425
172.21.30.0/24 dev br0 proto kernel scope link src 172.21.30.100 metric 425
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown

$ ip -6 r
2601:xx:xxxx:6db3::/64 dev br0 proto ra metric 425 pref medium
fe80::/64 dev br0 proto kernel metric 1024 pref medium
default via fe80::xxxx:xxxx:xxxx:39a0 dev br0 proto ra metric 425 pref medium

So far so good, and everything "works" as expected.  I have a guest VM in VirtualBox acting as a NAS on VLAN 30 and another guest VM acting as an SMB client on VLAN 20.  The client's internet goes through the VPN gateway, and online speedtest results look great: full speed achieved, with an 'A' score on the Waveform bufferbloat test.  From the OPNsense logs I can see the inter-VLAN routing taking place when I transfer a file from NAS->client:

[Attachment: OPNsense live log showing the NAS->client inter-VLAN flows]

I observe a couple of issues, however.

The first is not serious, and I can live with it: the host bridge br0 takes some time after system boot to get its IP addresses.  When I was using the physical interface directly, DHCP would already be done by the time the desktop finished booting.  With the bridge it takes an additional half minute after logging in for the IPs to be configured.  I expect some SLAAC delay because of RA intervals, but the DHCP delay seems odd.
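
One thing I still need to rule out is the bridge's STP forwarding delay, which as I understand it holds new ports in the listening/learning states for several seconds before traffic passes.  Something like this should show the relevant properties and, if appropriate, turn the delay off (assuming the NetworkManager connection is named 'br0'; untested on my side so far):

$ nmcli -f bridge.stp,bridge.forward-delay connection show br0
$ nmcli connection modify br0 bridge.stp no bridge.forward-delay 0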

The second issue is that I am seeing high retransmit counts and small TCP congestion windows in iperf3 between the two VMs.  They share one physical link up to the switch, but it should be full-duplex.  This is the iperf3 result from the client to the server VM:

$ iperf3 -c 172.21.30.108
Connecting to host 172.21.30.108, port 5201
[  5] local 172.21.20.130 port 40986 connected to 172.21.30.108 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   199 MBytes  1.67 Gbits/sec  436    277 KBytes       
[  5]   1.00-2.00   sec   211 MBytes  1.77 Gbits/sec   74    339 KBytes       
[  5]   2.00-3.00   sec   252 MBytes  2.12 Gbits/sec  174    349 KBytes       
[  5]   3.00-4.00   sec   236 MBytes  1.98 Gbits/sec  116    419 KBytes       
[  5]   4.00-5.00   sec   218 MBytes  1.82 Gbits/sec  131    290 KBytes       
[  5]   5.00-6.00   sec   206 MBytes  1.73 Gbits/sec   56    363 KBytes       
[  5]   6.00-7.00   sec   230 MBytes  1.93 Gbits/sec  161    356 KBytes       
[  5]   7.00-8.00   sec   199 MBytes  1.67 Gbits/sec   70    370 KBytes       
[  5]   8.00-9.00   sec   199 MBytes  1.67 Gbits/sec   51    358 KBytes       
[  5]   9.00-10.00  sec   188 MBytes  1.57 Gbits/sec   99    338 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.09 GBytes  1.79 Gbits/sec  1368             sender
[  5]   0.00-10.00  sec  2.08 GBytes  1.79 Gbits/sec                  receiver

It's a similar story in the opposite direction.
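
For reference, the reverse direction is just the same test run with iperf3's reverse mode:

$ iperf3 -c 172.21.30.108 -R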

I can accept this for my needs, but I am curious what's causing it and whether I misconfigured something.  I suspect fragmentation, maybe due to the VLAN tag overhead (?), but I'm not sure how to confirm.  All interfaces are using a 1500 MTU, as confirmed in Linux.
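
A note on the fragmentation theory: as I understand it, the 802.1Q tag lives in the Ethernet header, not the IP payload, so a 1500-byte MTU shouldn't fragment just because of the tag.  The closest check I can think of is a don't-fragment ping sized to exactly fill 1500 bytes (1472 bytes of payload + 28 bytes of ICMP/IP headers); it should fail if anything along the path has a smaller MTU:

$ ping -M do -s 1472 172.21.30.108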

My second question is about the architecture itself: is there anything I overlooked that might come back to bite me?  Did I open myself up to VPN leaks from the br20 clients?

TIA!

Nobody else runs a trunk to their workstation for desktop VMs?

I do, but with Hyper-V on Windows 11.  It's where I run most of the OPNsense, Linux, and FreeBSD VMs I use for dev work.

I use a trunk port that splits up into different VLANs on individual virtual switches.  On Hyper-V that's simple: it's one checkbox in the virtual switch.
Hardware:
DEC740

I haven't spent much time with Hyper-V.  Does the inter-VLAN routing happen on the host itself, or do you force it up through OPNsense?  If the latter, are you seeing the full link speed in file transfers?

I checked the CPU usage on my router while the iperf test was running.  In single-stream mode, one CPU gets up to 70% use.  In multi-stream mode (-P 4) all 4 cores are used, but none of them ever exceeds 50-60%.  I was at first worried that my N5105 might be the bottleneck, but that doesn't seem to be the case.  There's no IDS/IPS in play.

I've tested inter-VLAN routing between two Ubuntu hosts and an OPNsense VM on my hypervisor, and some time ago it reached around 10 Gbit/s with iperf (9.1 or so, I think).

But that machine has a high-clocking modern Ryzen CPU, and it certainly used some cores (the CPU was almost maxed out).

My normal setup pushes all VLANs out to a hardware switch and a hardware OPNsense though, so there is almost no inter-VLAN traffic in the vswitches themselves.  It's a workstation, after all, with just VLAN-separated, workstation-related VMs; no server stuff is running.
Hardware:
DEC740

Eureka!

It's one of the OPNsense settings under Interfaces->Settings->Network Interfaces.  I never thought to change those from the defaults because of all the in-line docs warning that they cause problems.

I enabled all of these, and now performance is almost at full line rate with significantly fewer retransmits (at least in single-stream mode):

* Hardware CRC
* Hardware TSO
* Hardware LRO
* VLAN Hardware Filtering

Now I need to go back and enable one at a time (or combinations of them) to find the minimum set of options needed that doesn't cause any problems with the firewall or VLAN filtering.

So the good news is that Linux software bridging and VLAN tagging are not broken on my PC.  I've just been leaving performance on the table with my router settings ;-)


$ iperf3 -c 172.21.30.108
Connecting to host 172.21.30.108, port 5201
[  5] local 172.21.20.130 port 35424 connected to 172.21.30.108 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   242 MBytes  2.03 Gbits/sec    7   1.24 MBytes       
[  5]   1.00-2.00   sec   279 MBytes  2.34 Gbits/sec    0   1.40 MBytes       
[  5]   2.00-3.00   sec   280 MBytes  2.35 Gbits/sec    0   1.54 MBytes       
[  5]   3.00-4.00   sec   279 MBytes  2.34 Gbits/sec    0   1.67 MBytes       
[  5]   4.00-5.00   sec   280 MBytes  2.35 Gbits/sec    0   1.79 MBytes       
[  5]   5.00-6.00   sec   279 MBytes  2.34 Gbits/sec    0   1.90 MBytes       
[  5]   6.00-7.00   sec   279 MBytes  2.34 Gbits/sec    0   1.99 MBytes       
[  5]   7.00-8.00   sec   280 MBytes  2.35 Gbits/sec    0   1.99 MBytes       
[  5]   8.00-9.00   sec   279 MBytes  2.34 Gbits/sec    0   1.99 MBytes       
[  5]   9.00-10.00  sec   280 MBytes  2.35 Gbits/sec    0   1.99 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.69 GBytes  2.31 Gbits/sec    7             sender
[  5]   0.00-10.00  sec  2.69 GBytes  2.31 Gbits/sec                  receiver

iperf Done.

$ iperf3 -c 172.21.30.108 -P 4
Connecting to host 172.21.30.108, port 5201
[  5] local 172.21.20.130 port 42172 connected to 172.21.30.108 port 5201
[  7] local 172.21.20.130 port 42176 connected to 172.21.30.108 port 5201
[  9] local 172.21.20.130 port 42192 connected to 172.21.30.108 port 5201
[ 11] local 172.21.20.130 port 42202 connected to 172.21.30.108 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  47.0 MBytes   394 Mbits/sec   86    396 KBytes       
[  7]   0.00-1.00   sec   101 MBytes   844 Mbits/sec   90    807 KBytes       
[  9]   0.00-1.00   sec  50.5 MBytes   423 Mbits/sec   44    385 KBytes       
[ 11]   0.00-1.00   sec  90.5 MBytes   759 Mbits/sec   90    822 KBytes       
[SUM]   0.00-1.00   sec   289 MBytes  2.42 Gbits/sec  310             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  46.2 MBytes   388 Mbits/sec   18    378 KBytes       
[  7]   1.00-2.00   sec  96.2 MBytes   807 Mbits/sec   27    675 KBytes       
[  9]   1.00-2.00   sec  45.3 MBytes   380 Mbits/sec    6    369 KBytes       
[ 11]   1.00-2.00   sec  90.0 MBytes   755 Mbits/sec   24    679 KBytes       
[SUM]   1.00-2.00   sec   278 MBytes  2.33 Gbits/sec   75             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  51.2 MBytes   430 Mbits/sec   25    288 KBytes       
[  7]   2.00-3.00   sec  87.5 MBytes   734 Mbits/sec   61    305 KBytes       
[  9]   2.00-3.00   sec  53.9 MBytes   452 Mbits/sec   38    263 KBytes       
[ 11]   2.00-3.00   sec  86.2 MBytes   724 Mbits/sec   86    311 KBytes       
[SUM]   2.00-3.00   sec   279 MBytes  2.34 Gbits/sec  210             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  61.2 MBytes   514 Mbits/sec   39    356 KBytes       
[  7]   3.00-4.00   sec  66.2 MBytes   556 Mbits/sec   31    373 KBytes       
[  9]   3.00-4.00   sec  67.8 MBytes   569 Mbits/sec    0    420 KBytes       
[ 11]   3.00-4.00   sec  83.8 MBytes   703 Mbits/sec   36    403 KBytes       
[SUM]   3.00-4.00   sec   279 MBytes  2.34 Gbits/sec  106             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  58.8 MBytes   493 Mbits/sec   12    376 KBytes       
[  7]   4.00-5.00   sec  80.0 MBytes   671 Mbits/sec   13    362 KBytes       
[  9]   4.00-5.00   sec  61.3 MBytes   514 Mbits/sec    0    520 KBytes       
[ 11]   4.00-5.00   sec  78.8 MBytes   661 Mbits/sec   17    436 KBytes       
[SUM]   4.00-5.00   sec   279 MBytes  2.34 Gbits/sec   42             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec  63.8 MBytes   535 Mbits/sec   46    301 KBytes       
[  7]   5.00-6.00   sec  68.8 MBytes   577 Mbits/sec   67    290 KBytes       
[  9]   5.00-6.00   sec  61.9 MBytes   519 Mbits/sec   45    281 KBytes       
[ 11]   5.00-6.00   sec  83.8 MBytes   703 Mbits/sec   61    288 KBytes       
[SUM]   5.00-6.00   sec   278 MBytes  2.33 Gbits/sec  219             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec  58.8 MBytes   493 Mbits/sec   45    286 KBytes       
[  7]   6.00-7.00   sec  73.8 MBytes   619 Mbits/sec   17    375 KBytes       
[  9]   6.00-7.00   sec  59.3 MBytes   498 Mbits/sec    0    416 KBytes       
[ 11]   6.00-7.00   sec  85.0 MBytes   713 Mbits/sec   29    328 KBytes       
[SUM]   6.00-7.00   sec   277 MBytes  2.32 Gbits/sec   91             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec  51.2 MBytes   430 Mbits/sec   16    313 KBytes       
[  7]   7.00-8.00   sec  85.0 MBytes   713 Mbits/sec   28    400 KBytes       
[  9]   7.00-8.00   sec  52.6 MBytes   441 Mbits/sec    5    378 KBytes       
[ 11]   7.00-8.00   sec  90.0 MBytes   755 Mbits/sec   15    386 KBytes       
[SUM]   7.00-8.00   sec   279 MBytes  2.34 Gbits/sec   64             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec  45.0 MBytes   377 Mbits/sec    4    288 KBytes       
[  7]   8.00-9.00   sec  96.2 MBytes   807 Mbits/sec    0    556 KBytes       
[  9]   8.00-9.00   sec  46.0 MBytes   386 Mbits/sec    6    233 KBytes       
[ 11]   8.00-9.00   sec  91.2 MBytes   765 Mbits/sec    0    542 KBytes       
[SUM]   8.00-9.00   sec   278 MBytes  2.34 Gbits/sec   10             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec  56.2 MBytes   472 Mbits/sec  137    226 KBytes       
[  7]   9.00-10.00  sec  82.5 MBytes   692 Mbits/sec  154    259 KBytes       
[  9]   9.00-10.00  sec  52.9 MBytes   444 Mbits/sec  112    199 KBytes       
[ 11]   9.00-10.00  sec  87.5 MBytes   734 Mbits/sec  135    246 KBytes       
[SUM]   9.00-10.00  sec   279 MBytes  2.34 Gbits/sec  538             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   540 MBytes   453 Mbits/sec  428             sender
[  5]   0.00-10.00  sec   536 MBytes   450 Mbits/sec                  receiver
[  7]   0.00-10.00  sec   837 MBytes   702 Mbits/sec  488             sender
[  7]   0.00-10.00  sec   833 MBytes   699 Mbits/sec                  receiver
[  9]   0.00-10.00  sec   551 MBytes   463 Mbits/sec  256             sender
[  9]   0.00-10.00  sec   549 MBytes   460 Mbits/sec                  receiver
[ 11]   0.00-10.00  sec   867 MBytes   727 Mbits/sec  493             sender
[ 11]   0.00-10.00  sec   863 MBytes   724 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  2.73 GBytes  2.34 Gbits/sec  1665             sender
[SUM]   0.00-10.00  sec  2.72 GBytes  2.33 Gbits/sec                  receiver

iperf Done.

UPDATE:

Hardware CRC/TSO/LRO did not make any measurable difference, so I turned those back off.  I left only Hardware VLAN Filtering enabled, which is, ironically, the one giving the performance uplift.
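
For anyone repeating this, the offloads that are actually active can be confirmed from the OPNsense shell; flags such as VLAN_HWFILTER, TSO4, and LRO show up in the interface's options line when enabled (igc0 here stands in for whichever port is in use):

$ ifconfig igc0 | grep options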

However I am now seeing the following message repeated in the firewall console:

pfr_update_stats: assertion failed.

... which I saw both when only CRC was enabled and when only VLAN filtering was enabled.  It seems there might be a bad interaction somewhere between the igc driver and pf, although I don't know what its impact is.  Everything seems OK at the moment despite the log spam.

Questions now:

- Should I experiment with leaving VLAN filtering enabled and also upgrade the NIC firmware, as per https://forum.opnsense.org/index.php?topic=48695.0?

Or

- Should I leave it disabled and accept that my N5105-based router is just a little too weak for software-based inter-VLAN filtering at the 2.5GbE rate without jumbo frames?

(I don't feel like messing with jumbo frames and breaking other network devices.)

Found the answer to the "pfr_update_stats" spam message here: https://forum.opnsense.org/index.php?topic=35549.0

Looks like a benign debug statement, so I'll probably leave "Hardware VLAN filtering" enabled since it helps significantly.

If anyone is aware of a problem with this setting, especially if it impacts firewall security, kindly let me know!