OPNsense Forum

Archive => 16.7 Legacy Series => Topic started by: onnieoneone on October 04, 2016, 01:34:29 pm

Title: VLAN traffic going missing on bridge interface
Post by: onnieoneone on October 04, 2016, 01:34:29 pm
Hi,

I'm pretty new to OPNsense in particular and FreeBSD in general. I decided to have a go at it since a) I heard it was a great router distribution and b) it uses a strong host model and I am sick and tired of dealing with traffic leaking in/out the wrong interfaces in Linux.

At the moment, though, I wish it would leak a little more :)

Let me describe my problem a little...

I have a host with 4 physical interfaces: http://www.fit-pc.com/web/products/fitlet/fitlet-x/

I would like to keep it simple and have the default LAN and WAN interfaces plumbed to 2 of the physical interfaces, igb0 and igb3. This works well.

The problem comes in when I try to set up bridge interfaces that bridge together both physical and VLAN interfaces (to act as a sort of Cisco-style BVI). VLAN-tagged traffic goes missing somewhere after it enters a physical interface: it doesn't appear in a tcpdump on either the bridge interface or the VLAN interface in question. The problem seems similar to the one mentioned here: https://redmine.pfsense.org/issues/2613

Let me show you part of my setup:
Code:
root@OPNsense:~ # ifconfig bridge0
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:92:47:45:cc:00
        inet 10.1.6.1 netmask 0xffffff00 broadcast 10.1.6.255
        nd6 options=1<PERFORMNUD>
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: igb2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 3 priority 128 path cost 55
        member: igb1_vlan1016 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 15 priority 128 path cost 20000
root@OPNsense:~ # ifconfig igb2
igb2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=500bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO>
        ether 00:01:c0:1a:67:3a
        inet6 fe80::201:c0ff:fe1a:673a%igb2 prefixlen 64 scopeid 0x3
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
root@OPNsense:~ # ifconfig igb1_vlan1016
igb1_vlan1016: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=3<RXCSUM,TXCSUM>
        ether 00:01:c0:1a:67:39
        inet6 fe80::201:c0ff:fe1a:6739%igb1_vlan1016 prefixlen 64 scopeid 0xf
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 1016 parent interface: igb1
root@OPNsense:~ # ifconfig bridge3
bridge3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:92:47:45:cc:03
        inet 10.1.1.1 netmask 0xffffff00 broadcast 10.1.1.255
        nd6 options=1<PERFORMNUD>
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: igb1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 2 priority 128 path cost 55
root@OPNsense:~ # ifconfig igb1
igb1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=500bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO>
        ether 00:01:c0:1a:67:39
        inet6 fe80::201:c0ff:fe1a:6739%igb1 prefixlen 64 scopeid 0x2
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
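
The bridge membership in the output above can be cross-checked mechanically. The Python sketch below is an illustration only: the helper names `parse_ifconfig` and `split_parents` are hypothetical, the sample text is abridged from the output above, and the script only parses text rather than querying the system. It lists each bridge's members and reports any VLAN interface whose parent sits in a different bridge. Whether such an overlap has anything to do with the missing traffic is not established in this thread.

```python
import re

# Abridged from the ifconfig output above: only the lines the parser needs.
IFCONFIG_TEXT = """\
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        member: igb2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
        member: igb1_vlan1016 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
igb1_vlan1016: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        vlan: 1016 parent interface: igb1
bridge3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        member: igb1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
"""


def parse_ifconfig(text):
    """Return (members, parents): bridge -> [members], VLAN iface -> parent."""
    members, parents = {}, {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(\S+): flags=", line)
        if header:
            current = header.group(1)  # start of a new interface stanza
            continue
        member = re.match(r"^\s+member: (\S+)", line)
        if member and current:
            members.setdefault(current, []).append(member.group(1))
        parent = re.match(r"^\s+vlan: \d+ parent interface: (\S+)", line)
        if parent and current:
            parents[current] = parent.group(1)
    return members, parents


def split_parents(members, parents):
    """List (vlan, vlan_bridge, parent, parent_bridge) where the bridges differ."""
    bridge_of = {m: b for b, ms in members.items() for m in ms}
    found = []
    for vlan, parent in parents.items():
        vlan_bridge, parent_bridge = bridge_of.get(vlan), bridge_of.get(parent)
        if vlan_bridge and parent_bridge and vlan_bridge != parent_bridge:
            found.append((vlan, vlan_bridge, parent, parent_bridge))
    return found


members, parents = parse_ifconfig(IFCONFIG_TEXT)
for vlan, vlan_bridge, parent, parent_bridge in split_parents(members, parents):
    print(f"{vlan} is in {vlan_bridge}, but its parent {parent} is in {parent_bridge}")
```

For the configuration shown, this reports that igb1_vlan1016 is a member of bridge0 while its parent igb1 is a member of bridge3.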

To help diagnose this, I have captured packets on the igb1, bridge0 and igb1_vlan1016 interfaces. The captures are attached. (I had to filter other, irrelevant traffic out of the igb1 trace; I think all the relevant packets are included.)

You'll see that the client host on vlan1016 is able to perform DHCP correctly (getting the 10.1.6.101 address), ARP works, and an ICMP ping request gets sent out correctly, but the ICMP ping reply never makes it back to bridge0 or igb1_vlan1016. For further information, the client is a guest VM running on a Linux KVM hypervisor patched into igb1 via a dumb/unmanaged switch. I believe the hypervisor is set up correctly and that it is correctly 802.1Q-tagging the ethernet frames for VLAN 1016 (take a look at the attached packet traces for details).
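
For readers checking the traces, here is a minimal Python sketch of what an 802.1Q tag for VLAN 1016 looks like in the Ethernet header. It is an illustration, not taken from the attached captures: the function names `build_tagged_header` and `read_vlan_id` are hypothetical, and the client MAC address is a placeholder because the thread does not give one.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def build_tagged_header(dst, src, vlan_id, ethertype, pcp=0, dei=0):
    """Build an 18-byte Ethernet header carrying a single 802.1Q tag."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    # Tag control information: 3-bit priority, 1-bit drop eligible, 12-bit VID.
    tci = (pcp << 13) | (dei << 12) | vlan_id
    return dst + src + struct.pack("!HHH", TPID_8021Q, tci, ethertype)


def read_vlan_id(frame):
    """Return the VLAN ID of a tagged frame, or None if it is untagged."""
    (tpid,) = struct.unpack_from("!H", frame, 12)
    if tpid != TPID_8021Q:
        return None
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF


bridge_mac = bytes.fromhex("02924745cc00")  # bridge0 address from the ifconfig output
client_mac = bytes.fromhex("525400000001")  # placeholder, not from the thread
header = build_tagged_header(bridge_mac, client_mac, 1016, 0x0800)
print(header.hex(), read_vlan_id(header))
```

The four bytes after the two MAC addresses are `81 00 03 f8`: the 802.1Q marker followed by VLAN ID 1016 (0x3f8) with priority 0.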

I have created allow-all IPv4 rules on all interfaces, and even a floating rule, in case pf might be filtering this traffic out. I can't see any blocked traffic in the logs, so I don't think that pf is dropping the traffic.

I guessed that there may be some ethernet filtering going on, so I altered some tunables, specifically
Code:
net.link.bridge.pfil_member=0
net.link.bridge.pfil_bridge=1
but this didn't seem to help.
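
Per if_bridge(4), `pfil_member` controls packet filtering on the bridge's member interfaces and `pfil_bridge` controls filtering on the bridge interface itself. One way to confirm the running values match the intended ones is to compare them against `sysctl` output. The Python sketch below assumes the usual `name: value` output format; the sample text is made up for illustration and is not output from the system in this thread, and the helper names are hypothetical.

```python
# Values the post intends to have in effect.
WANTED = {
    "net.link.bridge.pfil_member": 0,
    "net.link.bridge.pfil_bridge": 1,
}

# Made-up sample in "name: value" form, NOT real output from this system.
SAMPLE = """\
net.link.bridge.pfil_member: 1
net.link.bridge.pfil_bridge: 1
"""


def parse_sysctl(text):
    """Parse 'name: value' lines into a dict, keeping integer values only."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value)
    return values


def mismatches(actual, wanted):
    """Return {name: (actual_value, wanted_value)} for every differing tunable."""
    return {
        name: (actual.get(name), want)
        for name, want in wanted.items()
        if actual.get(name) != want
    }


print(mismatches(parse_sysctl(SAMPLE), WANTED))
```

With the made-up sample, only `net.link.bridge.pfil_member` is reported, because its running value differs from the intended one.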

Does anyone have an explanation for this behaviour or any advice on where I can look next?
Title: Re: VLAN traffic going missing on bridge interface
Post by: franco on October 04, 2016, 07:16:31 pm
Hi there,

Thanks for the thorough analysis!

Make sure you are on 16.7.5, there was an issue with if_bridge not adhering to its sysctls due to latent loading.

If this doesn't help, we're looking at a patch which pfSense had, which we removed some time around OPNsense 15.7 as we really needed to go back to standard FreeBSD. The original patch meanwhile got upstreamed to FreeBSD 11.0, so it's queued up for OPNsense 17.1 now, but that's January 2017 if all goes well.

Since it's in FreeBSD 11.0, I can take a stab at backporting it to FreeBSD 10.3 and build a test kernel if you're willing to try it? :)


Cheers,
Franco
Title: Re: VLAN traffic going missing on bridge interface
Post by: onnieoneone on October 05, 2016, 07:39:49 pm
Hi Franco,

I have been on 16.7.5 since my experimentation began.

Happy to try the backported version. How do we go about that?

Thanks

edit: I just tried destroying the bridge and reapplying the config (ip, dhcp server, pf) directly to the igb1_vlan1016 interface and I get correct behaviour, so at least I have confirmed the other parts of the puzzle.
Title: Re: VLAN traffic going missing on bridge interface
Post by: onnieoneone on October 28, 2016, 09:29:54 pm
Hi Franco,

I'm on 16.7.7 now. Do you think you might get to backporting the change before January's release? I'm still keen to give it a go.

Thanks
Title: Re: VLAN traffic going missing on bridge interface
Post by: franco on October 29, 2016, 08:01:09 am
Hi onnieoneone,

It looks like, while the issue was similar and was deemed fixed by the patch, that was a really long time ago and the code in FreeBSD has changed considerably since.

https://github.com/fichtner/pfsense-tools/commit/cba403d0126da81

Your best bet would be to try this on FreeBSD 11.0, which we provide an ALPHA build for, but it is by no means production ready.

Would you be able to test this? We could for example build a CD or USB image which you can use to import your config in live mode and boot into it to see if the issue disappears or not. Your installed system would stay intact.


Cheers,
Franco
Title: Re: VLAN traffic going missing on bridge interface
Post by: onnieoneone on October 31, 2016, 10:26:32 am
Hi Franco,

I would be more than happy to try out a USB image with my config.

Cheers
Title: Re: VLAN traffic going missing on bridge interface
Post by: franco on October 31, 2016, 12:55:49 pm
For serial or vga?
Title: Re: VLAN traffic going missing on bridge interface
Post by: onnieoneone on October 31, 2016, 10:23:15 pm
Sorry, vga thanks.
Title: Re: VLAN traffic going missing on bridge interface
Post by: franco on November 01, 2016, 09:57:17 am
Ok, here it is:

https://pkg.opnsense.org/snapshots/OPNsense-17.1.a-OpenSSL-vga-amd64.img.bz2

When you boot it, hit enter in the early installer key prompt, accept the settings, go to "import configuration" and then immediately "exit" which will boot up the system normally. From there you can start testing.

When you're done, just reboot and remove the VGA USB. :)


Cheers,
Franco
Title: Re: VLAN traffic going missing on bridge interface
Post by: onnieoneone on November 04, 2016, 12:47:27 pm
Ok, I have given it a go.

Unfortunately, my packet captures are pretty much the same: all the traffic from the host sitting on vlan1016 hits igb1 but gets no further. As before, traffic out to the host (an ICMP ping) reaches the host and the host replies, but again that traffic gets lost somewhere after it hits igb1.

I am pretty sure the firewall is not to blame here (although my pf-log-reading-fu might need some sharpening). Default drop rule logging is enabled and I have no other drop rules.

I also tried setting the bridge sysctl parameters back to 'default' but still the same.

Am I conceptually doing something wrong by setting up these bridges as my only addressed interfaces (apart from WAN and LAN)?
Title: Re: VLAN traffic going missing on bridge interface
Post by: onnieoneone on November 04, 2016, 01:24:53 pm
I thought I would also try for fun to take the vlans out of the question, and the bridged interface does indeed work correctly with a device plugged into igb2.

Code:
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:7e:e3:17:05:00
        inet 10.1.6.1 netmask 0xffffff00 broadcast 10.1.6.255
        nd6 options=1<PERFORMNUD>
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: igb2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 3 priority 128 path cost 55
Title: Re: VLAN traffic going missing on bridge interface
Post by: onnieoneone on November 11, 2016, 05:55:18 pm
Hi, I wonder if there are any further thoughts from anyone about this one.

Just to summarize, my problems are like so:
Code:
physical = fine
physical ---> vlan = fine
physical ---> bridge = fine
physical -|-> vlan ---> bridge = not fine

In that everything works apart from certain traffic going missing (at least according to tcpdump) between physical and vlan interfaces in the last configuration. I say 'going missing' instead of 'being dropped' as I can't find any trace of missing packets with netstat, ifconfig, pf logs or any other tools I can think of.

This testing is all done on a single physical interface (although the whole point is to incorporate others).

I turned off all hardware offloads (vlanhwfilter etc.) without luck.
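
Whether the VLAN offloads are really off can be read from the `options=` line of `ifconfig`. The Python sketch below is an illustration with hypothetical helper names; the sample line is the igb1 options line posted earlier in the thread, that is, from before the offloads were disabled.

```python
import re


def enabled_options(options_line):
    """Extract capability names from an ifconfig 'options=...<A,B,C>' line."""
    match = re.search(r"options=[0-9a-f]+<([^>]*)>", options_line)
    return match.group(1).split(",") if match else []


def vlan_offloads(options_line):
    """Return only the VLAN hardware offload capabilities still enabled."""
    return [name for name in enabled_options(options_line) if name.startswith("VLAN_HW")]


# igb1 options line as posted earlier, before any offloads were turned off.
line = "options=500bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO>"
print(vlan_offloads(line))
```

An empty list for the physical interface would indicate that no VLAN hardware offload is still enabled on it.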

Where can I look for documentation/hints to go forward?

Would it be worth trying the same setup in pfSense or vanilla FreeBSD to see if I can recreate it?

Thanks
Title: Re: VLAN traffic going missing on bridge interface
Post by: franco on November 12, 2016, 09:43:21 am
Hi,

There are some related issues reported against FreeBSD, though this is only a quick search for obvious keywords:

https://bugs.freebsd.org/bugzilla/buglist.cgi?quicksearch=vlan%20bridge

I would think the behaviour on FreeBSD is the same, as we try to stay as close to it as possible. For pfSense I am unsure, 50% chance... It's definitely worth testing this. Make sure to use FreeBSD 10.3 and pfSense 2.3; other versions would skew your results.

I can't pinpoint the issue. Maybe it is igb related, since your current analysis points to a missing physical-to-VLAN transition of the packet, but the FreeBSD 11.0 code seemed to rule that out. :/


Thank you,
Franco
Title: Re: VLAN traffic going missing on bridge interface
Post by: onnieoneone on November 12, 2016, 06:29:26 pm
Hi Franco,

I will look into it further when I can, but I think for now I will give in and buy a managed switch and do the plumbing downstream of my opnsense host.

I'm guessing there is no analogue of a Cisco BVI for opnsense/freebsd because bridging vlans on 2 physical interfaces (say igb1_vlan1016 and igb2_vlan1016) will get the kernel involved in layer 2 switching and that will never be very nice unless there's some hardware integration like on Cisco (and other) devices. Right?

Still, it's something that would be nice to have working in the end.

Many thanks for your time.