vlan tagged at virtual function level stopped passing traffic after upgrade

Started by chiby, August 01, 2022, 04:55:16 AM

i350 NIC - vlan tagged at virtual function level stopped passing traffic after upgrade to 22.7.

the configuration is:


          OPNsense 22.7
                 ^
                 |
proxmox - NIC PCI passthrough to VM
                 ^
                 |
nic i350 with sriov (vlan 20 tagged on some VFs) at linux level

Clients on vlan 20 became isolated. I couldn't capture any packets (tagged or untagged) on igb2 with tcpdump from within OPNsense. In the end I had to roll back by reinstalling 22.1 to get my vlan subnets back. The VFs with no vlan worked fine, though; only the vlan-tagged ones are affected.

The underlying linux udev config setup of the vlan'd interface:

...
KERNEL=="0000:05:00.0", SUBSYSTEM=="pci", DRIVER=="igb", ATTR{vendor}=="0x8086", ATTR{device}=="0x1521", ATTR{sriov_numvfs}="4"
...
KERNEL=="0000:05:00.0", SUBSYSTEM=="pci", DRIVER=="igb", ATTR{vendor}=="0x8086", ATTR{device}=="0x1521", PROGRAM="/sbin/ip link set enp5s0f0 vf 1 mac 02:25:90:92:01:b2 vlan 20 spoofchk off trust on"
...
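
To rule out the host side, the VF settings can be checked on the Proxmox host before looking at OPNsense. A rough sketch, assuming the PF is named enp5s0f0 as in the rule above and that `ip link show` prints one "vf N ..." line per virtual function; the helper name vf_has_vlan is made up for illustration:

```shell
# Hypothetical helper: reads `ip link show <pf>` output on stdin and
# succeeds if the given VF is listed with the given VLAN id.
vf_has_vlan() {
    vf="$1"; vlan="$2"
    # First keep only the line for this VF, then look for its VLAN id.
    grep -E "^[[:space:]]*vf ${vf}[[:space:]]" \
        | grep -Eq "vlan ${vlan}(,|[[:space:]]|\$)"
}

# On the Proxmox host (PF name taken from the udev rule above):
# ip link show enp5s0f0 | vf_has_vlan 1 20 && echo "VF 1 is tagged with vlan 20"
```

If the VF still shows vlan 20 on the host, the tagging itself is in place and the problem is more likely on the guest driver side.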


dmesg has nothing unusual:
igb2: <Intel(R) I350 Virtual Function> mem 0xfde10000-0xfde13fff,0xfde14000-0xfde17fff at device 27.0 on pci6
igb2: Using 1024 TX descriptors and 1024 RX descriptors
igb2: Using 1 RX queues 1 TX queues
igb2: Using MSI-X interrupts with 2 vectors
igb2: Ethernet address: 02:25:90:92:01:b2
igb2: link state changed to UP
igb2: netmap queues/slots: TX 1/1024, RX 1/1024


pciconf output:
igb2@pci0:6:27:0: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1520 subvendor=0x15d9 subdevice=0x1521
    vendor     = 'Intel Corporation'
    device     = 'I350 Ethernet Controller Virtual Function'
    class      = network
    subclass   = ethernet


ifconfig output:

igb2: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: WLAN
options=4800028<VLAN_MTU,JUMBO_MTU,NOMAP>
ether 02:25:90:92:01:b2
inet 172.20.20.1 netmask 0xffffff00 broadcast 172.20.20.255
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>


No errors, nothing looks bad, still no juice. Any idea how to fix it?

(I tried every possible combination of disabling/enabling the offloads and vlan filtering options inside opnsense.)

I am pretty sure I'm encountering the same issue.
I didn't touch anything settings wise, all I did was swap from LibreSSL to OpenSSL and then update everything to 22.7.

I can't provide technical specs as I'm unfortunately not as experienced, but I checked dmesg and everything seemed fine. Everything was working previously.
My VLANs seem to be having trouble passing traffic to my access point which is on another interface. I have them tagged and bridged to put it simply.

Basically what this means is none of my devices have wireless now.. which is a major issue.
I do have a config backup from before I updated, but I don't know the best method to roll back. I'm a bit nervous to use opnsense-revert -r 22.1.2, if that is even the proper way to roll back..? Really trying to avoid a factory reset.

Either way, this seems like a pretty major issue; I would appreciate it being patched fast.

Note that my vlan setup is a bit unusual: I'm not using the opnsense-side vlan features, but setting the vlan on a virtual function (hardware virtualisation using sriov). The only relevant piece, I suspect, is the igb virtual function driver in freebsd.
You might want to take a look at this post, which could be more relevant to you, though again, I'm only guessing at your conf:
https://forum.opnsense.org/index.php?topic=29516.0

Re your rollback question: I checked a few docs, but found no clear instructions on how to roll back a whole release, so I just confirmed I had a recent conf backup, reinstalled the old version, and then applied the previously backed-up conf. It worked fine for me.

Thank you so much! Changing from disabled to default offloading fixed my issue.
Best of luck with your issue, wish I could help :)

No problem.

Yeah, my problem might become a bigger challenge, as I'm getting the feeling it has less to do with opnsense and more with the included freebsd nic driver.

As far as I can see, 22.1 -> 22.7 moved from FreeBSD 13.0 to 13.1?
If so, I stopped counting the commits to the relevant if_em.c file (OK, it's around 162). It seems there was a fairly major rework of the code for these Intel NICs between the two versions.

Anyway, if anyone has any idea, pls don't hold back...

People should use the "default" setting here, as disabling VLAN hardware capabilities can, depending on the driver implementation, cause VLANs to stop working altogether.


Cheers,
Franco
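
For anyone wanting to check this from the OPNsense shell: the ifconfig output posted earlier lists only VLAN_MTU, JUMBO_MTU and NOMAP in its options line, i.e. no VLAN_HWTAGGING. A small sketch for testing that on any interface; the helper name has_vlan_hwtagging is made up, and whether the flag actually matters for a given driver is an assumption to verify, not a given:

```shell
# Hypothetical helper: reads `ifconfig <if>` output on stdin and succeeds
# if the interface's options line lists VLAN_HWTAGGING.
has_vlan_hwtagging() {
    # Match the "options=" line only, not the "nd6 options=" line.
    grep -E '^[[:space:]]*options=' | grep -q 'VLAN_HWTAGGING'
}

# On the OPNsense shell:
# ifconfig igb2 | has_vlan_hwtagging || echo "VLAN hardware tagging is off on igb2"
```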