I have a 3-port LACP LAGG configured on my OPNsense system that is connected to a Cisco SG350 managed switch. This has worked fine in previous versions of OPNsense going back years, but since upgrading to 23.1 it gives problems. Specifically, it has trouble becoming active (configured) after boot. The individual laggports will change in status, with the flags moving through various states such as <>, <COLLECTING>, <ACTIVE, COLLECTING>, and even with some (but not all) in the desired <ACTIVE, COLLECTING, DISTRIBUTING> state.
This is what it looks like when it is properly configured:
$ ifconfig lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4812098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,NOMAP>
ether 00:eb:ca:c0:05:c5
laggproto lacp lagghash l2,l3,l4
laggport: em1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: em2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: em3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
groups: lagg
media: Ethernet autoselect
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
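For anyone who wants to check this state from a script rather than by eye, here is a minimal Python sketch. It is not an OPNsense tool: the function names `parse_laggports` and `lagg_is_healthy` are made up for illustration, and it assumes the `laggport:` line format shown in the output above.

```python
import re

# Flags a healthy LACP member is expected to carry, per the output above.
DESIRED = {"ACTIVE", "COLLECTING", "DISTRIBUTING"}

# Matches lines such as: "laggport: em1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>"
LAGGPORT_RE = re.compile(r"laggport:\s+(\S+)\s+flags=[0-9a-fA-F]+<([^>]*)>")


def parse_laggports(ifconfig_output):
    """Return {port_name: set_of_flags} for every laggport line found."""
    ports = {}
    for match in LAGGPORT_RE.finditer(ifconfig_output):
        name, flags = match.groups()
        ports[name] = {f.strip() for f in flags.split(",") if f.strip()}
    return ports


def lagg_is_healthy(ifconfig_output):
    """True only if at least one port exists and every port has all desired flags."""
    ports = parse_laggports(ifconfig_output)
    return bool(ports) and all(DESIRED <= flags for flags in ports.values())
```

The input text could come from something like `subprocess.run(["ifconfig", "lagg0"], capture_output=True, text=True).stdout`. A port stuck at `<>` or `<ACTIVE,COLLECTING>` makes `lagg_is_healthy` return False.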
It seems that even after 10 minutes or so the LAGG is still cycling through various states, with the member laggports and the interfaces built on this LAGG going UP and DOWN accordingly as it tries to configure. The easiest way to fix it is to restart the Cisco switch. ???
Has the way LAGG interfaces are configured changed in 23.1? I see these two entries in the Changelog for 23.1:
- interfaces: register LAGG, PPP, VLAN and wireless devices as plugins
- src: assorted FreeBSD 13 stable fixes for e.g. bpf, bridge, bsdinstall ifconfig, iflib, ipfw, ipsec, lagg, netmap, pf, route and vlan components
I don't understand the import of either of those statements. This setup worked flawlessly up to 22.7.11 and so whatever the problem is now appears to have crept in with 23.1.
Any hints or suggestions on how to get the LAGG to activate reliably are most appreciated.
I have an HP T730 with an Intel T350-TX card. It has two LAGG groups with LACP enabled. After the upgrade from 22.7 to 23.1, the LAGG won't come up, and the switch end reports LAG block, LACP timeout. I need to log in, go to Interfaces > Other Types > LAGG, check or uncheck Fast timeout, and save. The LAGG then works again.
Any suggestions, or is there a log I can provide to help troubleshoot this?
Thanks,
E
BTW, this is still happening for me with the latest 23.1.3 update. Rebooting my managed switch is the easiest way for the LAGG to be established at the OPNsense end, otherwise the LAGG does not come up properly at the OPNsense end. :(
I can confirm this behavior with v23.1.3 and Intel 10Gb (ix) connected to a Juniper EX2300 (JUNOS 22.3R1.11) via LACP (fast).
After upgrading from v22.7 the LAGG didn't come up, and with it all the VLANs attached to it. Rebooting OPNsense multiple times (and me in full panic mode) didn't help; only after rebooting the switch did things get back to normal. I noticed a high number of interface in/out errors after the reboot while the LAGG was down, but I was out of options for debugging the LAGG from OPNsense.
After the switch reboot all is fine now. I didn't look into it any further, but I would really like to know if there's anything I can do to provide more specific info from the OPNsense side about what's happening.
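One way to capture more specific info from the OPNsense side would be to log every change in the laggport flags with a timestamp while the LAGG is trying to come up. The following is only a rough Python sketch under that assumption: `read_lagg`, `laggport_lines` and `watch` are hypothetical names, and it relies on nothing beyond the plain `ifconfig lagg0` output shown earlier in the thread. It records what happens, not why.

```python
import subprocess
import time


def read_lagg(ifname="lagg0"):
    """Return the raw text of `ifconfig <ifname>`."""
    result = subprocess.run(
        ["ifconfig", ifname], capture_output=True, text=True, check=False
    )
    return result.stdout


def laggport_lines(text):
    """Keep only the laggport and status lines, which carry the LACP state."""
    keep = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("laggport:", "status:")):
            keep.append(stripped)
    return keep


def watch(reader=read_lagg, interval=1.0, samples=600, log=print, sleep=time.sleep):
    """Poll the LAGG and log a timestamped entry whenever its state changes."""
    previous = None
    changes = 0
    for _ in range(samples):
        current = laggport_lines(reader())
        if current != previous:
            log(time.strftime("%H:%M:%S") + " " + " | ".join(current))
            previous = current
            changes += 1
        sleep(interval)
    return changes
```

Calling `watch()` right after boot and posting the resulting lines would show how long each member sits in each state and whether the flapping is periodic.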
In my case, I have a Brocade ICX6450. I tested with 23.1.4_1.
I tried to disconnect and reconnect the cables. That did not restore the LAGG.
Going to Interfaces > Other Types > LAGG, selecting the LAGG, and toggling the Fast timeout checkbox does restore the LAGG.
Are there any logs I can check on the OPNsense end to determine the root cause and how to fix it?
Thanks,
E
Just updated to 23.1.5. LAG stays up after reboot.
See the information from reddit.
https://www.reddit.com/r/opnsense/comments/1255xr8/2314_lagg_wont_come_up_after_reboot/
Quote from: nghappiness on March 29, 2023, 02:46:50 PM
Just updated to 23.1.5. LAG stays up after reboot.
See the information from reddit.
https://www.reddit.com/r/opnsense/comments/1255xr8/2314_lagg_wont_come_up_after_reboot/
The Reddit link is very useful, thanks.
In my case, updating to 23.1.5 today did not fix the problem of the LAGG coming up: I still had to reboot my switch. But one of the Reddit replies says, "It seems to be a driver issue and custom eee/fc tunables set for your NIC," and I have dev.em.0.fc set to 0 for my NICs. Maybe that is the cause of the problem? I will test and report back.
That could be it, but 23.1.5 would address that. Previously we recommended removing these tunables or moving them to /boot/loader.conf.local where they are not being triggered after bootup.
Cheers,
Franco
Quote from: franco on March 29, 2023, 08:46:48 PM
That could be it, but 23.1.5 would address that. Previously we recommended removing these tunables or moving them to /boot/loader.conf.local where they are not being triggered after bootup.
I checked the LAG configuration in my switch and note that everything checks out on both ends. All the ports are set to Long timeout on the switch and I have Fast timeout unchecked. Also, the default for Administrative Flow Control is Disable in the switch, which actually matches the tunable setting of dev.em.X.fc being 0 for the LAGG members on the OPNsense side. I've changed the switch to Auto Negotiation and removed the tunables and will see if that helps matters when I do a test reboot of OPNsense later today.
Unless something else changed between 22.7.x and 23.1.x, I can't think what might have caused the OPNsense LAGG to fail to come up after a reboot. None of the workarounds mentioned in the thread work for me. The only thing seems to be rebooting my managed switch.
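To confirm what flow-control value each LAGG member actually ends up with after removing the tunables, something along these lines might help. It is a hedged Python sketch, not a fix: the helper names are hypothetical, the unit numbers in the usage note are an assumption taken from the em1 to em3 members in the ifconfig output earlier in the thread, and it only reads `dev.em.X.fc` through `sysctl -n` without interpreting the values.

```python
import subprocess


def read_sysctl(name):
    """Return the value of one sysctl as a string, via `sysctl -n <name>`."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


def flow_control_settings(units, driver="em", reader=read_sysctl):
    """Map each dev.<driver>.<unit>.fc sysctl name to its current value."""
    return {f"dev.{driver}.{u}.fc": reader(f"dev.{driver}.{u}.fc") for u in units}


def mismatched(settings, expected):
    """Return the sysctl names whose value differs from the expected one."""
    return sorted(
        name for name, value in settings.items() if value != str(expected)
    )
```

For example, `mismatched(flow_control_settings([1, 2, 3]), 0)` would list any member whose value is not 0, which makes it easy to compare the state before and after a reboot.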