On a dedicated box with SFP+ ports, no VLANs.
I have set up two firewalls on the same hardware, but I am struggling with HA and loss of LAN connection on the 2nd one.
I found out that the problem is that the LAG doesn't work on this 2nd firewall.
I have removed the two ports from the LACP LAG on the switch, and also removed the LAN interface and the lagg on OPNsense. Then I activated the ports one at a time, in both directions, to verify that ix2 and ix3 are the correct ones, deleting the interface each time. So when I added ix2 and ix3 to the lagg and attached lagg0 to LAN, I was 100% sure the cables were correct. The Web GUI shows the LAG as green. I have also created an allow-all firewall rule on this LAN interface and rebooted.
No matter what I do, it does not establish a connection. What can be wrong? The switch says "susp" on both ports in the LACP LAG there, with Dev ID 0000000.
All is working outside the lag...
It simply says this:
Interface TenGigabitEthernet 1/0/10 suspended: LACP currently not enabled on the remote port. 2024-11-28
(5)Notifications LACP SUSPEND Interface TenGigabitEthernet 2/0/10 suspended: LACP currently not enabled on the remote port. 2024-11-28
Please show the lagg0 configuration on the OPNsense side. Like in my screen shot.
Image didn't work.
But I assume it was from the "Interfaces: Overview" screen. I got a scrollable list, so it wasn't easy to take a picture. Here is the text version of it:
Flags 8843
Capabilities rxcsum, txcsum, vlan_mtu, vlan_hwtagging, jumbo_mtu, vlan_hwcsum, tso4, tso6, lro, wol_ucast, wol_mcast, wol_magic, vlan_hwfilter, vlan_hwtso, netmap, rxcsum_ipv6, txcsum_ipv6, hwstats, mextpg
Options vlan_mtu, jumbo_mtu, wol_ucast, wol_mcast, wol_magic, hwstats, mextpg
MAC Address 20:7c:14:f5:91:66 - Qotom
Supported Media autoselect
Physical
Device lagg0
mtu 1500
macaddr_hw 00:00:00:00:00:00
LAGG Protocol lacp
LAGG Hash l2, l3, l4
LAGG Options
flags flowid_shift
lacp_fast_timo 16
LAGG Statistics
active ports flapping
0 0
Groups lagg
Media Ethernet autoselect
Media (Raw) Ethernet autoselect
Status up
Routes 10.10.10.0/24
Identifier opt4
Description LAN
Enabled true
Link Type static
addr4 10.10.10.3/24
addr6
IPv4 Addresses
10.10.10.3/24
VLAN Tag
Gateways
Driver lagg0
Index 13
Promiscuous Listeners 0
Send Queue Length 0
Send Queue Max Length 50
Send Queue Drops 0
Type Ethernet
Address Length 6
Header Length 14
Link State 2
vhid 0
Data Length 152
Metric 0
Line Rate 10.00 Gbit/s
Packets Received 18378
Input Errors 0
Packets Transmitted 0
Output Errors 18
Collisions 0
Bytes Received 2421158
Bytes Transmitted 0
Multicasts Received 18378
Multicasts Transmitted 0
Input Queue Drops 0
Packets for Unknown Protocol 0
Hardware Offload Capabilities 0x0
Uptime at Attach or Statistics Reset 32
I'm thinking about just starting from scratch; I have no clue what is going on. The other firewall I have, of the same brand/model, had no issues with this at all.
Nope, not the overview.
Interfaces > Other Types > LAGG - then open the configuration of your lagg IF.
There I have this. Attaching both assignment and the one you asked about.
Pick the hash layers matching the policy of your switch. Most common is L2 + L3.
On my 2nd box with same config, I have this default (empty), working with LACP there.
I tried to change it now to use l2+l3, I still get this:
(5)Notifications LACP SUSPEND Interface TenGigabitEthernet 1/0/10 suspended: LACP currently not enabled on the remote port. 2024-11-28 17:10:44
(5)Notifications LACP SUSPEND Interface TenGigabitEthernet 2/0/10 suspended: LACP currently not enabled on the remote port. 2024-11-28 17:10:44
Did you try slow instead of fast timeout? Are there any docs on what your switch expects? Also, did you disable all hardware offloading? Disabled would be the default.
All is disabled as default, haven't touched any optimization features.
Regarding slow/fast: yes. I first had it at slow on both sides, but changed to fast after a day of not getting anywhere. So I have the same settings on this LACP pair as on the other OPNsense box of the same batch/type. I am also struggling a bit with both HA units becoming master at the same time, so I started to believe it could be an IP conflict (because the CARP VIP would then be active in both places). But then it shouldn't work on a single LAN either, so I'm not sure about that.
I will go to the console, reset everything and maybe I will have better luck... Maybe something has gotten stuck.
If you can, provide output from your Cisco switch:
show etherchannel summary
Also provide the output of the lagg port configuration and the physical port configuration of the ports belonging to the LAGG on the switch side.
Regards,
S.
If both firewalls become master for a carp vip it could be 2 things most likely:
- Both firewalls send out their VRRP advertisements, but they get lost on the way to the other firewall, either manipulated or dropped by the switch or blocked by a firewall rule
- The hashes of the vhid group are not the same on both sides. Make sure the configuration is exactly the same, especially when having more VIPs in the same vhid CARP group
The LAGG issue was kind of solved. I swapped the cables (SFP+) for a different pair and then I got a connection. I still have an issue with active/passive, where I can only unplug one of the cables for some reason. But as long as both fibers are plugged into both switches, the lagg now works (it is an FS switch with LACP).
I have vhid group 1 on the CARP WAN and vhid group 2 on the CARP LAN. Same on the second device. I have also deleted all the VIPs and synced them over, so they are identical (using multicast, so I didn't have to specify a peer IP).
I have disabled pf with pfctl -d on both firewalls. Can it still be blocking?
Your switch could use igmp snooping to mess with multicast.
There could also be MAC security features that block the spoofed mac addresses of vrrp packets.
Thank you for your suggestion. I have a thread here on it: https://forum.opnsense.org/index.php?topic=44226.0
It seems that since I have a public /29 on my WAN on both devices, and my ISP has routers that disable/enable each fiber at their end (participating in the /29), I can't do it like this. It is not a flat /29. I need to buy 2 new switches on the WAN side, so the WAN interfaces can see each other, before I can connect my two OPNsense boxes to the shared WAN network.
Re-read what I wrote in the other thread. You do not need two more switches if you already have a pair of stackable ones and a handful of free ports.
VLANs == as many virtual switches as you like as long as there are ports. That's the point of VLANs. A VLAN is a virtual unmanaged switch.
Now back to topic - LAG issue:
While LAN now works, there are some issues, as you can see below. On OPNsense, the bundle (bndl) is not fully established.
I'm including a pfSense box I also have in an LACP lagg (fast) that works 100%; that's the last LACP lagg shown in the list. It has the same config on the switch as the OPNsense boxes.
If you look at an OPNsense unit, one port lists a blank Dev ID and even shows as requesting Slow LACPDUs, while the working port shows fast. There is no option to have both fast and slow within one lagg pair, so I assume it is not actually requesting slow. At least there is no option to split them up.
Master FW
Aggregate port 10:
Local information:
LACP port Oper Port Port
Port Flags State Priority Key Number State
---------------------------------------------------------------------------
Te1/0/10 FA bndl 32768 0xa 0xa 0x3f
Te2/0/10 FA susp 32768 0xa 0x2c 0x47
Partner information:
LACP port Oper Port Port
Port Flags Priority Dev ID Key Number State
--------------------------------------------------------------------------
Te1/0/10 FA 32768 207c.14f5.9166 0x1b2 0x7 0x3f
Te2/0/10 SP 0 0000.0000.0000 0x0 0x0 0x0
FS#show lacp summary 2
Flags: S - Device is requesting Slow LACPDUs F - Device is requesting Fast LACPDUs.
A - Device is in active mode. P - Device is in passive mode.
Backup FW
Aggregate port 2:
Local information:
LACP port Oper Port Port
Port Flags State Priority Key Number State
---------------------------------------------------------------------------
Te1/0/2 FA susp 32768 0x2 0x2 0x47
Te2/0/2 FA bndl 32768 0x2 0x24 0x3f
Partner information:
LACP port Oper Port Port
Port Flags Priority Dev ID Key Number State
--------------------------------------------------------------------------
Te1/0/2 SP 0 0000.0000.0000 0x0 0x0 0x0
Te2/0/2 FA 32768 207c.14f5.916f 0x1d2 0x8 0x3f
FS#show lacp summary 1
Flags: S - Device is requesting Slow LACPDUs F - Device is requesting Fast LACPDUs.
A - Device is in active mode. P - Device is in passive mode.
pfSense (not OPNsense) unit I already have working, with same config on switch
Aggregate port 1:
Local information:
LACP port Oper Port Port
Port Flags State Priority Key Number State
---------------------------------------------------------------------------
Te1/0/1 FA bndl 32768 0x1 0x1 0x3f
Te2/0/18 FA bndl 32768 0x1 0x34 0x3f
Partner information:
LACP port Oper Port Port
Port Flags Priority Dev ID Key Number State
--------------------------------------------------------------------------
Te1/0/1 FA 32768 0cc4.7aaa.fba5 0x14b 0x2 0x3f
Te2/0/18 FA 32768 0cc4.7aaa.fba5 0x14b 0x4 0x3f
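For readers trying to interpret the "Port State" column in the tables above: a minimal sketch, assuming the FS switch prints the standard IEEE 802.1AX state octet there (the switch documentation is not quoted in this thread, so that is an assumption). The helper name decode_lacp_state is made up for illustration.

```python
# Bit names of the LACP actor/partner state octet as defined in IEEE 802.1AX.
# Assumption: the switch's "Port State" column is this octet, unmodified.
LACP_STATE_BITS = [
    (0x01, "Activity"),         # 1 = active LACP, 0 = passive
    (0x02, "Timeout"),          # 1 = short (fast) timeout, 0 = long (slow)
    (0x04, "Aggregation"),
    (0x08, "Synchronization"),
    (0x10, "Collecting"),
    (0x20, "Distributing"),
    (0x40, "Defaulted"),        # 1 = no partner LACPDU received, defaults in use
    (0x80, "Expired"),
]


def decode_lacp_state(value):
    """Return the names of the bits set in an LACP state octet (hypothetical helper)."""
    return [name for mask, name in LACP_STATE_BITS if value & mask]


if __name__ == "__main__":
    # The values that appear in the switch output in this thread.
    for state in (0x3F, 0x47, 0x3D, 0x45):
        print(f"{state:#04x}: {', '.join(decode_lacp_state(state))}")
```

Under that assumption, the bundled ports (0x3f) are synchronized, collecting and distributing, while the suspended ports (0x47) have the Defaulted bit set, meaning the switch is running on default partner information for them. That would be consistent with the all-zero partner Dev ID shown for the same ports.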
And here is the detailed LACP info for the LACP member interfaces on the switch. I just picked one of the two failing firewalls, as it fails the same way on both boxes:
FS#show running-config interface Te2/0/10
Building configuration...
Current configuration: 112 bytes
interface TenGigabitEthernet 2/0/10
description FW3
port-group 10 mode active
lacp short-timeout
FS#show running-config interface Te1/0/
FS#show running-config interface Te1/0/10
Building configuration...
Current configuration: 112 bytes
interface TenGigabitEthernet 1/0/10
description FW3
port-group 10 mode active
lacp short-timeout
And the switch clearly says that LACP is not enabled on one of the ports. So two sets of cables, on two machines, and both have the exact same problem. It must be a bonding error in the LACP settings in OPNsense (since it works on pfSense).
I have also disconnected the LACP-lag and no issues with the port member in question then, it worked just fine alone without LACP.
Both switch and the opnsense-box shows light/no light when I unplug/plug it into the port.
(5)Notifications LACP SUSPEND Interface TenGigabitEthernet 1/0/2 suspended: LACP currently not enabled on the remote port. 2024-12-01 14:06:52
show lacp counters
Aggregate port 2:
Port InPkts OutPkts
-------------------------------
Te1/0/2 798391 1170027
Te2/0/2 945838 885832
I have disconnected the LACP LAG and tested the LAN on the 2 individual ports that make up LACP group 2. No problems; each port works perfectly on its own.
I have also double-checked that the MAC addresses of the individual ports seen on the switch match the ones in OPNsense, so it is 100% certain they are physically connected.
As soon as they join the LACP team, only one member of the team shows up correctly.
I tried setting everything to slow, both on my switch and on lagg0.
Aggregate port 2:
Local information:
LACP port Oper Port Port
Port Flags State Priority Key Number State
---------------------------------------------------------------------------
Te1/0/2 SA susp 32768 0x2 0x2 0x45
Te2/0/2 SA bndl 32768 0x2 0x24 0x3d
Partner information:
LACP port Oper Port Port
Port Flags Priority Dev ID Key Number State
--------------------------------------------------------------------------
Te1/0/2 SP 0 0000.0000.0000 0x0 0x0 0x0
Te2/0/2 SA 32768 207c.14f5.916f 0x1d2 0x8 0x3d
During reboot, when OPNsense is down, it shows this status on the switch (correctly):
Aggregate port 2:
Local information:
LACP port Oper Port Port
Port Flags State Priority Key Number State
---------------------------------------------------------------------------
Te1/0/2 SA susp 32768 0x2 0x2 0x45
Te2/0/2 SA susp 32768 0x2 0x24 0x45
Partner information:
LACP port Oper Port Port
Port Flags Priority Dev ID Key Number State
--------------------------------------------------------------------------
Te1/0/2 SP 0 0000.0000.0000 0x0 0x0 0x0
Te2/0/2 SP 0 0000.0000.0000 0x0 0x0 0x0
It must be some part of the LACP standard that is not matching here, so that one of the ports goes into a sleep/backup state where it doesn't exchange correct data.
ifconfig
lagg0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
description: LAN (lan)
options=4e0382b<RXCSUM,TXCSUM,VLAN_MTU,JUMBO_MTU,WOL_UCAST,WOL_MCAST,WOL_MAGIC,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
ether 20:7c:14:f5:91:6f
hwaddr 00:00:00:00:00:00
inet .2 netmask 0xffffff00 broadcast ...255
inet .1 netmask 0xffffff00 broadcast ...255 vhid 3
laggproto lacp lagghash l2
laggport: ix2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: ix3 flags=0<>
groups: lagg
carp: MASTER vhid 3 advbase 1 advskew 0
peer 224.0.0.18 peer6 ff02::12
media: Ethernet autoselect
status: active
The port of issue is really up:
root@f1:~ # ifconfig ix3
ix3: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
options=4e0382b<RXCSUM,TXCSUM,VLAN_MTU,JUMBO_MTU,WOL_UCAST,WOL_MCAST,WOL_MAGIC,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
ether 20:7c:14:f5:91:6f
hwaddr 20:7c:14:f5:91:70
media: Ethernet autoselect (Unknown <rxpause,txpause>)
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
TenGigabitEthernet 1/0/2 up 1 Full 10G fiber
TenGigabitEthernet 2/0/2 up 1 Full 10G fiber
root@f1:~ # tcpdump -i ix3 ether proto 0x8809
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ix3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
21:51:18.266702 LACPv1, length 110
21:51:19.351099 LACPv1, length 110
21:51:20.444063 LACPv1, length 110
21:51:21.547254 LACPv1, length 110
21:51:22.635720 LACPv1, length 110
21:51:23.725439 LACPv1, length 110
21:51:24.826664 LACPv1, length 110
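A rough way to read the capture above is to measure the spacing between frames. The sketch below is only an illustration: it parses the timestamps exactly as printed and compares the mean gap with the 802.1AX periodic times (1 s fast, 30 s slow). The function names are hypothetical, and captures that cross midnight are not handled.

```python
from datetime import datetime

# Periodic LACPDU transmit intervals from IEEE 802.1AX, in seconds.
FAST_PERIODIC = 1.0
SLOW_PERIODIC = 30.0

# Packet lines copied from the tcpdump output in this thread.
CAPTURE = """\
listening on ix3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
21:51:18.266702 LACPv1, length 110
21:51:19.351099 LACPv1, length 110
21:51:20.444063 LACPv1, length 110
21:51:21.547254 LACPv1, length 110
21:51:22.635720 LACPv1, length 110
21:51:23.725439 LACPv1, length 110
21:51:24.826664 LACPv1, length 110
"""


def lacpdu_intervals(lines):
    """Return the gaps in seconds between consecutive tcpdump timestamps."""
    stamps = []
    for line in lines:
        first = line.split(maxsplit=1)[0] if line.strip() else ""
        try:
            stamps.append(datetime.strptime(first, "%H:%M:%S.%f"))
        except ValueError:
            continue  # not a packet line (for example the tcpdump banner)
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


def guess_rate(intervals):
    """Say whether the mean gap is closer to the fast or the slow periodic time."""
    mean = sum(intervals) / len(intervals)
    return "fast" if abs(mean - FAST_PERIODIC) < abs(mean - SLOW_PERIODIC) else "slow"


if __name__ == "__main__":
    gaps = lacpdu_intervals(CAPTURE.splitlines())
    print(f"mean gap {sum(gaps) / len(gaps):.3f} s -> {guess_rate(gaps)} rate")
```

The frames arrive roughly once per second, which matches the fast rate. The capture as shown does not say who sent them, though. Running tcpdump with -e prints the link-level header, so the source MAC would show whether ix3 itself is transmitting LACPDUs or only receiving the switch's.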
ifconfig
NON-working OPNSense:
root@f1:~ # ifconfig lagg0
lagg0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
description: LAN (lan)
options=4e0382b<RXCSUM,TXCSUM,VLAN_MTU,JUMBO_MTU,WOL_UCAST,WOL_MCAST,WOL_MAGIC,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
ether 20:7c:14:f5:91:6f
hwaddr 00:00:00:00:00:00
inet XXX.2 netmask 0xffffff00 broadcast XX255
inet XXX.1 netmask 0xffffff00 broadcast XX.255 vhid 3
laggproto lacp lagghash l2,l3,l4
laggport: ix2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: ix3 flags=0<>
groups: lagg
carp: MASTER vhid 3 advbase 1 advskew 0
peer 224.0.0.18 peer6 ff02::12
media: Ethernet autoselect
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
Working pfSense:
lagg0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
description: LAN
options=4e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
ether 0c:c4:7a:aa:fb:a5
hwaddr 00:00:00:00:00:00
inet XXX.1 netmask 0xffffff00 broadcast XXXX
inet6 fe80::ec4:7aff:feaa:fba5%lagg0 prefixlen 64 scopeid 0xa
laggproto lacp lagghash l2,l3,l4
laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: igb3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
groups: lagg
media: Ethernet autoselect
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
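To make the difference between the two ifconfig outputs easier to spot, here is a small sketch that lists lagg members missing any of the ACTIVE, COLLECTING or DISTRIBUTING flags. It only parses the laggport lines as they appear in this thread; the function name and the regular expression are illustrative, not part of any OPNsense tooling.

```python
import re

# Flags a healthy LACP member shows in the outputs quoted in this thread.
REQUIRED = {"ACTIVE", "COLLECTING", "DISTRIBUTING"}
LAGGPORT_RE = re.compile(r"laggport:\s+(\S+)\s+flags=[0-9a-fA-F]+<([^>]*)>")

# laggport lines copied from the two ifconfig outputs above.
OPNSENSE = """\
laggport: ix2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: ix3 flags=0<>
"""
PFSENSE = """\
laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: igb3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
"""


def inactive_laggports(ifconfig_text):
    """Map each laggport lacking a required flag to the set of missing flags."""
    missing = {}
    for port, flags in LAGGPORT_RE.findall(ifconfig_text):
        have = {f for f in flags.split(",") if f}
        if not REQUIRED <= have:
            missing[port] = REQUIRED - have
    return missing


if __name__ == "__main__":
    print("OPNsense:", sorted(inactive_laggports(OPNSENSE)))
    print("pfSense: ", sorted(inactive_laggports(PFSENSE)))
```

On the OPNsense output this flags ix3, which carries none of the three flags, while both pfSense members pass.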
I'm getting nowhere..
I deleted this LACP-lagg on the switch and changed the lagg0 on OPNsense from lacp to failover.
I didn't change the port members ix2 and ix3, though. And it works just like one would expect. I set ix3 as master (the one that was the problem in the LACP team), and pinging the LAN works without issue. When I do "ifconfig ix3 down", it fails over to ix2 after 4-5 missed pings (so not as fast as LACP would be), and back to ix3 afterwards, with no problem at all. But I would have preferred LACP...
Since I have the exact same issue with two physical OPNsense boxes, it must be something in the software on the OPNsense box or the OS. Against the same switch, LACP works with pfSense.
FS switches have tended to show a lot of weird behavior with LAGG + LACP in the past; you can find reports on their forum or on Reddit.
Did you try the LAGG with LACP fast disabled on both ends and bounce the LAGG on the switch side?
Are you running the latest OS for that switch?
Did you try to reboot the switch?
Did you check for known bugs for the switch?
One can argue that because it works on pfSense and not on OPNsense, it is an OPNsense issue. However, I run LAGGs with LACP on OPNsense towards a Zyxel GS1900-24E switch and I do not have such issues.
Regards,
S.
When I set up my 2nd OPNsense box, I left it at the default, with LACP fast disabled, which is also the default on the switch. So that did not bring any improvement. Later I also tried moving everything over to fast and restarting the LACP interface and the relevant ports on the switch.
I have only had FS switches for 4 months (and upgraded to the latest version then). We replaced all our switches with them. But this is the first time I have had any issue with LACP. I have LACP on most ports, towards Supermicro blade servers, other switches/gear, and Rocky Linux/Windows servers. It has been like a dream, until now.
It is a 24/7 environment, so I can't risk rebooting them without a solid reason.
I'll research a bit more. For now at least it works in this active/backup mode. It could also be a bug in the network driver, I guess (with my Intel SFP+ ports).
I have OPNsense running with LACP and Cisco and Mikrotik gear at the other end. Never a problem. So there is to my knowledge nothing fundamentally broken.
I also have a couple of dozens of FreeBSD servers (13.3/13.4) with LACP to Cisco switches.
Like Patrick M. Hausen already confirmed, LACP is working without any issues with latest OPNsense and any version I can remember (I'm using Juniper switches).
No experience with FS switches, but I noticed an inconsistency in your comparison of a working and a non-working example in this post https://forum.opnsense.org/index.php?topic=44338.msg221412#msg221412 .
You've assigned your LAGG to the LAN interface of OPNsense, but it also looks like you have CARP enabled there, which isn't the case (or doesn't look like it) in your pfSense config:
Quote
NON-working OPNSense:
...
carp: MASTER vhid 3 advbase 1 advskew 0
peer 224.0.0.18 peer6 ff02::12
...
As you have a (low level) LAGG interface problem and it seems also HA (CARP), I would rule out the whole HA stuff first. In other words, try to configure a single vanilla OPNsense box without any bells and whistles and try to configure the LAGG in this setup (which should work), only after that continue with any HA stuff to keep a clear overview.