LAG issue

Started by firewallfun, November 28, 2024, 05:47:29 PM

November 28, 2024, 05:47:29 PM Last Edit: November 28, 2024, 06:05:37 PM by firewallfun
On a dedicated box with SFP+ ports, no VLANs.

I have set up two firewalls on the same hardware, but I struggled with HA and loss of the LAN connection on the 2nd one.

I found out that the problem is that LAG doesn't work on this 2nd fw.

I removed the two ports from the LACP LAG on the switch, and also removed the LAN interface and the lagg on OPNsense. Then I activated one port at a time to verify that ix2 and ix3 were the correct ones. I enabled each interface individually, in both directions, and deleted the interface each time. So when I added ix2 and ix3 to the lagg and attached lagg0 to LAN, I was 100% sure the cables were correct. The web GUI shows green for the LAG. I have also created an allow-all rule in the OPNsense firewall on this LAN interface and rebooted.

No matter what I do, it does not establish a connection. What can be wrong? The switch says "susp" on both ports in the LACP LAG there, and Dev ID 0000000.

All is working outside the lag...

It simply says this:

Interface TenGigabitEthernet 1/0/10 suspended: LACP currently not enabled on the remote port.
2024-11-28
(5)Notifications
LACP
SUSPEND
Interface TenGigabitEthernet 2/0/10 suspended: LACP currently not enabled on the remote port.
2024-11-28

Please show the lagg0 configuration on the OPNsense side. Like in my screen shot.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Image didn't work.

But I assume it was from the "Interfaces: Overview" screen. I got a scrollable list, so it wasn't easy to take a picture. Here is the text version of it:


Flags   8843
Capabilities   rxcsum, txcsum, vlan_mtu, vlan_hwtagging, jumbo_mtu, vlan_hwcsum, tso4, tso6, lro, wol_ucast, wol_mcast, wol_magic, vlan_hwfilter, vlan_hwtso, netmap, rxcsum_ipv6, txcsum_ipv6, hwstats, mextpg
Options   vlan_mtu, jumbo_mtu, wol_ucast, wol_mcast, wol_magic, hwstats, mextpg
MAC Address   20:7c:14:f5:91:66 - Qotom
Supported Media   autoselect
Physical   
Device   lagg0
mtu   1500
macaddr_hw   00:00:00:00:00:00
LAGG Protocol   lacp
LAGG Hash   l2, l3, l4
LAGG Options   
flags   flowid_shift
lacp_fast_timo   16
LAGG Statistics   active ports: 0, flapping: 0
Groups   lagg
Media   Ethernet autoselect
Media (Raw)   Ethernet autoselect
Status   up
Routes   10.10.10.0/24
Identifier   opt4
Description   LAN
Enabled   true
Link Type   static
addr4   10.10.10.3/24
addr6   
IPv4 Addresses   
10.10.10.3/24
VLAN Tag   
Gateways   
Driver   lagg0
Index   13
Promiscuous Listeners   0
Send Queue Length   0
Send Queue Max Length   50
Send Queue Drops   0
Type   Ethernet
Address Length   6
Header Length   14
Link State   2
vhid   0
Data Length   152
Metric   0
Line Rate   10.00 Gbit/s
Packets Received   18378
Input Errors   0
Packets Transmitted   0
Output Errors   18
Collisions   0
Bytes Received   2421158
Bytes Transmitted   0
Multicasts Received   18378
Multicasts Transmitted   0
Input Queue Drops   0
Packets for Unknown Protocol   0
Hardware Offload Capabilities   0x0
Uptime at Attach or Statistics Reset   32

I'm thinking about just starting from scratch. I have no clue what is going on. The other firewall I have, of the same brand/model, had no issues with this at all.

Nope, not the overview.

Interfaces > Other Types > LAGG - then open the configuration of your lagg IF.

November 28, 2024, 10:23:42 PM #4 Last Edit: November 28, 2024, 10:26:56 PM by firewallfun
There I have this. Attaching both assignment and the one you asked about.

Pick the hash layers matching the policy of your switch. Most common is L2 + L3.
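
As a rough illustration of what the hash layers do, here is a minimal Python sketch of how a LAG might pick a member port by hashing L2 and L3 header fields. The `pick_port` helper, the CRC32 hash and the destination MAC/IP addresses are assumptions for illustration only; this is not the hash that lagg(4) or any particular switch actually uses.

```python
import zlib

def pick_port(ports, src_mac, dst_mac, src_ip=None, dst_ip=None):
    """Pick a LAG member port from a hash of the selected header fields.

    Hypothetical helper: CRC32 stands in for whatever hash the real
    implementation uses. L3 fields are only included when provided.
    """
    key = f"{src_mac}|{dst_mac}"          # L2 part of the key
    if src_ip is not None and dst_ip is not None:
        key += f"|{src_ip}|{dst_ip}"      # L3 part of the key
    return ports[zlib.crc32(key.encode()) % len(ports)]

ports = ["ix2", "ix3"]
# L2 only: every flow between the same two MACs lands on the same port.
l2_choice = pick_port(ports, "20:7c:14:f5:91:66", "aa:bb:cc:dd:ee:ff")
# L2 + L3: flows to different IPs behind the same MAC pair may spread out.
l3_choices = {
    pick_port(ports, "20:7c:14:f5:91:66", "aa:bb:cc:dd:ee:ff",
              "10.10.10.3", f"10.10.10.{host}")
    for host in range(10, 60)
}
print(l2_choice, sorted(l3_choices))
```

The hash only decides which member link carries a given outgoing flow, and each end of the LAG hashes independently, so a mismatch by itself would not normally stop LACP from negotiating.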

On my 2nd box with same config, I have this default (empty), working with LACP there.

I tried to change it now to use l2+l3, I still get this:

(5)Notifications
LACP
SUSPEND
Interface TenGigabitEthernet 1/0/10 suspended: LACP currently not enabled on the remote port.
2024-11-28 17:10:44
(5)Notifications
LACP
SUSPEND
Interface TenGigabitEthernet 2/0/10 suspended: LACP currently not enabled on the remote port.
2024-11-28 17:10:44

Did you try slow instead of fast timeout? Are there any docs on what your switch expects? Also, did you disable all hardware offloading? Which would be the default ... disabled, that is.
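
For reference on the slow/fast question, a small sketch of the timing involved, assuming the standard IEEE 802.1AX values: LACPDUs every 1 s at the fast rate, every 30 s at the slow rate, and a partner timing out after three missed intervals. The names below are hypothetical and are not OPNsense or switch settings.

```python
# Standard LACP periodic intervals in seconds (IEEE 802.1AX).
PERIODIC_INTERVAL = {"fast": 1, "slow": 30}
MISSED_PDUS_BEFORE_TIMEOUT = 3

def lacp_timeout_seconds(rate):
    """Seconds without LACPDUs before the partner is considered gone."""
    return PERIODIC_INTERVAL[rate] * MISSED_PDUS_BEFORE_TIMEOUT

for rate in ("fast", "slow"):
    print(rate, PERIODIC_INTERVAL[rate], lacp_timeout_seconds(rate))
```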

Everything is disabled by default; I haven't touched any optimization features.

Regarding slow/fast: yes. I first had it at slow on both sides, but changed to fast after a day of not getting anywhere. So I have the same settings on this LACP pair as on the other OPNsense box of the same batch/type. I struggle a bit with both HA units becoming master at the same time, so I started to believe it could be an IP conflict (because the CARP VIP would then be active in both places). But then it shouldn't work on a single LAN, so I'm not sure about that either.

I will go to the console, reset everything and maybe I will have better luck... Maybe something has gotten stuck.

If you can,

provide output from your CISCO switch

show etherchannel summary

Also provide output of the lagg port configuration and the physical port configuration of the ports belonging to the LAGG on switch side.

Regards,
S.
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
APU2D2 - deceased
N5105 - i226-V | Patriot 2x8G 3200 DDR4 | L 790 512G - VM HA(SOON)
N100   - i226-V | Crucial 16G  4800 DDR5 | S 980 500G - PROD

If both firewalls become master for a CARP VIP, it is most likely one of 2 things:

- Both firewalls send out their VRRP advertisements, but they get lost on the way to the other firewall, either manipulated or dropped by the switch or blocked by a firewall rule
- The hashes of the vhid group are not the same on both sides. Make sure the configuration is exactly the same, especially when having more VIPs in the same vhid CARP group
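
To illustrate the second point, here is a simplified sketch of why the configuration must match: each node computes a keyed digest over the vhid and all VIPs in that group, and a peer whose digest differs would be expected to discard the advertisement and stay master. The field layout, the `carp_digest` helper, the password and the VIP addresses are assumptions for illustration, not the actual CARP wire format.

```python
import hashlib
import hmac
import ipaddress

def carp_digest(password, vhid, vips):
    """Keyed SHA-1 digest over a vhid and its VIPs (simplified layout)."""
    mac = hmac.new(password.encode(), digestmod=hashlib.sha1)
    mac.update(bytes([vhid]))
    # Sort so that the order in which VIPs were configured does not matter.
    for vip in sorted(ipaddress.ip_address(v) for v in vips):
        mac.update(vip.packed)
    return mac.hexdigest()

primary = carp_digest("secret", 2, ["10.10.10.1", "10.10.10.2"])
backup_same = carp_digest("secret", 2, ["10.10.10.2", "10.10.10.1"])
backup_missing_vip = carp_digest("secret", 2, ["10.10.10.1"])

print(primary == backup_same)         # True: identical configuration
print(primary == backup_missing_vip)  # False: one VIP missing on the backup
```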
Hardware:
DEC740

The LAGG issue was kind of solved. I swapped the cables (SFP+) for a different pair and then I got a connection. I still have an issue with active/passive, where I can only unplug one of the cables for some reason. But as long as both fibers are plugged into both switches, the lagg now works (it is an FS switch with LACP).

I have vhid group 1 on the CARP WAN and vhid group 2 on the CARP LAN. Same on the second device. I have also deleted all the VIPs and synced them over, so they are identical (using multicast, so I didn't have to specify a peer IP).

I have disabled pf with pfctl -d on both firewalls. Can it still be blocking?

Your switch could use IGMP snooping to mess with multicast.

There could also be MAC security features that block the spoofed MAC addresses of VRRP packets.

Thank you for your suggestion. I have a thread here on it: https://forum.opnsense.org/index.php?topic=44226.0

It seems that since I have a public /29 on my WAN on both devices and my ISP has routers that disable/enable each fiber at their end (participating in the /29), I can't do it like this. It is not a flat /29. I need to buy 2 new switches on the WAN side, so that each WAN interface sees the other, before I can connect my two OPNsense boxes to the shared WAN network.

Re-read what I wrote in the other thread. You do not need two more switches if you already have a pair of stackable ones and a handful of free ports.

VLANs == as many virtual switches as you like as long as there are ports. That's the point of VLANs. A VLAN is a virtual unmanaged switch.
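
A toy model of that idea, as a sketch only: ports assigned to the same VLAN behave like one unmanaged switch, and frames never cross into another VLAN. The port numbers, VLAN IDs and port roles are made up for illustration and do not come from this thread.

```python
# Hypothetical access-port assignment on one stacked switch pair.
PORT_VLAN = {
    "1/0/1": 100,  # ISP fiber A
    "2/0/1": 100,  # ISP fiber B
    "1/0/2": 100,  # firewall 1 WAN
    "2/0/2": 100,  # firewall 2 WAN
    "1/0/20": 1,   # some LAN-side port
    "2/0/20": 1,   # some LAN-side port
}

def flood_targets(ingress_port):
    """Ports that receive a broadcast frame entering on ingress_port."""
    vlan = PORT_VLAN[ingress_port]
    return sorted(p for p, v in PORT_VLAN.items()
                  if v == vlan and p != ingress_port)

# A frame from firewall 1's WAN port reaches only the other VLAN 100 ports.
print(flood_targets("1/0/2"))
```

With an assignment like this, both WAN interfaces and both ISP handoffs share one broadcast domain without any additional hardware.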