LACP is not working

Started by Julien, November 22, 2020, 12:30:48 AM

November 22, 2020, 12:30:48 AM Last Edit: November 22, 2020, 12:33:28 AM by Julien
Dear all,
We have been using OPNsense for some time now. One of our customers has two Layer 3 ISP switches, and each switch provides one 1 Gbps port to the OPNsense box.
This configuration worked excellently before with pfSense.
The switches are configured to do LACP on these interfaces (as I mentioned, this has been working for a long time).

Switch 1 >>>> Port 1 >>>>> pfSense port 1 (now OPNsense)
Switch 2 >>>> Port 1 >>>>> pfSense port 2 (now OPNsense)

Both switches are stacked and are Brocade ICX7250.

As I mentioned, this configuration worked with pfSense for a long time, until we convinced the customer to move to OPNsense.

The issue is:

We created a LAGG (see attached) with em2 and em3 as members and assigned it as the LAN LAGG. Whenever we connect the cables to the switch, the following error pops up on the OPNsense console and keeps repeating:

interface stopped distributing possible flapping


After that I tried a clean pfSense installation and created the LAGG there, and everything works as expected.

Can someone please help me clear this up? Is it OPNsense? The configuration? What am I doing wrong?

I appreciate any help or ideas.
DEC4240 – OPNsense Owner


,,The S in IoT stands for Security!" :)

Same author?  :o

Maybe FreeBSD 12 handles LACP a bit differently. In that case you should see the same error when testing pfSense 2.5.

Quote from: mimugmail on November 22, 2020, 09:51:06 AM
Same author?  :o

Haha, sorry, I didn't look at the author's name  :)
,,The S in IoT stands for Security!" :)

Quote from: Gauss23 on November 22, 2020, 10:39:49 AM
Quote from: mimugmail on November 22, 2020, 09:51:06 AM
Same author?  :o

Haha, sorry, I didn't look at the author's name  :)

Sorry, I contacted support and they advised me to open a new topic in General Discussion,
so I just did :)
DEC4240 – OPNsense Owner

Quote from: mimugmail on November 22, 2020, 09:51:06 AM
Same author?  :o

Maybe FreeBSD 12 handles LACP a bit differently. In that case you should see the same error when testing pfSense 2.5.

Then please test this.
LACP works fine in general, so it must be something on the switches or the hardware.

Quote from: mimugmail on November 22, 2020, 09:51:06 AM
Same author?  :o

Maybe FreeBSD 12 handles LACP a bit differently. In that case you should see the same error when testing pfSense 2.5.

Thank you for your answer. On the switch I can see that the LAG goes into a blocked state when I connect the cables.
Isn't pfSense using the same FreeBSD 12 as OPNsense?
DEC4240 – OPNsense Owner

I can only comment that lagg does work in the general case.

My OPNsense:

lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=802028<VLAN_MTU,JUMBO_MTU,WOL_MAGIC>
	ether 00:0d:b9:57:27:90
	inet6 fe80::20d:b9ff:fe57:2790%lagg0 prefixlen 64 scopeid 0x9
	laggproto lacp lagghash l2,l3,l4
	laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	groups: lagg
	media: Ethernet autoselect
	status: active
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
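
For anyone who wants to check this from a script rather than by eye: in the output above, a healthy member port carries all three flags ACTIVE, COLLECTING and DISTRIBUTING. The following is only a minimal sketch that parses pasted ifconfig text; the function names are made up for illustration, and the second sample in the usage note (em2 with empty flags) is a hypothetical example, not output from the affected box.

```python
import re

# Sample copied from the ifconfig output above (only the relevant lines).
IFCONFIG_SAMPLE = """\
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
laggproto lacp lagghash l2,l3,l4
laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
status: active
"""

# Matches e.g. "laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>"
LAGGPORT_RE = re.compile(r"laggport:\s+(\S+)\s+flags=([0-9a-fA-F]+)<([^>]*)>")


def parse_laggports(text):
    """Return {port_name: set_of_flag_names} for every laggport line."""
    ports = {}
    for match in LAGGPORT_RE.finditer(text):
        name, _hex_flags, names = match.groups()
        ports[name] = {flag for flag in names.split(",") if flag}
    return ports


def ports_not_distributing(text):
    """List member ports that are missing the DISTRIBUTING flag."""
    return sorted(
        name
        for name, flags in parse_laggports(text).items()
        if "DISTRIBUTING" not in flags
    )


if __name__ == "__main__":
    print(parse_laggports(IFCONFIG_SAMPLE))
    print(ports_not_distributing(IFCONFIG_SAMPLE))
```

Feeding it a hypothetical line such as "laggport: em2 flags=0<>" would list em2 as not distributing, which is the kind of state to look for on a LAGG that keeps flapping.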


My Cisco 2960-L:

cisco#sh lacp 4 neighbor
Flags:  S - Device is requesting Slow LACPDUs
        F - Device is requesting Fast LACPDUs
        A - Device is in Active mode       P - Device is in Passive mode     

Channel group 4 neighbors

Partner's information:

                  LACP port                        Admin  Oper   Port    Port
Port      Flags   Priority  Dev ID          Age    key    Key    Number  State
Gi0/15    SA      32768     000d.b957.2790   5s    0x0    0x12B  0x1     0x3D 
Gi0/16    SA      32768     000d.b957.2790   4s    0x0    0x12B  0x2     0x3D 


So, does your Brocade switch have some debugging capability? E.g. if I bring one of my two links down on the OPNsense side, enable debugging of LACP events on the Cisco, then bring the interface up again, I get this:

cisco#debug lacp event
Link Aggregation Control Protocol events debugging is on
cisco#
Nov 22 11:48:29.174: LACP: Gi0/16 set to UNSELECTED
Nov 22 11:48:30.170: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/16, changed state to down
Nov 22 11:48:31.174: %LINK-3-UPDOWN: Interface GigabitEthernet0/16, changed state to down
Nov 22 11:48:33.939: %LINK-3-UPDOWN: Interface GigabitEthernet0/16, changed state to up
Nov 22 11:48:34.943: LACP: Gi0/16 STANDBY aggregator hex address is 64DA810
Nov 22 11:48:34.944: LACP: Gi0/16 set to STANDBY
Nov 22 11:48:36.722: lacp_handle_standby_port_internal called, depth = 1
Nov 22 11:48:36.722: LACP: Gi0/16 standby->selected
Nov 22 11:48:36.722: LACP: Gi0/16 set to SELECTED
Nov 22 11:48:38.551: lacp_handle_standby_port_internal called, depth = 1
Nov 22 11:48:39.551: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/16, changed state to up
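
Since the console message in the first post mentions possible flapping, one rough way to quantify it on the switch side is to count link up/down transitions per interface in a saved log. This is only a sketch that assumes the Cisco message format shown above; a Brocade switch will log in a different format, so the pattern would need adapting, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Two lines copied from the Cisco debug output above.
LOG_SAMPLE = """\
Nov 22 11:48:30.170: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/16, changed state to down
Nov 22 11:48:31.174: %LINK-3-UPDOWN: Interface GigabitEthernet0/16, changed state to down
Nov 22 11:48:33.939: %LINK-3-UPDOWN: Interface GigabitEthernet0/16, changed state to up
"""

# Only physical link events are counted; line protocol events are ignored.
LINK_RE = re.compile(r"%LINK-3-UPDOWN: Interface (\S+), changed state to (up|down)")


def count_link_transitions(log_text):
    """Return a Counter keyed by (interface, new_state)."""
    counts = Counter()
    for interface, state in LINK_RE.findall(log_text):
        counts[(interface, state)] += 1
    return counts


if __name__ == "__main__":
    for (interface, state), n in sorted(count_link_transitions(LOG_SAMPLE).items()):
        print(interface, state, n)
```

A port that goes down and up many times within a short window is a candidate for the flapping the LAGG complains about.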


Please try to find some more detailed information on the switch side.

Kind regards,
Patrick
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: Julien on November 22, 2020, 12:46:50 PM
Isn't pfSense using the same FreeBSD 12 as OPNsense?

He suggested trying pfSense 2.5, which is not yet released. It is based on the same FreeBSD version as the current OPNsense release. So if you have the same problems there, it must be something about FreeBSD 12 in combination with your switches.

https://www.pfsense.org/snapshots/
,,The S in IoT stands for Security!" :)

Quote from: pmhausen on November 22, 2020, 12:51:32 PM
I can only comment that lagg does work in the general case.

Please try to find some more detailed information on the switch side.

Kind regards,
Patrick

I really appreciate your answer. I don't know if the switch has a debug facility, as I come from Cisco. The crazy thing is that when I connect a Synology to the same ports, LACP is detected, and it also works with pfSense.
It seems as if the MAC addresses of the NICs are somehow held by the switch, causing spanning tree to misbehave.
Is this possible?
DEC4240 – OPNsense Owner

We are all just guessing without debug logs.

November 22, 2020, 06:22:39 PM #12 Last Edit: November 22, 2020, 06:29:01 PM by Julien
Quote from: mimugmail on November 22, 2020, 05:45:03 PM
We are all just guessing without debug logs.

Does OPNsense have an option for LACP debugging? On pfSense I run this: sysctl net.link.lagg.lacp.debug=1

I am not near the switch and the box right now; I am just collecting information for tomorrow.

I've been doing some reading, and I understand it could be related to LACP strict mode, whose default may differ between OPNsense and pfSense.
pfSense uses 0 and OPNsense uses 1? Is this correct?

sysctl net.link.lagg.0.lacp.lacp_strict_mode=0
DEC4240 – OPNsense Owner

OPNsense has both sysctls, since FreeBSD has them:

root@opnsense:~ # sysctl net.link.lagg.lacp
net.link.lagg.lacp.default_strict_mode: 1
net.link.lagg.lacp.debug: 0


You could give the debug function on the OPNsense side a try. I just enabled it, then "shut; no shut" one interface on the Cisco side:

actor=(8000,00-0D-B9-57-27-90,012B,8000,0001)
actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
partner=(8000,00-B6-70-D6-32-80,0004,8000,0110)
partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
maxdelay=0
igb0: lacpdu receive
actor=(8000,00-B6-70-D6-32-80,0004,8000,0110)
actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
partner=(8000,00-0D-B9-57-27-90,012B,8000,0001)
partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
maxdelay=32768
igb1: lacpdu receive
actor=(8000,00-B6-70-D6-32-80,0004,8000,0111)
actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
partner=(8000,00-0D-B9-57-27-90,012B,8000,0002)
partner.state=1d<ACTIVITY,AGGREGATION,SYNC,COLLECTING>
maxdelay=32768
igb1: lacpdu transmit
actor=(8000,00-0D-B9-57-27-90,012B,8000,0002)
actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
partner=(8000,00-B6-70-D6-32-80,0004,8000,0111)
partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
maxdelay=0
igb0: lacpdu transmit
actor=(8000,00-0D-B9-57-27-90,012B,8000,0001)
actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
partner=(8000,00-B6-70-D6-32-80,0004,8000,0110)
partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
maxdelay=0
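
The hex value in front of each flag list (3d, 1d) is the LACP state octet, and the same value appears as Port State 0x3D in the Cisco neighbor table earlier in the thread. As a small sketch of how the octet maps to the names in this debug output: the bit positions follow IEEE 802.1AX, the names follow the FreeBSD output above, and the function name is made up for illustration.

```python
# Bit values of the LACP state octet (IEEE 802.1AX), least significant first.
# Names follow the FreeBSD debug output shown above.
LACP_STATE_BITS = [
    (0x01, "ACTIVITY"),      # set: active LACP, clear: passive
    (0x02, "TIMEOUT"),       # set: short timeout (fast LACPDUs), clear: long
    (0x04, "AGGREGATION"),   # link may be aggregated
    (0x08, "SYNC"),          # port is in sync with its aggregator
    (0x10, "COLLECTING"),    # port accepts incoming frames
    (0x20, "DISTRIBUTING"),  # port sends outgoing frames
    (0x40, "DEFAULTED"),     # using default partner info (no LACPDU received)
    (0x80, "EXPIRED"),       # partner information has expired
]


def decode_lacp_state(state):
    """Return the names of all bits set in an LACP state octet."""
    return [name for bit, name in LACP_STATE_BITS if state & bit]


if __name__ == "__main__":
    for value in (0x3D, 0x1D):
        print(f"{value:02x}", decode_lacp_state(value))
```

Decoded this way, 3d has DISTRIBUTING set while 1d does not, which matches the single partner.state=1d entry for igb1 seen while that port was still coming back up.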


As for strict mode: there should not be anything fundamentally different w.r.t. lagg(4) in pfSense vs. OPNsense; different default values, perhaps. So just go ahead and set it to 0 ...
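
Whether the defaults really differ can be checked by running "sysctl net.link.lagg.lacp" on both firewalls and comparing the output. A minimal sketch of such a comparison follows; the OPNsense text is copied from this post, while the second sample is a hypothetical placeholder for the pfSense output and has not been verified on a real pfSense box. The function names are made up for illustration.

```python
# Output copied from the OPNsense box in this post.
OPNSENSE_OUTPUT = """\
net.link.lagg.lacp.default_strict_mode: 1
net.link.lagg.lacp.debug: 0
"""

# HYPOTHETICAL placeholder: replace with real output from the other firewall.
OTHER_OUTPUT = """\
net.link.lagg.lacp.default_strict_mode: 0
net.link.lagg.lacp.debug: 0
"""


def parse_sysctl(text):
    """Parse 'name: value' lines as printed by sysctl into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def diff_sysctl(first, second):
    """Return {name: (first_value, second_value)} for every differing entry."""
    a, b = parse_sysctl(first), parse_sysctl(second)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    print(diff_sysctl(OPNSENSE_OUTPUT, OTHER_OUTPUT))
```

Only entries whose values differ are reported, so an empty result would mean both boxes use the same LACP defaults.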

HTH,
Patrick
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

November 22, 2020, 10:05:23 PM #14 Last Edit: November 22, 2020, 10:23:39 PM by Julien
I will check tomorrow, first thing when I get to the office.
I really appreciate it.
I am worried that once it is fixed, remotely updating the box will cause the same failure again.
But I will have to fix it first.

Thank you
DEC4240 – OPNsense Owner