LACP is not working

Started by Julien, November 22, 2020, 12:30:48 AM

Do you know where I can see the LAGG logs on OPNsense?
I have used this command: sysctl net.link.lagg.lacp.debug=1

I disconnected and reconnected the cables, but nothing shows on the console.

DEC4240 – OPNsense Owner

dmesg
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

November 23, 2020, 08:43:12 PM #17 Last Edit: November 23, 2020, 09:20:04 PM by Julien
@pmhausen today I tried it, but when I connect the cables the following shows up:

lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP


The interfaces we are using on the box are:

Intel(R) PRO/1000 Network Connection

As I understand it, the "possible flapping" message appears when the client is not sending LACPDUs because 'laggproto lacp' wasn't set on the client side.
But the LAG is already configured to use LACP.

After I rebooted the box I got these logs:

WARNING: /mnt was not properly dismounted
WARNING: /mnt: mount pending error: blocks 259880 files 7
ugen0.2: <vendor 0x8087 product 0x0024> at usbus0
uhub2 on uhub0
uhub2: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus0
ugen1.2: <vendor 0x8087 product 0x0024> at usbus1
uhub3 on uhub1
uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus1
WARNING: /mnt: reload pending error: blocks 259880 files 7
uhub2: 6 ports with 6 removable, self powered
uhub3: 6 ports with 6 removable, self powered
ugen0.3: <Avocent USB Composite Device-0> at usbus0
kbd2 at ukbd1
ugen1.5: <vendor 0x192f USB Optical Mouse> at usbus1
WARNING: /mnt: reload pending error: blocks 259880 files 7
lagg0: IPv6 addresses on em2 have been removed before adding it as a member to prevent IPv6 address scope violation.
lagg0: link state changed to DOWN
em3: link state changed to DOWN
lagg0: IPv6 addresses on em3 have been removed before adding it as a member to prevent IPv6 address scope violation.
em0: link state changed to UP
em0_vlan20: link state changed to UP
em0_vlan40: link state changed to UP
em0_vlan10: link state changed to UP
em0_vlan11: link state changed to UP
em0_vlan12: link state changed to UP
em0_vlan13: link state changed to UP
em0_vlan30: link state changed to UP
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN


em2 and em3 are the interfaces on the LAG.
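
The repeating pattern in the kernel messages above (lagg0 comes up, both members stop DISTRIBUTING, lagg0 goes down) can be counted with a small script. This is only a sketch: it assumes the dmesg text has been saved into a string, and the function name summarise_lagg_flaps is made up for illustration.

```python
import re
from collections import Counter

# Message formats copied from the kernel output quoted in this thread.
LINK_RE = re.compile(r"^(\S+): link state changed to (UP|DOWN)$")
FLAP_RE = re.compile(r"^(\S+): Interface stopped DISTRIBUTING, possible flapping$")


def summarise_lagg_flaps(dmesg_text, lagg="lagg0"):
    """Count UP/DOWN transitions of the lagg and flap warnings per member port."""
    transitions = Counter()
    flaps = Counter()
    for line in dmesg_text.splitlines():
        line = line.strip()
        m = LINK_RE.match(line)
        if m:
            # Only count the aggregate interface, not em0, VLANs, etc.
            if m.group(1) == lagg:
                transitions[m.group(2)] += 1
            continue
        m = FLAP_RE.match(line)
        if m:
            flaps[m.group(1)] += 1
    return dict(transitions), dict(flaps)


sample = """\
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
em0: link state changed to UP
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
"""

transitions, flaps = summarise_lagg_flaps(sample)
print(transitions)
print(flaps)
```

If both members show the same number of warnings as lagg0 has DOWN transitions, the whole bundle is being torn down each cycle rather than a single bad cable dropping out.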
DEC4240 – OPNsense Owner

You really should read about switch Troubleshooting, they get way more into detail

Quote from: mimugmail on November 23, 2020, 09:24:55 PM
You really should read about switch Troubleshooting, they get way more into detail

Actually I do :) The switch shows:

=== LAG "OPNSENSEWAN" ID 1 (dynamic Deployed) ===
LAG Configuration:
   Ports:         e 1/2/1 e 1/2/1
   Port Count:    2
   Primary Port:  1/2/1
   Trunk Type:    hash-based
   LACP Key:      20001
Deployment: HW Trunk ID 1
Port       Link    State   Dupl Speed Trunk Tag Pvid Pri MAC             Name
1/2/1      Up      Blocked Full 1G    1     Yes 18   0   609c.9f3a.a488
1/2/2      Up      Blocked Full 1G    1     Yes 18   0   609c.9f3a.a488

Port       [Sys P] [Port P] [ Key ] [Act][Tio][Agg][Syn][Col][Dis][Def][Exp][Ope]
1/2/1          1        1   20001   Yes   S   Agg  Syn  Col  Dis  Def  No   Ina
1/2/2          1        1   20001   Yes   S   Agg  Syn  No   No   Def  No   Ina



Just know that if you get the above message (GigabitEthernet x/x/x is up, line protocol is down (LACP-BLOCKED)), you have a LAG protocol mismatch. This is what the support team says.

That would mean OPNsense is not sending LACPDUs (802.3ad) to the switch, so the switch cannot bring the bundle online.

How can I check whether the LAG on OPNsense is actually sending 802.3ad LACPDUs?

On OPNsense, the LAG protocol is set to LACP:

lacp
Supports the IEEE 802.3ad Link Aggregation Control Protocol (LACP) and the Marker Protocol. LACP will negotiate a set of aggregable links with the peer in to one or more Link Aggregated Groups. Each LAG is composed of ports of the same speed, set to full-duplex operation. The traffic will be balanced across the ports in the LAG with the greatest total speed, in most cases there will only be one LAG which contains all ports. In the event of changes in physical connectivity, Link Aggregation will quickly converge to a new configuration.

DEC4240 – OPNsense Owner

Did I miss a copy of ifconfig lagg0 on the OPNsense side or did you not provide one, yet?
Please do so.
Deciso DEC750

Quote from: pmhausen on November 23, 2020, 10:13:54 PM
Did I miss a copy of ifconfig lagg0 on the OPNsense side or did you not provide one, yet?
Please do so.

Thank you, this is what the lagg config shows:
root@firewall:~ # ifconfig lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=850098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO>
        ether e8:39:35:11:fa:ab
        inet6 fe80::ea39:35ff:fe11:faab%lagg0 prefixlen 64 scopeid 0xb
        inet 192.168.55.1 netmask 0xffffff00 broadcast 192.168.55.255
        laggproto lacp lagghash l2,l3,l4
        laggport: em2 flags=8<COLLECTING>
        laggport: em3 flags=8<COLLECTING>
        groups: lagg
        media: Ethernet autoselect
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
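
For reading the laggport lines in this output: both members only report COLLECTING. A rough way to check that automatically is sketched below. It assumes that a fully negotiated LACP member shows ACTIVE, COLLECTING and DISTRIBUTING (which is what healthy FreeBSD setups normally display); the function name missing_laggport_flags is hypothetical.

```python
import re

# Matches lines such as: "laggport: em2 flags=8<COLLECTING>"
LAGGPORT_RE = re.compile(r"laggport:\s+(\S+)\s+flags=([0-9a-fA-F]+)<([^>]*)>")

# Assumption: a healthy LACP member port shows all three of these flags.
EXPECTED = {"ACTIVE", "COLLECTING", "DISTRIBUTING"}


def missing_laggport_flags(ifconfig_text):
    """Return {port: sorted list of expected flags that are absent}."""
    result = {}
    for line in ifconfig_text.splitlines():
        m = LAGGPORT_RE.search(line)
        if m:
            port, _hex_value, names = m.groups()
            present = {name for name in names.split(",") if name}
            result[port] = sorted(EXPECTED - present)
    return result


# The two laggport lines from the ifconfig output in this post.
sample = """\
        laggproto lacp lagghash l2,l3,l4
        laggport: em2 flags=8<COLLECTING>
        laggport: em3 flags=8<COLLECTING>
"""

report = missing_laggport_flags(sample)
for port, missing in report.items():
    print(port, "missing:", ", ".join(missing) or "nothing")
```

An empty list for every port would mean the negotiation completed on the FreeBSD side; here both ports lack ACTIVE and DISTRIBUTING.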



On the switch side it shows:

LAG Configuration:
   Ports:         e 1/1/2 e 2/1/2
   Port Count:    2
   Primary Port:  1/1/2
   Trunk Type:    hash-based
   LACP Key:      20002
Deployment: HW Trunk ID 2
Port       Link    State   Dupl Speed Trunk Tag Pvid Pri MAC             Name
1/1/2      Up      Blocked Full 1G    2     Yes N/A  0   609c.9f4b.808d  LAN1
2/1/2      Up      Blocked Full 1G    2     Yes N/A  0   609c.9f4b.808d  LAN2

Port       [Sys P] [Port P] [ Key ] [Act][Tio][Agg][Syn][Col][Dis][Def][Exp][Ope]
1/1/2           1        1   20002   Yes   S   Agg  No   No   No   No   No   Ina
2/1/2           1        1   20002   Yes   S   Agg  Syn  No   No   Def  Exp  Err


and clog /var/log/system.log shows:

Nov 24 00:13:35 firewall kernel: em2: lacpdu receive
Nov 24 00:13:35 firewall kernel: actor=(0001,60-9C-9F-4B-80-8C,4E22,0001,0002)
Nov 24 00:13:35 firewall kernel: actor.state=7<ACTIVITY,TIMEOUT,AGGREGATION>
Nov 24 00:13:35 firewall kernel: partner=(8000,E8-39-35-11-FA-AB,016B,8000,0003)
Nov 24 00:13:35 firewall kernel: partner.state=1d<ACTIVITY,AGGREGATION,SYNC,COLLECTING>
Nov 24 00:13:35 firewall kernel: maxdelay=0
Nov 24 00:13:35 firewall kernel: em2: old pstate cf<ACTIVITY,TIMEOUT,AGGREGATION,SYNC,DEFAULTED,EXPIRED>
Nov 24 00:13:35 firewall kernel: em2: new pstate f<ACTIVITY,TIMEOUT,AGGREGATION,SYNC>
Nov 24 00:13:35 firewall kernel: em3: lacpdu transmit
Nov 24 00:13:35 firewall kernel: actor=(8000,E8-39-35-11-FA-AB,016B,8000,0004)
Nov 24 00:13:35 firewall kernel: actor.state=1d<ACTIVITY,AGGREGATION,SYNC,COLLECTING>
Nov 24 00:13:35 firewall kernel: partner=(0001,60-9C-9F-4B-80-8C,4E22,0001,0102)
Nov 24 00:13:35 firewall kernel: partner.state=cf<ACTIVITY,TIMEOUT,AGGREGATION,SYNC,DEFAULTED,EXPIRED>
Nov 24 00:13:35 firewall kernel: maxdelay=0
Nov 24 00:13:35 firewall kernel: em2: lacpdu transmit
Nov 24 00:13:35 firewall kernel: actor=(8000,E8-39-35-11-FA-AB,016B,8000,0003)
Nov 24 00:13:35 firewall kernel: actor.state=1d<ACTIVITY,AGGREGATION,SYNC,COLLECTING>
Nov 24 00:13:35 firewall kernel: partner=(0001,60-9C-9F-4B-80-8C,4E22,0001,0002)
Nov 24 00:13:35 firewall kernel: partner.state=f<ACTIVITY,TIMEOUT,AGGREGATION,SYNC>
Nov 24 00:13:35 firewall kernel: maxdelay=0
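
The hex values in the actor.state and partner.state lines are the standard LACP state octet, so they can be decoded without guessing. The sketch below does that and also splits the actor/partner tuples. The bit meanings are from the LACP standard; the tuple field order (system priority, system MAC, key, port priority, port number) is my reading of the FreeBSD debug format, and the function names are made up for the example.

```python
# LACP state bits as defined by IEEE 802.3ad / 802.1AX, lowest bit first.
LACP_STATE_BITS = [
    (0x01, "ACTIVITY"),
    (0x02, "TIMEOUT"),
    (0x04, "AGGREGATION"),
    (0x08, "SYNC"),
    (0x10, "COLLECTING"),
    (0x20, "DISTRIBUTING"),
    (0x40, "DEFAULTED"),
    (0x80, "EXPIRED"),
]


def decode_lacp_state(hex_value):
    """Turn a hex state such as "1d" into a list of flag names."""
    value = int(hex_value, 16)
    return [name for bit, name in LACP_STATE_BITS if value & bit]


def parse_lacp_tuple(text):
    """Split "(8000,E8-39-35-11-FA-AB,016B,8000,0003)" into named fields.

    Assumed field order: system priority, system MAC, key,
    port priority, port number (all numbers are hexadecimal).
    """
    sys_prio, mac, key, port_prio, port_no = text.strip("()").split(",")
    return {
        "system_priority": int(sys_prio, 16),
        "system_mac": mac,
        "key": int(key, 16),
        "port_priority": int(port_prio, 16),
        "port_number": int(port_no, 16),
    }


print(decode_lacp_state("1d"))  # firewall's own state in the log above
print(decode_lacp_state("cf"))  # state recorded for the switch on em3
switch = parse_lacp_tuple("(0001,60-9C-9F-4B-80-8C,4E22,0001,0002)")
print(switch)
```

One cross-check that the field order is right: the switch's key 4E22 is 20002 in decimal, which matches the "LACP Key: 20002" line in the switch output.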
DEC4240 – OPNsense Owner

I am not quite sure if the hash algorithm is part of the LACP negotiation, but I would look up (or ask support) what that means:
Trunk Type:    hash-based

Asking for your ifconfig output I wanted to check if LACP was active on the FreeBSD side at all. Which it is. Then there are not that many parameters to tune that I know. You can set the headers used for hashing on the FreeBSD side, so possibly the switch and FreeBSD don't agree on that.

Other than that, send Brocade support the FreeBSD outputs and ask for more directions on debugging on the Brocade side.
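
To illustrate what "hash-based" and lagghash l2,l3,l4 refer to: each end hashes some header fields of a frame and uses the result to pick a member port, so one flow always stays on one link. As far as I know this choice is made locally by each side and is not carried in the LACPDUs. The sketch below is a toy model only; it uses CRC32 purely for illustration and is not the hash FreeBSD or the switch actually implements. The flow values are invented.

```python
import zlib


def pick_member(ports, flow, layers=("l2", "l3", "l4")):
    """Pick an egress member port for a flow by hashing the selected headers.

    Toy illustration only: NOT the real FreeBSD or switch hash function.
    """
    fields = {
        "l2": ("src_mac", "dst_mac"),
        "l3": ("src_ip", "dst_ip"),
        "l4": ("src_port", "dst_port"),
    }
    parts = [str(flow[name]) for layer in layers for name in fields[layer]]
    digest = zlib.crc32("|".join(parts).encode("ascii"))
    return ports[digest % len(ports)]


# Hypothetical flow; addresses and ports are made up for the example.
flow = {
    "src_mac": "e8:39:35:11:fa:ab",
    "dst_mac": "00:00:5e:00:53:01",
    "src_ip": "192.168.55.1",
    "dst_ip": "192.168.55.20",
    "src_port": 51000,
    "dst_port": 443,
}

print(pick_member(["em2", "em3"], flow))
print(pick_member(["em2", "em3"], flow, layers=("l2",)))
```

Because each direction is hashed independently, the two ends using different header sets would normally only change how traffic is spread, not whether the bundle comes up.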
Deciso DEC750

@pmhausen thank you so much for your answer.
I hope someone can advise about the hash-based LAG setting:
Trunk Type:    hash-based

I've been doing some digging too, and I have figured out a difference in the LACP timeout.

The LACP timeout (the Tio flag) on the switch is short. Is yours long or short?

This is a working config from a Cisco I tested today:

LAG Configuration:
   Ports:         e 1/2/1 e 1/2/3
   Port Count:    2
   Primary Port:  1/2/1
   Trunk Type:    hash-based
   LACP Key:      22047
   LACP Timeout:  long
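
The timeout each side asks for can be read from the LACP debug log posted earlier in the thread, because it is bit 0x02 of the state octet (set means short timeout, clear means long). A minimal sketch, using the two actor.state values from that log; the function name lacp_timeout is made up:

```python
LACP_TIMEOUT_BIT = 0x02  # set = short timeout, clear = long timeout


def lacp_timeout(state_hex):
    """Return "short" or "long" for a hex LACP state value."""
    return "short" if int(state_hex, 16) & LACP_TIMEOUT_BIT else "long"


# Values taken from the system.log excerpt earlier in this thread.
firewall_actor_state = "1d"  # sent by OPNsense (lacpdu transmit)
switch_actor_state = "7"     # sent by the switch (lacpdu receive)

print("OPNsense:", lacp_timeout(firewall_actor_state))
print("switch:  ", lacp_timeout(switch_actor_state))
```

So in that capture OPNsense advertises a long timeout and the switch a short one. The standard allows the two ends to differ, so this alone may not be the cause. If you want them to match, FreeBSD's ifconfig has a lacp_fast_timeout option for lagg interfaces, if I remember correctly; please check ifconfig(8) on your version before relying on it.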

DEC4240 – OPNsense Owner

What's a timeout? Sorry, I have not configured anything like that. The Cisco config is simply:
interface Port-channel4
 description OPNsense
 switchport mode trunk
!
interface GigabitEthernet0/15
 description OPNsense
 switchport mode trunk
 channel-group 4 mode active
!
interface GigabitEthernet0/16
 description OPNsense
 switchport mode trunk
 channel-group 4 mode active


And that's that. Works as intended. ;)
Deciso DEC750

Quote from: pmhausen on November 24, 2020, 02:57:52 PM
What's a timeout? Sorry, I have not configured anything like that. The Cisco config is simply:
interface Port-channel4
 description OPNsense
 switchport mode trunk
!
interface GigabitEthernet0/15
 description OPNsense
 switchport mode trunk
 channel-group 4 mode active
!
interface GigabitEthernet0/16
 description OPNsense
 switchport mode trunk
 channel-group 4 mode active


And that's that. Works as intended. ;)

I just tried the configuration with a UniFi layer 3 switch and it does work.
The LAG works with a single switch; I don't know if the issue is related to the stack or something else.
DEC4240 – OPNsense Owner

I have been doing some digging on the switch and found this.

The only thing I see is a mismatch error.



Dynamic Log Buffer (50 lines):
Nov 26 02:29:40:I:System: dynamic lag interface 2/1/1's peer info (priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:40:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:40:I:Trunk: Group (1/1/1, 2/1/1) removed by 802.3ad link-aggregation module.
Nov 26 02:29:40:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:40:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:40:I:System: dynamic lag interface 1/1/1's peer info (priority=4,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:39:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is up.
Nov 26 02:29:39:I:Trunk: Group (1/1/1, 2/1/1) created by 802.3ad link-aggregation module.
Nov 26 02:29:33:I:System: dynamic lag interface 2/1/1's peer info (priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:33:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:33:I:Trunk: Group (1/1/1, 2/1/1) removed by 802.3ad link-aggregation module.
Nov 26 02:29:33:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:33:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:33:I:System: dynamic lag interface 1/1/1's peer info (priority=4,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:32:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is up.
Nov 26 02:29:32:I:Trunk: Group (1/1/1, 2/1/1) created by 802.3ad link-aggregation module.
Nov 26 02:29:26:I:System: dynamic lag interface 2/1/1's peer info (priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:26:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:26:I:Trunk: Group (1/1/1, 2/1/1) removed by 802.3ad link-aggregation module.
Nov 26 02:29:26:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:26:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:26:I:System: dynamic lag interface 1/1/1's peer info (priority=4,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:25:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is up.
Nov 26 02:29:25:I:Trunk: Group (1/1/1, 2/1/1) created by 802.3ad link-aggregation module.
Nov 26 02:29:19:I:System: dynamic lag interface 2/1/1's peer info (priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:19:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:19:I:Trunk: Group (1/1/1, 2/1/1) removed by 802.3ad link-aggregation module.
Nov 26 02:29:19:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:19:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:19:I:System: dynamic lag interface 1/1/1's peer info (priority=4,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:18:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is up.
Nov 26 02:29:18:I:Trunk: Group (1/1/1, 2/1/1) created by 802.3ad link-aggregation module.
Nov 26 02:29:12:I:System: dynamic lag interface 2/1/1's peer info (priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:12:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:12:I:Trunk: Group (1/1/1, 2/1/1) removed by 802.3ad link-aggregation module.
Nov 26 02:29:12:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:12:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:12:I:System: dynamic lag interface 1/1/1's peer info (priority=4,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:11:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is up.
Nov 26 02:29:11:I:System: Interface ethernet 1/1/1, state up
Nov 26 02:29:11:I:Trunk: Group (1/1/1, 2/1/1) created by 802.3ad link-aggregation module.
Nov 26 02:29:05:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:05:I:Trunk: Group (2/1/1) removed by 802.3ad link-aggregation module.
Nov 26 02:29:05:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:05:I:System: dynamic lag interface 2/1/1's peer info (priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:05:I:System: dynamic lag interface 1/1/1's peer info (priority=4,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:04:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is up.
Nov 26 02:29:04:I:System: Interface ethernet 2/1/1, state up
Nov 26 02:29:01:I:Trunk: Group (2/1/1) created by 802.3ad link-aggregation module.
Nov 26 02:29:01:I:System: dynamic lag 1, has new peer info (priority=32768,id=e839.3511.faab,key=715) (N/A)
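
To see exactly which fields the switch considers mismatched, the "peer info" lines can be parsed and compared. A small sketch follows; the regular expression is written against the log format shown above and the function name peer_mismatches is hypothetical.

```python
import re

# Pattern written against the switch log lines quoted in this post.
PEER_RE = re.compile(
    r"dynamic lag interface (\S+)'s peer info "
    r"\(priority=(\d+),id=([0-9a-f.]+),key=(\d+)\) "
    r"mis-matches with lag's peer info "
    r"\(priority=(\d+),id=([0-9a-f.]+),key=(\d+)\)"
)


def peer_mismatches(log_line):
    """Return (port, {field: (port_value, lag_value)}) or None if no match."""
    m = PEER_RE.search(log_line)
    if not m:
        return None
    port_info = {"priority": int(m.group(2)), "id": m.group(3), "key": int(m.group(4))}
    lag_info = {"priority": int(m.group(5)), "id": m.group(6), "key": int(m.group(7))}
    diff = {
        field: (port_info[field], lag_info[field])
        for field in port_info
        if port_info[field] != lag_info[field]
    }
    return m.group(1), diff


line = (
    "Nov 26 02:29:40:I:System: dynamic lag interface 2/1/1's peer info "
    "(priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer info "
    "(priority=32768,id=e839.3511.faab,key=715), set to mismatch Error"
)

print(peer_mismatches(line))
```

For the line above, the peer ID (the OPNsense MAC) is identical on both sides of the comparison; only priority and key differ, which is the part worth showing to the switch vendor's support.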

DEC4240 – OPNsense Owner

Is there a static vs. dynamic lagg setting on this switch?
Deciso DEC750

November 25, 2020, 08:52:33 PM #28 Last Edit: November 25, 2020, 08:58:57 PM by Julien
Quote from: pmhausen on November 25, 2020, 08:04:12 PM
Is there a static vs. dynamic lagg setting on this switch?

These are the LAGs we have on the switch:

LAN           dynamic  Y    2     1/1/2    e 1/1/2 e 2/1/2
LANG          static   N    3     none
LANGG         dynamic  N    4     none
NAS           dynamic  Y    11    1/1/11   e 1/1/11 to 1/1/12 e 2/1/11 to 2/1/12
WAN           dynamic  Y    1     1/1/1    e 1/1/1 e 2/1/1
wan           dynamic  N    5     none

DEC4240 – OPNsense Owner

Are you positive you're plugged into the right ports? FreeBSD is reporting a different MAC address than the one your switch claims to be advertising on the LAG:

Quote
Nov 24 00:13:35 firewall kernel: partner=(0001,60-9C-9F-4B-80-8C,4E22,0001,0002)


Quote
Deployment: HW Trunk ID 2
Port       Link    State   Dupl Speed Trunk Tag Pvid Pri MAC             Name
1/1/2      Up      Blocked Full 1G    2     Yes N/A  0   609c.9f4b.808d  LAN1
2/1/2      Up      Blocked Full 1G    2     Yes N/A  0   609c.9f4b.808d  LAN2
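
The two addresses are written in different notations, which makes them easy to misread. A quick sketch to normalise and compare them (the helper name normalise_mac is made up):

```python
import re


def normalise_mac(mac):
    """Convert 60-9C-9F-4B-80-8C or 609c.9f4b.808d to 60:9c:9f:4b:80:8c form."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


freebsd_partner = normalise_mac("60-9C-9F-4B-80-8C")  # from the FreeBSD log
switch_port = normalise_mac("609c.9f4b.808d")         # from the switch table

print(freebsd_partner)
print(switch_port)
print("same address:", freebsd_partner == switch_port)
```

They differ only in the last octet (8c vs 8d). It is possible that the switch uses a system-wide MAC as its LACP system ID and a different MAC per port, in which case an off-by-one difference would not prove a cabling mistake; that is an assumption worth confirming with the vendor.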