OPNsense Forum

English Forums => General Discussion => Topic started by: Julien on November 22, 2020, 12:30:48 am

Title: LACP is not working
Post by: Julien on November 22, 2020, 12:30:48 am
Dear all,
We have been using OPNsense for some time now. One of our customers has two Layer 3 ISP switches; each switch provides one 1 Gbps port to the OPNsense box.
This configuration worked very well before with pfSense.
The switches are configured to do LACP on these interfaces (as I mentioned, it has been working for a long time).

Switch 1 >>>> Port 1 >>>>> Pfsense Port 1 ( now is Opnsense )
Switch 2 >>>> Port 1 >>>>> Pfsense Port 2 ( now is opnsense)

Both switches are stacked and are Brocade ICX7250.

As I mentioned, this config had been working with pfSense for a long time, until we convinced the customer to move to OPNsense.

The issue is:

We have created a LAGG (see attached); this LAG uses em2 and em3 as the LAN LAGG. Whenever we connect the cables to the switch, the error below pops up on the OPNsense console and keeps popping up.

Code: [Select]
interface stopped distributing possible flapping

Attached image: https://i.ibb.co/t4nT73C/IMG-2538.jpg (https://ibb.co/hyrJPC8)

After that I tried a clean pfSense installation and created the LAGG, and everything works as expected.

Can someone please help clear things up for me? Is it OPNsense? The configuration? What am I doing wrong?

I appreciate any help/ideas.
Title: Re: LACP is not working
Post by: mimugmail on November 22, 2020, 08:57:34 am
What does the switch log when debugging is on?
Title: Re: LACP is not working
Post by: Gauss23 on November 22, 2020, 09:36:36 am
Seems like he has the same issue:
https://forum.opnsense.org/index.php?topic=19899.msg93230#msg93230
Title: Re: LACP is not working
Post by: mimugmail on November 22, 2020, 09:51:06 am
Same author?  :o

Maybe FreeBSD 12 handles LACP a bit differently. Then you should see the same error when testing pfSense 2.5.
Title: Re: LACP is not working
Post by: Gauss23 on November 22, 2020, 10:39:49 am
Quote
Same author?  :o

Haha, sorry, I didn't look at the author's name  :)
Title: Re: LACP is not working
Post by: Julien on November 22, 2020, 12:36:31 pm
Quote
Haha, sorry, didn't had a look at the authors name  :)

Sorry, I contacted support and they advised me to open a new topic in General Discussion,
so I just did :)
Title: Re: LACP is not working
Post by: mimugmail on November 22, 2020, 12:40:59 pm
Quote
Maybe FreeBSD 12 is handling lacp bit different. Then you should have the same Error when Testing pfsense 2.5

Then please test this.
LACP in general works fine, so it must be something with the switches or the hardware.
Title: Re: LACP is not working
Post by: Julien on November 22, 2020, 12:46:50 pm
Quote
Maybe FreeBSD 12 is handling lacp bit different. Then you should have the same Error when Testing pfsense 2.5

Thank you for your answer. On the switch I can see the LAG go to Blocked when I connect the cables.
Isn't pfSense using the same FreeBSD 12 as OPNsense?
Title: Re: LACP is not working
Post by: Patrick M. Hausen on November 22, 2020, 12:51:32 pm
I can only comment that lagg does work in the general case.

My OPNsense:
Code: [Select]
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=802028<VLAN_MTU,JUMBO_MTU,WOL_MAGIC>
ether 00:0d:b9:57:27:90
inet6 fe80::20d:b9ff:fe57:2790%lagg0 prefixlen 64 scopeid 0x9
laggproto lacp lagghash l2,l3,l4
laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
groups: lagg
media: Ethernet autoselect
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

My Cisco 2960-L:
Code: [Select]
cisco#sh lacp 4 neighbor
Flags:  S - Device is requesting Slow LACPDUs
        F - Device is requesting Fast LACPDUs
        A - Device is in Active mode       P - Device is in Passive mode     

Channel group 4 neighbors

Partner's information:

                  LACP port                        Admin  Oper   Port    Port
Port      Flags   Priority  Dev ID          Age    key    Key    Number  State
Gi0/15    SA      32768     000d.b957.2790   5s    0x0    0x12B  0x1     0x3D 
Gi0/16    SA      32768     000d.b957.2790   4s    0x0    0x12B  0x2     0x3D 

So, does your Brocade switch have some debugging capability? E.g. if I bring one of my two links down on the OPNsense side, enable debugging of LACP events on the Cisco, then bring the interface up again, I get this:
Code: [Select]
cisco#debug lacp event
Link Aggregation Control Protocol events debugging is on
cisco#
Nov 22 11:48:29.174: LACP: Gi0/16 set to UNSELECTED
Nov 22 11:48:30.170: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/16, changed state to down
Nov 22 11:48:31.174: %LINK-3-UPDOWN: Interface GigabitEthernet0/16, changed state to down
Nov 22 11:48:33.939: %LINK-3-UPDOWN: Interface GigabitEthernet0/16, changed state to up
Nov 22 11:48:34.943: LACP: Gi0/16 STANDBY aggregator hex address is 64DA810
Nov 22 11:48:34.944: LACP: Gi0/16 set to STANDBY
Nov 22 11:48:36.722: lacp_handle_standby_port_internal called, depth = 1
Nov 22 11:48:36.722: LACP: Gi0/16 standby->selected
Nov 22 11:48:36.722: LACP: Gi0/16 set to SELECTED
Nov 22 11:48:38.551: lacp_handle_standby_port_internal called, depth = 1
Nov 22 11:48:39.551: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/16, changed state to up

Please try to find some more detailed information on the switch side.

Kind regards,
Patrick
Title: Re: LACP is not working
Post by: Gauss23 on November 22, 2020, 01:09:15 pm
Quote
isn't pfsense using the same Freebsd 12 as OPNsense?

He suggested trying pfSense 2.5, which is not yet released. It is based on the same FreeBSD version as the current OPNsense release. So if you have the same problems there, it must be something about FreeBSD 12 in combination with your switches.

https://www.pfsense.org/snapshots/
Title: Re: LACP is not working
Post by: Julien on November 22, 2020, 05:32:08 pm
I really appreciate your answer. I don't know if the switch has a debug facility, as I come from Cisco. What is crazy is that when I connect a Synology to the same ports, it detects the LACP, and so does pfSense.
It seems like the MAC addresses of the NICs are somehow held by the switch, causing spanning tree to act up?
Is this possible?
Title: Re: LACP is not working
Post by: mimugmail on November 22, 2020, 05:45:03 pm
We are all just guessing without debug logs.
Title: Re: LACP is not working
Post by: Julien on November 22, 2020, 06:22:39 pm
Quote
We all are just guessing without debug logs

Does OPNsense have an option to enable LACP debugging? On pfSense I run this:
Code: [Select]
sysctl net.link.lagg.lacp.debug=1
I am not near the switch and the box right now; I am just collecting information for what to do tomorrow.

I've been doing some reading, and I understand it could be related to strict mode, which may differ between OPNsense and pfSense.
pfSense uses 0 and OPNsense uses 1? Is this correct?

Code: [Select]
sysctl net.link.lagg.0.lacp.lacp_strict_mode=0
Title: Re: LACP is not working
Post by: Patrick M. Hausen on November 22, 2020, 07:00:39 pm
OPNsense has both sysctls, since FreeBSD has them:
Code: [Select]
root@opnsense:~ # sysctl net.link.lagg.lacp
net.link.lagg.lacp.default_strict_mode: 1
net.link.lagg.lacp.debug: 0

You could give the debug function on the OPNsense side a try. I just enabled it, then "shut; no shut" one interface on the Cisco side:
Code: [Select]
actor=(8000,00-0D-B9-57-27-90,012B,8000,0001)
actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
partner=(8000,00-B6-70-D6-32-80,0004,8000,0110)
partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
maxdelay=0
igb0: lacpdu receive
actor=(8000,00-B6-70-D6-32-80,0004,8000,0110)
actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
partner=(8000,00-0D-B9-57-27-90,012B,8000,0001)
partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
maxdelay=32768
igb1: lacpdu receive
actor=(8000,00-B6-70-D6-32-80,0004,8000,0111)
actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
partner=(8000,00-0D-B9-57-27-90,012B,8000,0002)
partner.state=1d<ACTIVITY,AGGREGATION,SYNC,COLLECTING>
maxdelay=32768
igb1: lacpdu transmit
actor=(8000,00-0D-B9-57-27-90,012B,8000,0002)
actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
partner=(8000,00-B6-70-D6-32-80,0004,8000,0111)
partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
maxdelay=0
igb0: lacpdu transmit
actor=(8000,00-0D-B9-57-27-90,012B,8000,0001)
actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
partner=(8000,00-B6-70-D6-32-80,0004,8000,0110)
partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
maxdelay=0

As for the strict mode: there should not be anything fundamentally different w.r.t. lagg(4) in pfSense vs. OPNsense. Different default values, perhaps. So just go ahead and set it to 0 ...

HTH,
Patrick
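
For reading the `actor.state` and `partner.state` values in debug output like the above, here is a minimal Python sketch that expands the hex value into flag names. The bit order follows the IEEE 802.3ad state byte, lowest bit first; the function name `decode_lacp_state` is only illustrative and is not part of FreeBSD or OPNsense.

```python
from typing import List

# LACP port-state bits (IEEE 802.3ad), listed from bit 0 upwards.
LACP_STATE_BITS = [
    "ACTIVITY", "TIMEOUT", "AGGREGATION", "SYNC",
    "COLLECTING", "DISTRIBUTING", "DEFAULTED", "EXPIRED",
]


def decode_lacp_state(hex_value: str) -> List[str]:
    """Return the names of all flags set in a hex state byte such as '3d'."""
    value = int(hex_value, 16)
    return [name for bit, name in enumerate(LACP_STATE_BITS)
            if value & (1 << bit)]


if __name__ == "__main__":
    # Values taken from the debug output in this thread.
    for state in ("3d", "1d"):
        print(state, ",".join(decode_lacp_state(state)))
```

With the values above, `3d` includes DISTRIBUTING and `1d` does not, so a quick look at the decoded flags shows which side has not yet started distributing.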
Title: Re: LACP is not working
Post by: Julien on November 22, 2020, 10:05:23 pm
I will check tomorrow, first thing when I get to the office.
I really appreciate it.
I am worried that once it is fixed, remotely updating the box will cause the same failure.
But I will have to fix it first.

Thank you
Title: Re: LACP is not working
Post by: Julien on November 23, 2020, 12:31:01 pm
Do you know where I can see the LAGG logs on OPNsense?
I have used this command:
Code: [Select]
sysctl net.link.lagg.lacp.debug=1
I disconnected and reconnected the cables, but nothing shows on the console.

Title: Re: LACP is not working
Post by: Patrick M. Hausen on November 23, 2020, 05:00:32 pm
Code: [Select]
dmesg
Title: Re: LACP is not working
Post by: Julien on November 23, 2020, 08:43:12 pm
@pmhausen I tried it today, but when I connect the cables, the following shows up:

Code: [Select]
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP

The interfaces we are using on the box are:

Code: [Select]
Intel(R) PRO/1000 Network Connection
As I understand it, the "possible flapping" message is due to the fact that 'laggproto lacp' wasn't set on the client side, so the client is not sending.
But the LAG is already configured to use LACP.

After I rebooted the box I got these logs:

Code: [Select]
WARNING: /mnt was not properly dismounted
WARNING: /mnt: mount pending error: blocks 259880 files 7
ugen0.2: <vendor 0x8087 product 0x0024> at usbus0
uhub2 on uhub0
uhub2: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus0
ugen1.2: <vendor 0x8087 product 0x0024> at usbus1
uhub3 on uhub1
uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus1
WARNING: /mnt: reload pending error: blocks 259880 files 7
uhub2: 6 ports with 6 removable, self powered
uhub3: 6 ports with 6 removable, self powered
ugen0.3: <Avocent USB Composite Device-0> at usbus0
kbd2 at ukbd1
ugen1.5: <vendor 0x192f USB Optical Mouse> at usbus1
WARNING: /mnt: reload pending error: blocks 259880 files 7
lagg0: IPv6 addresses on em2 have been removed before adding it as a member to prevent IPv6 address scope violation.
lagg0: link state changed to DOWN
em3: link state changed to DOWN
lagg0: IPv6 addresses on em3 have been removed before adding it as a member to prevent IPv6 address scope violation.
em0: link state changed to UP
em0_vlan20: link state changed to UP
em0_vlan40: link state changed to UP
em0_vlan10: link state changed to UP
em0_vlan11: link state changed to UP
em0_vlan12: link state changed to UP
em0_vlan13: link state changed to UP
em0_vlan30: link state changed to UP
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN

em2 and em3 are the interfaces on the LAG.
Title: Re: LACP is not working
Post by: mimugmail on November 23, 2020, 09:24:55 pm
You really should read up on troubleshooting on the switch side; the switches give way more detail.
Title: Re: LACP is not working
Post by: Julien on November 23, 2020, 09:51:45 pm
Quote
You really should read about switch Troubleshooting, they get way more into detail

Actually I do  :) The switch shows:

Code: [Select]
=== LAG "OPNSENSEWAN" ID 1 (dynamic Deployed) ===
LAG Configuration:
   Ports:         e 1/2/1 e 1/2/1
   Port Count:    2
   Primary Port:  1/2/1
   Trunk Type:    hash-based
   LACP Key:      20001
Deployment: HW Trunk ID 1
Port       Link    State   Dupl Speed Trunk Tag Pvid Pri MAC             Name
1/2/1      Up      Blocked Full 1G    1     Yes 18   0   609c.9f3a.a488 
1/2/2      Up      Blocked  Full 1G   1     Yes 18   0   609c.9f3a.a488

Port       [Sys P] [Port P] [ Key ] [Act][Tio][Agg][Syn][Col][Dis][Def][Exp][Ope]
1/2/1          1        1   20001   Yes   S   Agg  Syn  Col  Dis  Def  No   Ina
1/2/2.         1        1   20001   Yes   S   Agg  Syn  No   No   Def  No   Ina


Code: [Select]
Just know that if you get the above message (GigabitEthernet x/x/x is up, line protocol is down (LACP-BLOCKED)), you have a LAG protocol mismatch. This is what the support team says.

This would mean OPNsense is not sending LACPDUs (802.3ad) to the switch, so the switch cannot bring the bundle online.

How can I check whether the LAG on OPNsense is actually sending 802.3ad?

On OPNsense, the LAG protocol description for LACP is:

Code: [Select]
lacp
Supports the IEEE 802.3ad Link Aggregation Control Protocol (LACP) and the Marker Protocol. LACP will negotiate a set of aggregable links with the peer in to one or more Link Aggregated Groups. Each LAG is composed of ports of the same speed, set to full-duplex operation. The traffic will be balanced across the ports in the LAG with the greatest total speed, in most cases there will only be one LAG which contains all ports. In the event of changes in physical connectivity, Link Aggregation will quickly converge to a new configuration.
Title: Re: LACP is not working
Post by: Patrick M. Hausen on November 23, 2020, 10:13:54 pm
Did I miss a copy of ifconfig lagg0 on the OPNsense side or did you not provide one, yet?
Please do so.
Title: Re: LACP is not working
Post by: Julien on November 24, 2020, 12:19:59 am
Quote
Did I miss a copy of ifconfig lagg0 on the OPNsense side or did you not provide one, yet?
Please do so.

Thank you, this is what the LAG config shows.
Code: [Select]
root@firewall:~ # ifconfig lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=850098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO>
        ether e8:39:35:11:fa:ab
        inet6 fe80::ea39:35ff:fe11:faab%lagg0 prefixlen 64 scopeid 0xb
        inet 192.168.55.1 netmask 0xffffff00 broadcast 192.168.55.255
        laggproto lacp lagghash l2,l3,l4
        laggport: em2 flags=8<COLLECTING>
        laggport: em3 flags=8<COLLECTING>
        groups: lagg
        media: Ethernet autoselect
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>


on the switch side it shows

Code: [Select]
LAG Configuration:
   Ports:         e 1/1/2 e 2/1/2
   Port Count:    2
   Primary Port:  1/1/2
   Trunk Type:    hash-based
   LACP Key:      20002
Deployment: HW Trunk ID 2
Port       Link    State   Dupl Speed Trunk Tag Pvid Pri MAC             Name
1/1/2      Up      Blocked Full 1G    2     Yes N/A  0   609c.9f4b.808d  LAN1
2/1/2      Up      Blocked Full 1G    2     Yes N/A  0   609c.9f4b.808d  LAN2

Port       [Sys P] [Port P] [ Key ] [Act][Tio][Agg][Syn][Col][Dis][Def][Exp][Ope]
1/1/2           1        1   20002   Yes   S   Agg  No   No   No   No   No   Ina
2/1/2           1        1   20002   Yes   S   Agg  Syn  No   No   Def  Exp  Err

and
Code: [Select]
clog /var/log/system.log
Code: [Select]
Nov 24 00:13:35 firewall kernel: em2: lacpdu receive
Nov 24 00:13:35 firewall kernel: actor=(0001,60-9C-9F-4B-80-8C,4E22,0001,0002)
Nov 24 00:13:35 firewall kernel: actor.state=7<ACTIVITY,TIMEOUT,AGGREGATION>
Nov 24 00:13:35 firewall kernel: partner=(8000,E8-39-35-11-FA-AB,016B,8000,0003)
Nov 24 00:13:35 firewall kernel: partner.state=1d<ACTIVITY,AGGREGATION,SYNC,COLLECTING>
Nov 24 00:13:35 firewall kernel: maxdelay=0
Nov 24 00:13:35 firewall kernel: em2: old pstate cf<ACTIVITY,TIMEOUT,AGGREGATION,SYNC,DEFAULTED,EXPIRED>
Nov 24 00:13:35 firewall kernel: em2: new pstate f<ACTIVITY,TIMEOUT,AGGREGATION,SYNC>
Nov 24 00:13:35 firewall kernel: em3: lacpdu transmit
Nov 24 00:13:35 firewall kernel: actor=(8000,E8-39-35-11-FA-AB,016B,8000,0004)
Nov 24 00:13:35 firewall kernel: actor.state=1d<ACTIVITY,AGGREGATION,SYNC,COLLECTING>
Nov 24 00:13:35 firewall kernel: partner=(0001,60-9C-9F-4B-80-8C,4E22,0001,0102)
Nov 24 00:13:35 firewall kernel: partner.state=cf<ACTIVITY,TIMEOUT,AGGREGATION,SYNC,DEFAULTED,EXPIRED>
Nov 24 00:13:35 firewall kernel: maxdelay=0
Nov 24 00:13:35 firewall kernel: em2: lacpdu transmit
Nov 24 00:13:35 firewall kernel: actor=(8000,E8-39-35-11-FA-AB,016B,8000,0003)
Nov 24 00:13:35 firewall kernel: actor.state=1d<ACTIVITY,AGGREGATION,SYNC,COLLECTING>
Nov 24 00:13:35 firewall kernel: partner=(0001,60-9C-9F-4B-80-8C,4E22,0001,0002)
Nov 24 00:13:35 firewall kernel: partner.state=f<ACTIVITY,TIMEOUT,AGGREGATION,SYNC>
Nov 24 00:13:35 firewall kernel: maxdelay=0
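
To compare these `actor=` / `partner=` tuples with the switch output, it can help to convert the hex fields to decimal. The following Python sketch assumes the field order is (system priority, system MAC, key, port priority, port number); that order is an assumption, though it is consistent with `4E22` hex being 20002, the "LACP Key" shown by the switch, and `8000` hex being the common default priority 32768. The names `LacpPeer` and `parse_peer` are illustrative only.

```python
import re
from typing import NamedTuple


class LacpPeer(NamedTuple):
    system_priority: int
    system_mac: str
    key: int
    port_priority: int
    port_number: int


# Matches e.g. (0001,60-9C-9F-4B-80-8C,4E22,0001,0002)
PEER_RE = re.compile(
    r"\(([0-9A-Fa-f]{4}),([0-9A-Fa-f-]{17}),"
    r"([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4})\)"
)


def parse_peer(text: str) -> LacpPeer:
    """Parse one actor=/partner= tuple from a lacp debug log line."""
    match = PEER_RE.search(text)
    if match is None:
        raise ValueError("no LACP peer tuple found in: %r" % text)
    sys_prio, mac, key, port_prio, port_no = match.groups()
    return LacpPeer(
        system_priority=int(sys_prio, 16),
        system_mac=mac.replace("-", ":").lower(),
        key=int(key, 16),
        port_priority=int(port_prio, 16),
        port_number=int(port_no, 16),
    )


if __name__ == "__main__":
    # Lines copied from the system.log excerpt above.
    print(parse_peer("actor=(0001,60-9C-9F-4B-80-8C,4E22,0001,0002)"))
    print(parse_peer("partner=(8000,E8-39-35-11-FA-AB,016B,8000,0003)"))
```

Having both sides in decimal makes it easier to line the values up against the key and priority columns in the switch's LAG display.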
Title: Re: LACP is not working
Post by: Patrick M. Hausen on November 24, 2020, 12:58:04 pm
I am not quite sure if the hash algorithm is part of the LACP negotiation, but I would look up (or ask support) what that means:
Code: [Select]
Trunk Type:    hash-based
Asking for your ifconfig output I wanted to check if LACP was active on the FreeBSD side at all. Which it is. Then there are not that many parameters to tune that I know. You can set the headers used for hashing on the FreeBSD side, so possibly the switch and FreeBSD don't agree on that.

Other than that, send Brocade support the FreeBSD outputs and ask for more directions on debugging on the Brocade side.
Title: Re: LACP is not working
Post by: Julien on November 24, 2020, 01:13:43 pm
@pmhausen thank you so much for your answer.
I hope someone can advise on the hash-based trunk type:
Code: [Select]
Trunk Type:    hash-based
I've been doing some digging too, and I have found a difference in the LAGG timeout.

The LACP timeout (the TIO flag) on the switch is short. Is yours long or also short?

This is a working config from a Cisco I tested today:

Code: [Select]
LAG Configuration:
   Ports:         e 1/2/1 e 1/2/3
   Port Count:    2
   Primary Port:  1/2/1
   Trunk Type:    hash-based
   LACP Key:      22047
   LACP Timeout:  long
Title: Re: LACP is not working
Post by: Patrick M. Hausen on November 24, 2020, 02:57:52 pm
What's a timeout? Sorry, I have not configured anything like that. The Cisco config is simply:
Code: [Select]
interface Port-channel4
 description OPNsense
 switchport mode trunk
!
interface GigabitEthernet0/15
 description OPNsense
 switchport mode trunk
 channel-group 4 mode active
!
interface GigabitEthernet0/16
 description OPNsense
 switchport mode trunk
 channel-group 4 mode active

And that's that. Works as intended. ;)
Title: Re: LACP is not working
Post by: Julien on November 24, 2020, 04:17:38 pm
I just tried the configuration with a UniFi Layer 3 switch and it does work.
This works with a single switch as the LAG peer; I don't know if the issue is related to the stack or something else.
Title: Re: LACP is not working
Post by: Julien on November 25, 2020, 07:07:18 pm
I have been doing some digging on the switch and found this.

The only thing I see is a mismatch error.

Code: [Select]
Dynamic Log Buffer (50 lines):
Nov 26 02:29:40:I:System: dynamic lag interface 2/1/1's peer info (priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:40:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:40:I:Trunk: Group (1/1/1, 2/1/1) removed by 802.3ad link-aggregation module.
Nov 26 02:29:40:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:40:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:40:I:System: dynamic lag interface 1/1/1's peer info (priority=4,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:39:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is up.
Nov 26 02:29:39:I:Trunk: Group (1/1/1, 2/1/1) created by 802.3ad link-aggregation module.
Nov 26 02:29:33:I:System: dynamic lag interface 2/1/1's peer info (priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:33:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:33:I:Trunk: Group (1/1/1, 2/1/1) removed by 802.3ad link-aggregation module.
Nov 26 02:29:33:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:33:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:33:I:System: dynamic lag interface 1/1/1's peer info (priority=4,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:32:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is up.
Nov 26 02:29:32:I:Trunk: Group (1/1/1, 2/1/1) created by 802.3ad link-aggregation module.
Nov 26 02:29:26:I:System: dynamic lag interface 2/1/1's peer info (priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:26:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:26:I:Trunk: Group (1/1/1, 2/1/1) removed by 802.3ad link-aggregation module.
Nov 26 02:29:26:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:26:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:26:I:System: dynamic lag interface 1/1/1's peer info (priority=4,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:25:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is up.
Nov 26 02:29:25:I:Trunk: Group (1/1/1, 2/1/1) created by 802.3ad link-aggregation module.
Nov 26 02:29:19:I:System: dynamic lag interface 2/1/1's peer info (priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:19:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:19:I:Trunk: Group (1/1/1, 2/1/1) removed by 802.3ad link-aggregation module.
Nov 26 02:29:19:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:19:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:19:I:System: dynamic lag interface 1/1/1's peer info (priority=4,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:18:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is up.
Nov 26 02:29:18:I:Trunk: Group (1/1/1, 2/1/1) created by 802.3ad link-aggregation module.
Nov 26 02:29:12:I:System: dynamic lag interface 2/1/1's peer info (priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:12:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:12:I:Trunk: Group (1/1/1, 2/1/1) removed by 802.3ad link-aggregation module.
Nov 26 02:29:12:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:12:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is down.
Nov 26 02:29:12:I:System: dynamic lag interface 1/1/1's peer info (priority=4,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:11:I:System: Logical link on dynamic lag interface ethernet 1/1/1 is up.
Nov 26 02:29:11:I:System: Interface ethernet 1/1/1, state up
Nov 26 02:29:11:I:Trunk: Group (1/1/1, 2/1/1) created by 802.3ad link-aggregation module.
Nov 26 02:29:05:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:05:I:Trunk: Group (2/1/1) removed by 802.3ad link-aggregation module.
Nov 26 02:29:05:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is down.
Nov 26 02:29:05:I:System: dynamic lag interface 2/1/1's peer info (priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:05:I:System: dynamic lag interface 1/1/1's peer info (priority=4,id=e839.3511.faab,key=0) mis-matches with lag's peer info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error
Nov 26 02:29:04:I:System: Logical link on dynamic lag interface ethernet 2/1/1 is up.
Nov 26 02:29:04:I:System: Interface ethernet 2/1/1, state up
Nov 26 02:29:01:I:Trunk: Group (2/1/1) created by 802.3ad link-aggregation module.
Nov 26 02:29:01:I:System: dynamic lag 1, has new peer info (priority=32768,id=e839.3511.faab,key=715) (N/A)
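
To see at a glance which fields the switch considers mismatched, the "mis-matches" lines can be split into the per-port peer info and the LAG's peer info. This is a small Python sketch under the assumption that each such line contains exactly two `priority=...,id=...,key=...` groups, port first and LAG second, as in the log above; the helper name `mismatched_fields` is made up for illustration.

```python
import re
from typing import Dict, Tuple

# Captures one "priority=N,id=xxxx.xxxx.xxxx,key=N" group.
PEER_INFO_RE = re.compile(r"priority=(\d+),id=([0-9a-f.]+),key=(\d+)")


def mismatched_fields(log_line: str) -> Dict[str, Tuple[str, str]]:
    """Return {field: (port_value, lag_value)} for every differing field.

    Lines that do not contain exactly two peer-info groups yield {}.
    """
    infos = PEER_INFO_RE.findall(log_line)
    if len(infos) != 2:
        return {}
    port_info, lag_info = infos
    names = ("priority", "id", "key")
    return {
        name: (port_value, lag_value)
        for name, port_value, lag_value in zip(names, port_info, lag_info)
        if port_value != lag_value
    }


if __name__ == "__main__":
    # Line copied from the switch log above.
    line = (
        "Nov 26 02:29:40:I:System: dynamic lag interface 2/1/1's peer info "
        "(priority=3,id=e839.3511.faab,key=0) mis-matches with lag's peer "
        "info (priority=32768,id=e839.3511.faab,key=715), set to mismatch Error"
    )
    print(mismatched_fields(line))
```

For the lines in this log, the system id is the same on both sides while priority and key differ, which narrows down what to ask switch support about.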
Title: Re: LACP is not working
Post by: Patrick M. Hausen on November 25, 2020, 08:04:12 pm
Is there a static vs. dynamic lagg setting on this switch?
Title: Re: LACP is not working
Post by: Julien on November 25, 2020, 08:52:33 pm
Quote
Is there a static vs. dynamic lagg setting on this switch?

These are the LAGs we have on the switch:

Code: [Select]
LAN           dynamic  Y    2     1/1/2    e 1/1/2 e 2/1/2
LANG          static   N    3     none
LANGG         dynamic  N    4     none
NAS           dynamic  Y    11    1/1/11   e 1/1/11 to 1/1/12 e 2/1/11 to 2/1/12
WAN           dynamic  Y    1     1/1/1    e 1/1/1 e 2/1/1
wan           dynamic  N    5     none
Title: Re: LACP is not working
Post by: SFC on November 25, 2020, 10:09:52 pm
You positive you're plugged into the right ports?  FreeBSD is reporting a different mac address than your switch claims it's advertising on the LAG:

Quote
Nov 24 00:13:35 firewall kernel: partner=(0001,60-9C-9F-4B-80-8C,4E22,0001,0002)


Quote
Deployment: HW Trunk ID 2
Port       Link    State   Dupl Speed Trunk Tag Pvid Pri MAC             Name
1/1/2      Up      Blocked Full 1G    2     Yes N/A  0   609c.9f4b.808d LAN1
2/1/2      Up      Blocked Full 1G    2     Yes N/A  0   609c.9f4b.808d  LAN2
Title: Re: LACP is not working
Post by: Julien on November 26, 2020, 03:16:58 pm
Thank you for your answer. This is probably because I created a new LAG to test with, but it is still not working.


I noticed behaviour on the LAG interface that was reported before in 19.1: https://github.com/opnsense/core/issues/3200

Code: [Select]
Nov 26 20:47:27 firewall opnsense[5775]: /usr/local/etc/rc.linkup: Hotplug event detected for LANLAG(opt1) but ignoring since interface is configured with static IP (192.168.88.1 ::)

I contacted Brocade/Ruckus switch support and they advised disabling strict mode on the LAG.

I disabled this mode using the command below, but the LACP link is still not coming up:
sysctl net.link.lagg.lacp.default_strict_mode=0

Title: Re: LACP is not working
Post by: Julien on November 28, 2020, 05:48:46 pm
I am still struggling with this.
Now I am seeing different logs on OPNsense. Does anyone have an idea, please?

Code: [Select]
Nov 28 17:44:32 firewall opnsense[19285]: /usr/local/etc/rc.linkup: Hotplug event detected for LANLAG(opt1) but ignoring since interface is configured with static IP (192.168.83.1 ::)
Nov 28 17:44:32 firewall opnsense[51767]: /usr/local/etc/rc.newwanip: IPv4 renewal is starting on 'lagg0'
Nov 28 17:44:32 firewall opnsense[51767]: /usr/local/etc/rc.newwanip: On (IP address: 192.168.83.1) (interface: LANLAG[opt1]) (real interface: lagg0).
Nov 28 17:44:32 firewall opnsense[51767]: plugins_configure hosts ()
Nov 28 17:44:32 firewall opnsense[51767]: plugins_configure hosts (execute task : dnsmasq_hosts_generate())
Nov 28 17:44:32 firewall opnsense[51767]: plugins_configure hosts (execute task : unbound_hosts_generate())
Nov 28 17:44:33 firewall kernel: bce1: Interface stopped DISTRIBUTING, possible flapping
Nov 28 17:44:33 firewall kernel: bce0: Interface stopped DISTRIBUTING, possible flapping
Nov 28 17:44:33 firewall kernel: lagg0: link state changed to DOWN
Nov 28 17:44:33 firewall opnsense[67168]: /usr/local/etc/rc.linkup: Hotplug event detected for LANLAG(opt1) but ignoring since interface is configured with static IP (192.168.83.1 ::)
Nov 28 17:44:39 firewall kernel: lagg0: link state changed to UP
Nov 28 17:44:40 firewall opnsense[99483]: /usr/local/etc/rc.linkup: Hotplug event detected for LANLAG(opt1) but ignoring since interface is configured with static IP (192.168.83.1 ::)
Nov 28 17:44:40 firewall opnsense[59439]: /usr/local/etc/rc.newwanip: IPv4 renewal is starting on 'lagg0'
Nov 28 17:44:40 firewall opnsense[59439]: /usr/local/etc/rc.newwanip: On (IP address: 192.168.83.1) (interface: LANLAG[opt1]) (real interface: lagg0).
Nov 28 17:44:40 firewall opnsense[59439]: plugins_configure hosts ()
Nov 28 17:44:40 firewall opnsense[59439]: plugins_configure hosts (execute task : dnsmasq_hosts_generate())
Nov 28 17:44:40 firewall opnsense[59439]: plugins_configure hosts (execute task : unbound_hosts_generate())
Nov 28 17:44:40 firewall kernel: bce1: Interface stopped DISTRIBUTING, possible flapping
Nov 28 17:44:40 firewall kernel: bce0: Interface stopped DISTRIBUTING, possible flapping
Nov 28 17:44:40 firewall kernel: lagg0: link state changed to DOWN
Nov 28 17:44:41 firewall opnsense[34568]: /usr/local/etc/rc.linkup: Hotplug event detected for LANLAG(opt1) but ignoring since interface is configured with static IP (192.168.83.1 ::)
Nov 28 17:44:42 firewall kernel: bce1: link state changed to DOWN
Nov 28 17:44:42 firewall kernel: bce0: link state changed to DOWN
Nov 28 17:44:43 firewall configctl[48202]: event @ 1606581883.04 msg: Nov 28 17:44:43 firewall.attcomputer.nl config[96814]: config-event: new_config /conf/backup/config-1606581883.0416.xml
Nov 28 17:44:43 firewall configctl[48202]: event @ 1606581883.04 exec: system event config_changed
Nov 28 17:44:45 firewall kernel: bce1: Gigabit link up!
Nov 28 17:44:45 firewall kernel: bce1: link state changed to UP
Nov 28 17:44:45 firewall kernel: lagg0: link state changed to UP
Nov 28 17:44:45 firewall kernel: bce0: Gigabit link up!
Nov 28 17:44:45 firewall kernel: bce0: link state changed to UP
Nov 28 17:44:45 firewall opnsense[49000]: /usr/local/etc/rc.linkup: Hotplug event detected for LANLAG(opt1) but ignoring since interface is configured with static IP (192.168.83.1 ::)
Nov 28 17:44:45 firewall opnsense[38393]: /usr/local/etc/rc.newwanip: IPv4 renewal is starting on 'lagg0'
Nov 28 17:44:45 firewall opnsense[38393]: /usr/local/etc/rc.newwanip: On (IP address: 192.168.83.1) (interface: LANLAG[opt1]) (real interface: lagg0).
Nov 28 17:44:45 firewall opnsense[38393]: plugins_configure hosts ()
Nov 28 17:44:45 firewall opnsense[38393]: plugins_configure hosts (execute task : dnsmasq_hosts_generate())
Nov 28 17:44:45 firewall opnsense[38393]: plugins_configure hosts (execute task : unbound_hosts_generate())
Nov 28 17:45:01 firewall /update_tables.py[249]: unable to resolve firewall-new-wa.nl for alias Klante_S2S
Nov 28 17:45:12 firewall /flowd_aggregate.py[18361]: vacuum src_addr_details_086400.sqlite
Nov 28 17:45:13 firewall /flowd_aggregate.py[18361]: vacuum src_addr_000300.sqlite
Nov 28 17:45:13 firewall /flowd_aggregate.py[18361]: vacuum src_addr_003600.sqlite
Nov 28 17:45:13 firewall /flowd_aggregate.py[18361]: vacuum src_addr_086400.sqlite
Nov 28 17:45:13 firewall /flowd_aggregate.py[18361]: vacuum dst_port_000300.sqlite
Nov 28 17:45:13 firewall /flowd_aggregate.py[18361]: vacuum dst_port_003600.sqlite
Nov 28 17:45:13 firewall /flowd_aggregate.py[18361]: vacuum dst_port_086400.sqlite
Nov 28 17:45:13 firewall /flowd_aggregate.py[18361]: vacuum interface_000030.sqlite
Nov 28 17:45:13 firewall /flowd_aggregate.py[18361]: vacuum interface_000300.sqlite
Nov 28 17:45:13 firewall /flowd_aggregate.py[18361]: vacuum interface_003600.sqlite
Nov 28 17:45:13 firewall /flowd_aggregate.py[18361]: vacuum interface_086400.sqlite
Nov 28 17:45:13 firewall /flowd_aggregate.py[18361]: vacuum done
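
Side note for readers: to see how quickly the LAGG flaps, the link state lines can be pulled out of a log like the one above. This is a minimal Python sketch, not from the thread. The function names are made up, and the year is an assumption because syslog timestamps do not include one.

```python
import re
from datetime import datetime

LINK_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (\S+): link state changed to (UP|DOWN)$"
)

def link_transitions(lines, ifname="lagg0", year=2020):
    """Yield (timestamp, state) for each link state change of ifname."""
    for line in lines:
        m = LINK_RE.match(line.strip())
        if m and m.group(2) == ifname:
            # syslog timestamps carry no year, so one is assumed here
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            yield ts, m.group(3)

def up_durations(transitions):
    """Seconds the interface stayed UP before each following DOWN."""
    durations, up_since = [], None
    for ts, state in transitions:
        if state == "UP":
            up_since = ts
        elif up_since is not None:
            durations.append((ts - up_since).total_seconds())
            up_since = None
    return durations

# A few lines copied from the log above.
sample = [
    "Nov 28 17:44:33 firewall kernel: lagg0: link state changed to DOWN",
    "Nov 28 17:44:39 firewall kernel: lagg0: link state changed to UP",
    "Nov 28 17:44:40 firewall kernel: bce1: Interface stopped DISTRIBUTING, possible flapping",
    "Nov 28 17:44:40 firewall kernel: lagg0: link state changed to DOWN",
    "Nov 28 17:44:45 firewall kernel: lagg0: link state changed to UP",
]
print(up_durations(link_transitions(sample)))  # prints [1.0]
```

In this excerpt lagg0 stays up for about one second before it drops again, which matches the "possible flapping" messages.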
Title: Re: LACP is not working
Post by: Gauss23 on November 28, 2020, 06:42:23 pm
Did not read the full post.
Maybe it's a BSD12 topic. Did you try it with pfSense 2.5?
Is it running with OPNsense 20.1?
Title: Re: LACP is not working
Post by: Julien on November 28, 2020, 08:53:59 pm
Did not read the full post.
Maybe it's a BSD12 topic. Did you try it with pfSense 2.5?
Is it running with OPNsense 20.1?

I haven't tried it, to be honest; with pfSense it is working fine.
I am worried about getting it working now and then having it break later if an update changes the LACP behaviour.
I am on 20.7; can I not go back to 20.1?
Title: Re: LACP is not working
Post by: mimugmail on November 28, 2020, 08:56:10 pm
Reinstall 20.1 and restore config
Title: Re: LACP is not working
Post by: Julien on November 28, 2020, 08:59:43 pm
Reinstall 20.1 and restore config

Thank you for your answer.
I am not near the box; I have to drive 2 hours to get there.
Is this a bug/issue with 20.7?
I am just trying to understand so I can plan my next move.


I have set up spare hardware at home with a UniFi switch, created an LACP LAGG, and it seems to work.

It is working with this release:
Code: [Select]
OPNsense 20.7.5-amd64
FreeBSD 12.1-RELEASE-p10-HBSD
OpenSSL 1.1.1h 22 Sep 2020

Code: [Select]
Nov 28 21:52:08 firewall kernel: igb5: lacpdu transmit
Nov 28 21:52:08 firewall kernel: actor=(8000,00-08-A2-0C-99-7B,020B,8000,0006)
Nov 28 21:52:08 firewall kernel: actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 28 21:52:08 firewall kernel: partner=(8000,74-83-C2-48-2F-67,0042,0080,0018)
Nov 28 21:52:08 firewall kernel: partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 28 21:52:08 firewall kernel: maxdelay=0
Nov 28 21:52:08 firewall kernel: igb4: lacpdu transmit
Nov 28 21:52:08 firewall kernel: actor=(8000,00-08-A2-0C-99-7B,020B,8000,0005)
Nov 28 21:52:08 firewall kernel: actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 28 21:52:08 firewall kernel: partner=(8000,74-83-C2-48-2F-67,0042,0080,0017)
Nov 28 21:52:08 firewall kernel: partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 28 21:52:08 firewall kernel: maxdelay=0
Nov 28 21:52:16 firewall kernel: igb5: lacpdu receive
Nov 28 21:52:16 firewall kernel: actor=(8000,74-83-C2-48-2F-67,0042,0080,0018)
Nov 28 21:52:16 firewall kernel: actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 28 21:52:16 firewall kernel: partner=(8000,00-08-A2-0C-99-7B,020B,8000,0006)
Nov 28 21:52:16 firewall kernel: partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 28 21:52:16 firewall kernel: maxdelay=0
Nov 28 21:52:16 firewall kernel: igb4: lacpdu receive
Nov 28 21:52:16 firewall kernel: actor=(8000,74-83-C2-48-2F-67,0042,0080,0017)
Nov 28 21:52:16 firewall kernel: actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 28 21:52:16 firewall kernel: partner=(8000,00-08-A2-0C-99-7B,020B,8000,0005)
Nov 28 21:52:16 firewall kernel: partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 28 21:52:16 firewall kernel: maxdelay=0
Nov 28 21:52:38 firewall kernel: igb5: lacpdu transmit
Nov 28 21:52:38 firewall kernel: actor=(8000,00-08-A2-0C-99-7B,020B,8000,0006)
Nov 28 21:52:38 firewall kernel: actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 28 21:52:38 firewall kernel: partner=(8000,74-83-C2-48-2F-67,0042,0080,0018)
Nov 28 21:52:38 firewall kernel: partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 28 21:52:38 firewall kernel: maxdelay=0
Nov 28 21:52:38 firewall kernel: igb4: lacpdu transmit
Nov 28 21:52:38 firewall kernel: actor=(8000,00-08-A2-0C-99-7B,020B,8000,0005)
Nov 28 21:52:38 firewall kernel: actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 28 21:52:38 firewall kernel: partner=(8000,74-83-C2-48-2F-67,0042,0080,0017)
Nov 28 21:52:38 firewall kernel:
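
Side note for readers: the actor=(...) and partner=(...) tuples in the debug output above can be split into named fields. This is a minimal Python sketch, not from the thread. It assumes the field order is (system priority, system MAC, key, port priority, port number), which is my reading of the FreeBSD LACP debug format, and the names LacpPeer and parse_peer are made up for illustration.

```python
from typing import NamedTuple

class LacpPeer(NamedTuple):
    system_priority: int
    system_mac: str
    key: int
    port_priority: int
    port_number: int

def parse_peer(text: str) -> LacpPeer:
    """Parse '(8000,00-08-A2-0C-99-7B,020B,8000,0006)' into named fields."""
    fields = text.strip().strip("()").split(",")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {text!r}")
    sys_prio, mac, key, port_prio, port_no = fields
    return LacpPeer(
        int(sys_prio, 16),               # assumed: system priority (hex)
        mac.lower().replace("-", ":"),   # system MAC, rewritten with ':' separators
        int(key, 16),                    # assumed: aggregation key (hex)
        int(port_prio, 16),              # assumed: port priority (hex)
        int(port_no, 16),                # assumed: port number (hex)
    )

# Partner tuples copied from the igb5 and igb4 lines above.
igb5 = parse_peer("(8000,74-83-C2-48-2F-67,0042,0080,0018)")
igb4 = parse_peer("(8000,74-83-C2-48-2F-67,0042,0080,0017)")

# Member ports of one aggregate should see the same partner system and key.
same_aggregate = (igb5.system_priority, igb5.system_mac, igb5.key) == (
    igb4.system_priority, igb4.system_mac, igb4.key)
print(same_aggregate)
```

In the working UniFi setup both ports report the same partner system and key and differ only in the port number, which is what one would expect from a healthy LAG.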
Title: Re: LACP is not working
Post by: Gauss23 on November 28, 2020, 09:55:01 pm
Thank you for your answer.
I am not near the box; I have to drive 2 hours to get there.
Is this a bug/issue with 20.7?
I am just trying to understand so I can plan my next move.

It may be related to BSD12. So even if you go for pfSense 2.4.x now, it may break with 2.5.
That's the reason why I asked if it's working with OPNsense 20.1 or pfSense 2.5

It could be a combination of BSD12 and your switches, because other people say it's working for them.

Just found this:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241785

Is this describing your issue?
Title: Re: LACP is not working
Post by: Julien on November 28, 2020, 09:59:19 pm
Thank you for your answer.
I am not near the box; I have to drive 2 hours to get there.
Is this a bug/issue with 20.7?
I am just trying to understand so I can plan my next move.

It may be related to BSD12. So even if you go for pfSense 2.4.x now, it may break with 2.5.
That's the reason why I asked if it's working with OPNsense 20.1 or pfSense 2.5

It could be a combination of BSD12 and your switches, because other people say it's working for them.

Just found this:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241785

Is this describing your issue?

Thank you for your answer.
Yes, this is my issue, exactly as shown.
Where do they mean to disable those settings?

Code: [Select]
We also use vlan + lagg + ix and we often need to add/remove vlans, so as a temporary solution we disable vlanhwfilter on lagg interface.
I just tried it at home with a UniFi switch and it works, with the same settings.

Title: Re: LACP is not working
Post by: Gauss23 on November 28, 2020, 10:04:35 pm
Interfaces: Settings

VLAN Hardware Filtering. Set it to disable.

And all other HW stuff should be disabled (checkboxes checked), too.

It's a global setting for all interfaces.
Title: Re: LACP is not working
Post by: Julien on November 28, 2020, 10:06:35 pm
Interfaces: Settings

VLAN Hardware Filtering. Set it to disable.

And all other HW stuff should be disabled (checkboxes checked), too.

It's a global setting for all interfaces.

Thank you, I just disabled it.
It was set to "Enable VLAN Hardware Filtering"; now it is disabled.
I'll see if I can get LACP working now and will report back in a minute.


Edit: unfortunately it is still not working.

Code: [Select]
lagg0: link state changed to UP
bce1: Interface stopped DISTRIBUTING, possible flapping
bce0: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
bce1: Interface stopped DISTRIBUTING, possible flapping
bce0: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
bce1: Interface stopped DISTRIBUTING, possible flapping
bce0: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
bce1: Interface stopped DISTRIBUTING, possible flapping
bce0: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
Title: Re: LACP is not working
Post by: Gauss23 on November 28, 2020, 10:19:35 pm

Edit: unfortunately it is still not working.


I'm really sorry to hear that.

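Options:
  • go with pfSense 2.4.x, but you may run into the problem again when 2.5 is released. So maybe it's worth trying the 2.5 beta first. If it has the same problem, it's BSD 12 related.
  • try OPNsense 20.1 and wait to see whether there will be a patch someday
  • try different switches
  • are you able to replace the network cards in that server? Maybe it's only related to some drivers.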
Title: Re: LACP is not working
Post by: Julien on November 28, 2020, 10:27:28 pm

Edit: unfortunately it is still not working.


I'm really sorry to hear that.

Options:
  • go with pfSense 2.4.x, but you may run into the problem again when 2.5 is released. So maybe it's worth trying the 2.5 beta first. If it has the same problem, it's BSD 12 related.
  • try OPNsense 20.1 and wait to see whether there will be a patch someday
  • try different switches
  • are you able to replace the network cards in that server? Maybe it's only related to some drivers.

Thank you so much for your continued answers. I'll reach out to support and see what the cause is.

The server has two different NICs; I tried them both and got the same error.

I really appreciate it.


Edit:

I noticed that two LAGs share the same MAC address. Could this be the cause?
Code: [Select]
Deployment: HW Trunk ID 1
Port       Link    State   Dupl Speed Trunk Tag Pvid Pri MAC             Name
1/1/1      Down    None    None None  1     No  141  0   609c.9f4b.808c  WAN1
2/1/1      Down    None    None None  1     No  141  0   609c.9f4b.808c  WAN2


Code: [Select]
Deployment: HW Trunk ID 3
Port       Link    State   Dupl Speed Trunk Tag Pvid Pri MAC             Name
1/1/11     Up      Forward Full 1G    11    Yes 141  0   609c.9f4b.808c
1/1/12     Up      Forward Full 1G    11    Yes 141  0   609c.9f4b.808c
2/1/11     Up      Forward Full 1G    11    Yes 141  0   609c.9f4b.808c
2/1/12     Up      Forward Full 1G    11    Yes 141  0   609c.9f4b.808c
Title: Re: LACP is not working
Post by: djbmister on November 30, 2020, 11:42:25 am
To help this along, here are some possible causes to check:

1. Have you turned off all offloading options on OPNsense (TSO, LRO)? Verify with ifconfig.
2. The port flap dampening configuration on this switch is worth looking at.
3. Energy-saving features of the NICs on the server, e.g. hw.em.eee_setting = 0

Maybe run 'sysctl -A | grep <your network card driver>', e.g. 'sysctl -A | grep em' or 'sysctl -A | grep igb'.

Also run 'sysctl -A | grep lacp' so we can see your LACP settings.
Title: Re: LACP is not working
Post by: Julien on November 30, 2020, 03:05:17 pm
To help this along, here are some possible causes to check:

1. Have you turned off all offloading options on OPNsense (TSO, LRO)? Verify with ifconfig.
2. The port flap dampening configuration on this switch is worth looking at.
3. Energy-saving features of the NICs on the server, e.g. hw.em.eee_setting = 0

Maybe run 'sysctl -A | grep <your network card driver>', e.g. 'sysctl -A | grep em' or 'sysctl -A | grep igb'.

Also run 'sysctl -A | grep lacp' so we can see your LACP settings.

Thank you for your answer

Code: [Select]
1. Have you turned off all offloading options on OPNsense (TSO, LRO)? Verify with ifconfig.
Yes, I did. In the interface settings, Hardware CRC, Hardware TSO, Hardware LRO and VLAN Hardware Filtering are all disabled.
This is what I get when I run ifconfig -vv lagg0:

Code: [Select]
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=800008<VLAN_MTU>
        ether e8:39:35:11:fa:ab
        inet6 fe80::ea39:35ff:fe11:faab%lagg0 prefixlen 64 scopeid 0xb
        laggproto lacp lagghash l2,l3,l4
        lagg options:
                flags=10<LACP_STRICT>
                flowid_shift: 16
        lagg statistics:
                active ports: 0
                flapping: 0
        lag id: [(0000,00-00-00-00-00-00,0000,0000,0000),
                 (0000,00-00-00-00-00-00,0000,0000,0000)]
        laggport: em2 flags=0<> state=41<ACTIVITY,DEFAULTED>
                [(8000,E8-39-35-11-FA-AB,8003,8000,0003),
                 (FFFF,00-00-00-00-00-00,0000,FFFF,0000)]
        laggport: em3 flags=0<> state=41<ACTIVITY,DEFAULTED>
                [(8000,E8-39-35-11-FA-AB,8004,8000,0004),
                 (FFFF,00-00-00-00-00-00,0000,FFFF,0000)]
        groups: lagg
        media: Ethernet autoselect
        status: no carrier
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
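
Side note for readers: the state=41<ACTIVITY,DEFAULTED> value in the ifconfig output above is a bit field. This is a minimal Python sketch, not from the thread, that decodes it using the LACP state bit layout from IEEE 802.1AX as I understand it. The names LACP_STATE_BITS and decode_state are made up for illustration.

```python
# Bit values of the LACP actor/partner state byte (assumed from IEEE 802.1AX).
LACP_STATE_BITS = [
    (0x01, "ACTIVITY"),
    (0x02, "TIMEOUT"),
    (0x04, "AGGREGATION"),
    (0x08, "SYNC"),
    (0x10, "COLLECTING"),
    (0x20, "DISTRIBUTING"),
    (0x40, "DEFAULTED"),
    (0x80, "EXPIRED"),
]

def decode_state(value):
    """Return the names of all flags set in an LACP state byte."""
    return [name for bit, name in LACP_STATE_BITS if value & bit]

print(decode_state(0x41))  # prints ['ACTIVITY', 'DEFAULTED']
print(decode_state(0x3D))  # the state seen on the working UniFi setup
```

DEFAULTED generally means the port is using default partner information because it has not received a LACPDU from the other side, which would be consistent with the all-zero partner fields shown for em2 and em3.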


Code: [Select]
2. The port flap dampening configuration on this switch is worth looking at.
Link error dampening is enabled:
Code: [Select]
SSH@Ruckus2050-LS#show interface ethernet 1/1/2
GigabitEthernet1/1/2 is up, line protocol is down (LACP-BLOCKED)
  Port down (LACP-BLOCKED) for 1 day(s) 14 hour(s) 28 minute(s) 40 second(s)
  Hardware is GigabitEthernet, address is 609c.9f4b.808d (bia 609c.9f4b.808d)
  Configured speed auto, actual 1Gbit, configured duplex fdx, actual fdx
  Configured mdi mode AUTO, actual MDIX
  EEE Feature Disabled
  Member of 7 L2 VLANs, port is tagged, port state is BLOCKING
  BPDU guard is Disabled, ROOT protect is Disabled, Designated protect is Disabled
  Link Error Dampening is Enabled
  STP configured to ON, priority is level0, mac-learning is enabled
  Openflow is Disabled, Openflow Hybrid mode is Disabled,  Flow Control is config enabled, oper enabled, negotiation disabled
  Mirror disabled, Monitor disabled
  Mac-notification is disabled
  Member of active trunk ports 1/1/2,2/1/2, primary port is 1/1/2
  Member of configured trunk ports 1/1/2,2/1/2, primary port is 1/1/2
  Port name is LAN1
  IPG MII 96 bits-time, IPG GMII 96 bits-time
  MTU 10200 bytes, encapsulation ethernet
  300 second input rate: 0 bits/sec, 0 packets/sec, 0.00% utilization
  300 second output rate: 928 bits/sec, 0 packets/sec, 0.00% utilization
  15187 packets input, 1943872 bytes, 0 no buffer
  Received 1 broadcasts, 15186 multicasts, 0 unicasts
  0 input errors, 0 CRC, 0 frame, 0 ignored
  0 runts, 0 giants
  154231 packets output, 19755504 bytes, 0 underruns
  Transmitted 214 broadcasts, 153930 multicasts, 86 unicasts
  0 output errors, 0 collisions
  Relay Agent Information option: Disabled

UC Egress queues:
Queue counters    Queued packets    Dropped Packets
         0                   0                   0
         1                   0                   0
         2                   0                   0
         3                   0                   0
         4                   0                   0
         5                   0                   0
         6                   0                   0
         7              152356                   0


MC Egress queues:
Queue counters    Queued packets    Dropped Packets
         0                 217                   0
         1                 148                   0
         2                1510                   0
         3                   0                   0

Code: [Select]
3. Energy saving features of the nics on the server - i.e. hw.em.eee_setting = 0
I am not sure I understand this correctly.

I also see that MAC learning is enabled. We have two OPNsense boxes connected to the switch; one is on and one is off. Both boxes run the same configuration, so that if the first one goes down we can fire up the second one. Could MAC learning be causing this?
Title: Re: LACP is not working
Post by: djbmister on December 01, 2020, 04:46:49 pm
Could you run

'sysctl -A | grep <your network card driver>', e.g. 'sysctl -A | grep em' or 'sysctl -A | grep igb'

Also 'sysctl -A | grep lacp', so we can see your LACP settings.

Please post the output of each command; em or igb is the Intel driver.

EEE is an energy-efficiency feature that can cause issues for NICs on FreeBSD. Setting 'hw.em.eee_setting = 0' in the tunables will turn this off for all NICs.

What are your LAGG settings on OPNsense? Have you tried loadbalance mode?

Also, on OPNsense, let's see the LACP debugging:

'sysctl net.link.lagg.lacp.debug=1', then share the system log with 'clog /var/log/system.log'


IMHO: It seems someone else has the same issue as you on pfSense, and that is a Brocade switch too: https://forum.netgate.com/topic/158534/lacp-not-working/79
Title: Re: LACP is not working
Post by: Julien on December 02, 2020, 09:18:05 pm
Could you run

'sysctl -A | grep <your network card driver>', e.g. 'sysctl -A | grep em' or 'sysctl -A | grep igb'

Also 'sysctl -A | grep lacp', so we can see your LACP settings.

Please post the output of each command; em or igb is the Intel driver.

EEE is an energy-efficiency feature that can cause issues for NICs on FreeBSD. Setting 'hw.em.eee_setting = 0' in the tunables will turn this off for all NICs.

What are your LAGG settings on OPNsense? Have you tried loadbalance mode?

Also, on OPNsense, let's see the LACP debugging:

'sysctl net.link.lagg.lacp.debug=1', then share the system log with 'clog /var/log/system.log'


IMHO: It seems someone else has the same issue as you on pfSense, and that is a Brocade switch too: https://forum.netgate.com/topic/158534/lacp-not-working/79

Thank you for your answer.
This is what system.log shows:

Code: [Select]
Dec  2 17:37:17 firewall /flowd_aggregate.py[18361]: vacuum src_addr_details_086400.sqlite
Dec  2 17:37:19 firewall /flowd_aggregate.py[18361]: vacuum src_addr_000300.sqlite
Dec  2 17:37:19 firewall /flowd_aggregate.py[18361]: vacuum src_addr_003600.sqlite
Dec  2 17:37:19 firewall /flowd_aggregate.py[18361]: vacuum src_addr_086400.sqlite
Dec  2 17:37:19 firewall /flowd_aggregate.py[18361]: vacuum dst_port_000300.sqlite
Dec  2 17:37:19 firewall /flowd_aggregate.py[18361]: vacuum dst_port_003600.sqlite
Dec  2 17:37:19 firewall /flowd_aggregate.py[18361]: vacuum dst_port_086400.sqlite
Dec  2 17:37:19 firewall /flowd_aggregate.py[18361]: vacuum interface_000030.sqlite
Dec  2 17:37:20 firewall /flowd_aggregate.py[18361]: vacuum interface_000300.sqlite
Dec  2 17:37:20 firewall /flowd_aggregate.py[18361]: vacuum interface_003600.sqlite
Dec  2 17:37:20 firewall /flowd_aggregate.py[18361]: vacuum interface_086400.sqlite


Code: [Select]
root@firewall:~ # sysctl -A | grep lacp
net.link.lagg.lacp.default_strict_mode: 1
net.link.lagg.lacp.debug: 0


I am not sure what to do here:
Code: [Select]
'sysctl -A | grep *your network card driver* - i.e. 'sysctl -A | grep em or igb'
Do you mean I should run sysctl -A | grep em, or grep igb01 if the interface is igb01?
Or do you mean I have to run it on the LAG interface?

Like this?
Code: [Select]
root@firewall:~ # sysctl -A | grep em3
irq267: em3:irq0:77 @cpu0(domain0): 76231

Title: Re: LACP is not working
Post by: Julien on December 04, 2020, 01:00:56 am
We contacted the switch support team, and the issue appears to be on the switch and the way it handles LACP.
They are aware of it but are not planning to change anything in the near future, as the switch is out of support.
Title: Re: LACP is not working
Post by: Julien on December 08, 2020, 12:19:56 am
I have configured the long timeout on the LAG interface, but it is not shown on the interface.
Is there any way I can force it on the LAG interface?

Code: [Select]
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=800008<VLAN_MTU>
        ether e8:39:35:11:fa:ab
        inet6 fe80::ea39:35ff:fe11:faab%lagg0 prefixlen 64 scopeid 0xb
        inet 192.168.73.1 netmask 0xffffff00 broadcast 192.168.73.255
        laggproto lacp lagghash l2,l3,l4
        laggport: em2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: em3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        groups: lagg
        media: Ethernet autoselect
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>