LAGG flapping at regular time intervals

Started by kei, March 25, 2024, 09:49:41 AM

Quote from: Patrick M. Hausen on June 12, 2026, 09:59:00 PM
Update: I can confirm there might be an interoperability problem between Mikrotik devices and FreeBSD/OPNsense concerning LACP.

Since I changed the LACP timeout from 1s/fast to 30s/slow on both sides and disabled "strict" mode on the OPNsense side the connection now seems to be stable.
On the Mikrotik side there is nothing to adjust but the timeout. I also disabled flowid explicitly on OPNsense but if I am not mistaken that was the default all the time, anyway.
I think I can confirm this:
Quote from: Seimus on June 12, 2026, 11:05:43 PM
Yea, the fast timeout cross-vendor is always problematic.
This applies not only to FBSD & Mikrotik, but to other vendors as well.
When I used to build a lot of Linux bonding LACP links, we used to set this:
Quote
bond-lacp-rate rate
Denotes the rate of LACPDU requested from the peer.
The rate can be given as string or as numerical value.

Valid values are slow (0) and fast (1). The default is slow.
To SLOW too :)

And this:
Quote
bond-miimon interval
Denotes the MII link monitoring frequency in milliseconds.
This determines how often the link state of each slave is inspected for link failures.

A value of zero disables MII link monitoring. The default is 0.
Was always set at 100 IIRC...

Source: https://manpages.ubuntu.com/manpages/jammy/man5/interfaces-bond.5.html
Weird guy who likes everything Linux and *BSD on PC/Laptop/Tablet/Mobile and funny little ARM based boards :)

Fast timeout is a pain in enterprise environments too.
At one point I had enough and enforced slow timeout (30s) company-wide for cross-vendor connections.

The fast timeout was constantly causing, for example, FW switchovers and other nonsense...

And that's the reason it's in the OPNsense docs too, because I was crying to Cedrik when he was writing it :)
https://github.com/opnsense/docs/pull/610#issuecomment-2424144823

Regards,
S.
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
N355 - i226-V | AQC113C | 16G | 500G - PROD

PRXMX
N5105 - i226-V | 2x8G | 512G - NODE #1
N100 - i226-V | 16G | 1T - NODE #2

No more flapping during the night and half a day. So it seems the conservative approach is to use slow timeouts.

I vaguely remember reconfiguring all my LAGG ports to use the same settings across all devices a couple of months ago. Probably I changed OPNsense-switch to fast at that time.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

I am not surprised you configured Fast timeout.
The 1s re-convergence vs 30s is a BIG deal.

But if it's not working as it should, it causes more trouble, because in the worst case it can cause insane micro-flaps.
I have seen outage windows of 5-15 min with Fast timeout...

Regards,
S.

Today at 02:35:07 PM #19 Last Edit: Today at 02:48:48 PM by Patrick M. Hausen
This home lab would of course work just as well with just one link per system. I only use LAGG for all core infrastructure because I can - and to gain experience with such setups.

In the production DC we have MLAG to catch the complete loss of a single switch. I *think* we use 30s - which is good enough for hosted web applications, IMHO. STP convergence is in the same time range. Flapping of course is a different beast altogether and somehow even worse than a complete loss of connectivity.

I could not find any documentation on the "strict" option, though. So I turned to the tried and true method of "use the source, Luke".

In net/if_lagg.c, around line 1538 we find:

	struct lacp_softc *lsc;
	struct lacp_port *lp;

	lsc = (struct lacp_softc *)sc->sc_psc;

	switch (ro->ro_opts) {
	[...]
	case LAGG_OPT_LACP_STRICT:
		lsc->lsc_strict_mode = 1;
		break;
	case -LAGG_OPT_LACP_STRICT:
		lsc->lsc_strict_mode = 0;
		break;

In net/ieee8023ad_lacp.c we have a per-LACP-partner bit mask that more or less defines which variables we accept from the partner on reception:

/*
 * partner administration variables.
 * XXX should be configurable.
 */

static const struct lacp_peerinfo lacp_partner_admin_optimistic = {
	.lip_systemid = { .lsi_prio = 0xffff },
	.lip_portid = { .lpi_prio = 0xffff },
	.lip_state = LACP_STATE_SYNC | LACP_STATE_AGGREGATION |
	    LACP_STATE_COLLECTING | LACP_STATE_DISTRIBUTING,
};

static const struct lacp_peerinfo lacp_partner_admin_strict = {
	.lip_systemid = { .lsi_prio = 0xffff },
	.lip_portid = { .lpi_prio = 0xffff },
	.lip_state = 0,
};
[...]
	if (lp->lp_lsc->lsc_strict_mode)
		lp->lp_partner = lacp_partner_admin_strict;
	else
		lp->lp_partner = lacp_partner_admin_optimistic;

The only actual code paths using that mechanism are at lines 1732 ff. and 1812 ff.

	/*
	 * XXX Maintain legacy behavior of leaving the
	 * LACP_STATE_SYNC bit unchanged from the partner's
	 * advertisement if lsc_strict_mode is false.
	 * TODO: We should re-examine the concept of the "strict mode"
	 * to ensure it makes sense to maintain a non-strict mode.
	 */
	if (lp->lp_lsc->lsc_strict_mode)
		lp->lp_partner.lip_state |= LACP_STATE_SYNC;
[...]
static void
lacp_sm_rx_update_default_selected(struct lacp_port *lp)
{

	LACP_TRACE(lp);

	if (lp->lp_lsc->lsc_strict_mode)
		lacp_sm_rx_update_selected_from_peerinfo(lp,
		    &lacp_partner_admin_strict);
	else
		lacp_sm_rx_update_selected_from_peerinfo(lp,
		    &lacp_partner_admin_optimistic);
}

So essentially strict mode clears some of the information received from the partner because these flags are (supposedly) not part of the 802.3ad standard. To me it looks more or less like a no-op. See the "XXX" comment above.

I'll re-enable it and watch what happens.

Quote from: Patrick M. Hausen on Today at 02:35:07 PM
In the production DC we have MLAG to catch the complete loss of a single switch. I *think* we use 30s - which is good enough for hosted web applications, IMHO. STP convergence is in the same time range. Flapping of course is a different beast altogether and somehow even worse than a complete loss of connectivity.

Yea, MEC, MLAG or vPC are the ultimate form of deployment you want in production.
Flapping on a LAGG is always a story in itself. And you are right about that: it causes even worse problems than if the link just died.

Quote from: Patrick M. Hausen on Today at 02:35:07 PM
So essentially strict mode clears some of the information received from the partner because these fields are (supposedly) not part of the 802.3ad standard. Looks like more or less a no-op to me. See the "XXX" comment above.

I'll re-enable it and watch what happens.

I always thought that the Strict mode enforces the usage of LACP within the LAGG. Meaning if both sides are not actively talking proper LACP the LAGG will not establish....

Regards,
S.

Quote from: Seimus on Today at 02:55:33 PM
Meaning if both sides are not actively talking proper LACP the LAGG will not establish....

Check the source please - possibly I am reading it wrong. I have a fair knowledge of C but no experience with these parts of the kernel code. All in all I am stuck in the 70s, Lions' Commentary and of course Minix ;-)