Messages - firewallfun

#1
High availability / Re: LAG issue
December 02, 2024, 08:58:41 PM
When I set up my 2nd OPNsense box, I just left it at the default, with LACP fast timeout disabled, which is also the default on the switch. So it didn't bring any improvement. Later I also tried moving everything over to fast and restarting the LACP interface and the relevant ports on the switch.

I have only had the FS switches for 4 months (and upgraded them to the latest version at that time). I replaced all our switches with them. But this is the first time I have actually had any issue with LACP. I have LACP on most ports, towards Supermicro blade servers, other switches and gear, and Rocky Linux/Windows servers. It has been like a dream, until now.

It is a 24/7 environment, so I can't risk rebooting them without a solid reason.

I'll research a bit more. For now at least it works in this active/backup mode. It could also be a bug in the network driver, I guess (for my Intel SFP+ ports).

#2
High availability / Re: LAG issue
December 02, 2024, 10:35:59 AM
I'm getting nowhere.

I deleted this LACP lagg on the switch and changed lagg0 on OPNsense from lacp to failover.

I didn't change the port members ix2 and ix3, though. And it works just as one would expect. I set ix3 as master (the port that had the issue in the LACP team), and I can ping the LAN without issue. When I do "ifconfig ix3 down", it fails over to ix2 after 4-5 missed pings (so not as fast as LACP would be), and it goes back to ix3 afterwards, with no problem at all. But I would have preferred LACP...

Since I have the exact same issue with two physical OPNsense boxes, it must be something in the software on the OPNsense box or in the OS. Against the same switch, LACP works with pfSense.
#3
High availability / Re: LAG issue
December 01, 2024, 09:36:56 PM
There must be some part of the LACP standard that is not matching here, so that one of the ports just goes into a sleep/backup state where it doesn't exchange the correct data.

ifconfig


lagg0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        description: LAN (lan)
        options=4e0382b<RXCSUM,TXCSUM,VLAN_MTU,JUMBO_MTU,WOL_UCAST,WOL_MCAST,WOL_MAGIC,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether 20:7c:14:f5:91:6f
        hwaddr 00:00:00:00:00:00
        inet .2 netmask 0xffffff00 broadcast ...255
        inet .1 netmask 0xffffff00 broadcast ...255 vhid 3
        laggproto lacp lagghash l2
        laggport: ix2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: ix3 flags=0<>
        groups: lagg
        carp: MASTER vhid 3 advbase 1 advskew 0
              peer 224.0.0.18 peer6 ff02::12
        media: Ethernet autoselect
        status: active


The port in question really is up:

root@f1:~ # ifconfig ix3
ix3: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=4e0382b<RXCSUM,TXCSUM,VLAN_MTU,JUMBO_MTU,WOL_UCAST,WOL_MCAST,WOL_MAGIC,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether 20:7c:14:f5:91:6f
        hwaddr 20:7c:14:f5:91:70
        media: Ethernet autoselect (Unknown <rxpause,txpause>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>



TenGigabitEthernet 1/0/2                 up        1      Full     10G       fiber
TenGigabitEthernet 2/0/2                 up        1      Full     10G       fiber


root@f1:~ # tcpdump -i ix3 ether proto 0x8809
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ix3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
21:51:18.266702 LACPv1, length 110
21:51:19.351099 LACPv1, length 110
21:51:20.444063 LACPv1, length 110
21:51:21.547254 LACPv1, length 110
21:51:22.635720 LACPv1, length 110
21:51:23.725439 LACPv1, length 110
21:51:24.826664 LACPv1, length 110
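
The spacing of the captured LACPDUs can be checked with a few lines of Python. This is only a sketch: the timestamps are copied from the capture above, and `mean_interval` is a name made up for this illustration. The capture does not show which side sent each frame, so the result only describes what is seen on ix3 overall. For reference, LACP sends one LACPDU per second at the fast rate and one every 30 seconds at the slow rate.

```python
from datetime import datetime

# Timestamps copied from the tcpdump capture on ix3.
STAMPS = [
    "21:51:18.266702",
    "21:51:19.351099",
    "21:51:20.444063",
    "21:51:21.547254",
    "21:51:22.635720",
    "21:51:23.725439",
    "21:51:24.826664",
]


def mean_interval(stamps):
    """Return the average gap in seconds between consecutive timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


print(f"mean LACPDU interval: {mean_interval(STAMPS):.3f} s")
```

An average close to one second would match the fast rate rather than the slow one.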


ifconfig

NON-working OPNSense:

root@f1:~ # ifconfig lagg0
lagg0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        description: LAN (lan)
        options=4e0382b<RXCSUM,TXCSUM,VLAN_MTU,JUMBO_MTU,WOL_UCAST,WOL_MCAST,WOL_MAGIC,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether 20:7c:14:f5:91:6f
        hwaddr 00:00:00:00:00:00
        inet XXX.2 netmask 0xffffff00 broadcast XX255
        inet XXX.1 netmask 0xffffff00 broadcast XX.255 vhid 3
        laggproto lacp lagghash l2,l3,l4
        laggport: ix2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: ix3 flags=0<>
        groups: lagg
        carp: MASTER vhid 3 advbase 1 advskew 0
              peer 224.0.0.18 peer6 ff02::12
        media: Ethernet autoselect
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>


Working pfSense:

lagg0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        description: LAN
        options=4e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether 0c:c4:7a:aa:fb:a5
        hwaddr 00:00:00:00:00:00
        inet XXX.1 netmask 0xffffff00 broadcast XXXX
        inet6 fe80::ec4:7aff:feaa:fba5%lagg0 prefixlen 64 scopeid 0xa
        laggproto lacp lagghash l2,l3,l4
        laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        groups: lagg
        media: Ethernet autoselect
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
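
To compare boxes quickly, the laggport lines can be pulled out of the ifconfig output with a small script. A minimal sketch, assuming the output format shown above; the names `inactive_ports`, `SAMPLE` and `REQUIRED` are made up for this illustration, and the sample lines are copied from the non-working OPNsense output.

```python
import re

# Sample lines copied from the non-working OPNsense ifconfig output.
SAMPLE = """\
        laggport: ix2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: ix3 flags=0<>
"""

# Matches "laggport: <name> flags=<hex><FLAG,FLAG,...>".
LAGGPORT_RE = re.compile(r"laggport:\s+(\S+)\s+flags=[0-9a-f]+<([^>]*)>")

# Flags a healthy LACP member shows in the working pfSense output.
REQUIRED = {"ACTIVE", "COLLECTING", "DISTRIBUTING"}


def inactive_ports(ifconfig_text):
    """Return member ports that are missing any of the REQUIRED flags."""
    bad = []
    for name, flags in LAGGPORT_RE.findall(ifconfig_text):
        present = set(flags.split(",")) if flags else set()
        if not REQUIRED <= present:
            bad.append(name)
    return bad


print(inactive_ports(SAMPLE))
```

Run against the pfSense output instead, the same function should return an empty list, since both igb1 and igb3 carry all three flags.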

#4
High availability / Re: LAG issue
December 01, 2024, 09:14:18 PM
I have disconnected the LACP lagg and have now tested the LAN on the 2 individual ports that make up LACP group 2. No problems; each port works perfectly on its own.

I have also double-checked that the MAC addresses of the individual ports on the switch match the ones in OPNsense, so I am 100% sure they are physically connected.

As soon as the ports join the LACP team, only one member of the team shows up correctly.

I tried setting everything to slow, both on my switch and on lagg0.

Aggregate port 2:

Local information:
                                     LACP port       Oper    Port    Port
Port           Flags     State       Priority        Key     Number  State
---------------------------------------------------------------------------
Te1/0/2        SA        susp        32768           0x2     0x2     0x45
Te2/0/2        SA        bndl        32768           0x2     0x24    0x3d

Partner information:
                         LACP port                  Oper    Port     Port
Port           Flags     Priority      Dev ID       Key     Number   State
--------------------------------------------------------------------------
Te1/0/2        SP        0         0000.0000.0000   0x0     0x0      0x0
Te2/0/2        SA        32768     207c.14f5.916f   0x1d2   0x8      0x3d


During reboot, when OPNsense is down, it shows this status on the switch (correctly):

Aggregate port 2:

Local information:
                                     LACP port       Oper    Port    Port
Port           Flags     State       Priority        Key     Number  State
---------------------------------------------------------------------------
Te1/0/2        SA        susp        32768           0x2     0x2     0x45
Te2/0/2        SA        susp        32768           0x2     0x24    0x45

Partner information:
                         LACP port                  Oper    Port     Port
Port           Flags     Priority      Dev ID       Key     Number   State
--------------------------------------------------------------------------
Te1/0/2        SP        0         0000.0000.0000   0x0     0x0      0x0
Te2/0/2        SP        0         0000.0000.0000   0x0     0x0      0x0
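
The "Port State" column in these tables is the LACP state octet. A minimal Python sketch for decoding it, assuming the standard bit order from IEEE 802.1AX (lowest bit first); the names `STATE_BITS` and `decode_state` are made up for this illustration.

```python
# Bit names of the LACP state octet, lowest bit first (IEEE 802.1AX).
STATE_BITS = [
    "Activity",         # 0x01: active LACP (as opposed to passive)
    "Timeout",          # 0x02: short timeout (fast) when set
    "Aggregation",      # 0x04: link may be aggregated
    "Synchronization",  # 0x08: link is in sync with the aggregate
    "Collecting",       # 0x10: receiving frames on this link
    "Distributing",     # 0x20: sending frames on this link
    "Defaulted",        # 0x40: using default partner info
    "Expired",          # 0x80: receive machine is in the expired state
]


def decode_state(value):
    """Return the names of the bits set in an LACP state octet."""
    return [name for bit, name in enumerate(STATE_BITS) if value & (1 << bit)]


# Values taken from the switch output above.
for state in (0x45, 0x3D):
    print(f"0x{state:02x}: {', '.join(decode_state(state))}")
```

For 0x45 this gives Activity, Aggregation and Defaulted. Defaulted means the switch is using default partner information for that port instead of information received in an LACPDU, which fits the all-zero partner line. 0x3d has Synchronization, Collecting and Distributing set and the Timeout bit cleared (long timeout), which fits the bundled port after setting everything to slow.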


#5
High availability / Re: LAG issue
December 01, 2024, 08:24:43 PM
And the switch clearly says that LACP is not enabled on one of the ports. So two sets of cables, on two machines, and both have the exact same problem. It must be a bonding error in the LACP settings in OPNsense (since it works on pfSense).

I have also disconnected the LACP lagg, and there were no issues with the port member in question then; it worked just fine alone without LACP.

Both the switch and the OPNsense box show link light/no light when I plug/unplug the cable on that port.

Switch notification (LACP, SUSPEND), 2024-12-01 14:06:52:
Interface TenGigabitEthernet 1/0/2 suspended: LACP currently not enabled on the remote port.

show lacp counters

Aggregate port 2:
Port          InPkts    OutPkts
-------------------------------
Te1/0/2        798391    1170027
Te2/0/2        945838    885832
#6
High availability / Re: LAG issue
December 01, 2024, 07:55:17 PM
And here is the detailed LACP config for the LACP member interfaces on the switch. I just picked one of the two failing firewalls, as it fails the same way on both boxes:

FS#show running-config interface Te2/0/10

Building configuration...
Current configuration: 112 bytes

interface TenGigabitEthernet 2/0/10
description FW3
port-group 10 mode active
lacp short-timeout
FS#show running-config interface Te1/0/
FS#show running-config interface Te1/0/10

Building configuration...
Current configuration: 112 bytes

interface TenGigabitEthernet 1/0/10
description FW3
port-group 10 mode active
lacp short-timeout
#7
High availability / Re: LAG issue
December 01, 2024, 07:42:04 PM
Now back to topic - LAG issue:

While LAN now works, there are some issues, as you can see below. Towards OPNsense, the bundle (bndl) isn't 100% established.

I'm including a pfSense box that I also have in an LACP lagg (fast) and that works 100%; it's the last LACP lagg shown in the list. It has the same config on the switch as the OPNsense boxes.

If you look at one of the OPNsense units, the failing port lists a blank Dev ID and is even shown as requesting Slow LACPDUs, while the working port has fast. There is no option to have both fast and slow in a lagg pair, so I assume it is not actually requesting slow. At least there is no option to split them up.

Master FW

Aggregate port 10:
Local information:
                                     LACP port       Oper    Port    Port
Port           Flags     State       Priority        Key     Number  State
---------------------------------------------------------------------------
Te1/0/10       FA        bndl        32768           0xa     0xa     0x3f
Te2/0/10       FA        susp        32768           0xa     0x2c    0x47

Partner information:
                         LACP port                  Oper    Port     Port
Port           Flags     Priority      Dev ID       Key     Number   State
--------------------------------------------------------------------------
Te1/0/10       FA        32768     207c.14f5.9166   0x1b2   0x7      0x3f
Te2/0/10       SP        0         0000.0000.0000   0x0     0x0      0x0
FS#show lacp summary 2


Flags:  S - Device is requesting Slow LACPDUs   F - Device is requesting Fast LACPDUs.
A - Device is in active mode.        P - Device is in passive mode.

Backup FW

Aggregate port 2:

Local information:
                                     LACP port       Oper    Port    Port
Port           Flags     State       Priority        Key     Number  State
---------------------------------------------------------------------------
Te1/0/2        FA        susp        32768           0x2     0x2     0x47
Te2/0/2        FA        bndl        32768           0x2     0x24    0x3f

Partner information:
                         LACP port                  Oper    Port     Port
Port           Flags     Priority      Dev ID       Key     Number   State
--------------------------------------------------------------------------
Te1/0/2        SP        0         0000.0000.0000   0x0     0x0      0x0
Te2/0/2        FA        32768     207c.14f5.916f   0x1d2   0x8      0x3f
FS#show lacp summary 1

Flags:  S - Device is requesting Slow LACPDUs   F - Device is requesting Fast LACPDUs.
A - Device is in active mode.        P - Device is in passive mode.


pfSense (not OPNsense) unit I already have working, with same config on switch

Aggregate port 1:

Local information:
                                     LACP port       Oper    Port    Port
Port           Flags     State       Priority        Key     Number  State
---------------------------------------------------------------------------
Te1/0/1        FA        bndl        32768           0x1     0x1     0x3f
Te2/0/18       FA        bndl        32768           0x1     0x34    0x3f

Partner information:
                         LACP port                  Oper    Port     Port
Port           Flags     Priority      Dev ID       Key     Number   State
--------------------------------------------------------------------------
Te1/0/1        FA        32768     0cc4.7aaa.fba5   0x14b   0x2      0x3f
Te2/0/18       FA        32768     0cc4.7aaa.fba5   0x14b   0x4      0x3f
#8
High availability / Re: CARP on WAN behaving weirdly...
December 01, 2024, 01:54:29 AM
I had a fiber switch/edge switch lying around here, and everything worked from the second I put it on the WAN side, so you were right. Now HA works perfectly.

I also have another one I don't use, so I can split it up later, but now I can have fun! VLANs sound a bit complicated, so it was easier to get up and running this way.

Thank you for all the help and suggestions with this, it is appreciated!  :)

#9
High availability / Re: LAG issue
November 30, 2024, 10:41:29 PM
Thank you for your suggestion. I have a thread here on it: https://forum.opnsense.org/index.php?topic=44226.0

It seems that since I have a public /29 on my WAN on both devices, and my ISP has routers (participating in the /29) that disable/enable each fiber at their end, I can't do it like this. It is not a flat /29. I need to buy 2 new switches for the WAN side, so that the WAN interfaces can see each other, before I can connect my two OPNsense boxes to the shared WAN network.
#10
High availability / Re: CARP on WAN behaving weirdly...
November 30, 2024, 10:34:50 PM
This is partly how I have it on the LAN side of OPNsense. I have an LACP lagg from both OPNsense boxes to the stacked switches, in total 4 SFP+ cables between the switch pair and the OPNsense lagg pair. If I take out one switch or one OPNsense box, it will not affect the network on the LAN side. I don't think my switch supports MLAG (multi-chassis), but I guess it doesn't matter in this case, as it uses stacking (only 2 units stackable, and LACP only).

https://www.fs.com/de-en/products/108710.html

I assume your suggestion would then re-use the current cables, so both LAN and WAN (and everything else) go over the same cables, with only VLANs separating them? That sounds effective.
#11
High availability / Re: CARP on WAN behaving weirdly...
November 30, 2024, 09:16:43 PM
Quote from: Patrick M. Hausen on November 30, 2024, 08:35:38 PM
Now what is the reasoning? Simple, why should there be a dedicated link simply to keep a virtual IP address active on one of two or more nodes?

Well, to save the planet and save power by having less network gear :) I bet I'm not the only one who has redundant lines from their ISP and an IP range assigned to the VIP. But I guess that is mostly the enterprise world, where they don't care about these things; they just buy whatever is needed and are happy with that.

I do understand that within a standard you can't necessarily go off and do other things. I'm saying this in case there are other features in OPNsense (that I don't know of) that could provide this :) On a logical level, setting aside the limitations given by the standard, I don't understand why an active/backup firewall feature isn't possible. For instance, I have a VPN firewall that pings a given gateway IP. If that gateway stops responding, it activates a different WAN network. It shouldn't be hard to write a script that does something similar, one that for instance deactivates WAN completely if the returned MAC address or hostname (if any) responds to an ARP, ping or other type of request in a certain expected or unexpected way. The script could continue to ping the WAN CARP IP or the upstream gateway from LAN (or via some other internal function), and activate WAN again if there is no ping response.

I also got a reply on the other thing you asked me about earlier, basically confirming what you have said:
"Ask your ISP if on the other side of these two links there is a switch that allows the
two OPNsense boxes to communicate with each other or if you are supposed to provide your
own."

Their answer:
"This is not a viable option. While it can technically be done, it would mean that we can no longer guarantee availability. For example, one of the lines between us and you could go down without our HSRP setup detecting it, resulting in us transporting your traffic between our routers because you lack Layer 2 on your side. In the current setup, we configure the HSRP endpoints on our side to automatically withdraw internal routes in our network if the port goes down."
#12
High availability / Re: CARP on WAN behaving weirdly...
November 30, 2024, 07:56:03 PM
Ignore the sync part below; you have explained to me that it doesn't involve the VIP, so it has nothing to do with the WAN network as such. But here is what the ISP said:

"CARP works such that you have two IP interfaces that continuously communicate with each other to check if the other side is present, so there must be a physical Layer 2 Ethernet network between the two boxes. I don't know what the sync link is and what it's used for, but if it's a regular Ethernet connection between the boxes and the boxes communicate Ethernet over it, it should work with that cable. "

They recommend two unmanaged switches to create that flat network, as you say, since they have routers on their side that sense which line is active etc. But they are not familiar with OPNsense and how it works. Isn't there some way for both boxes to see which one is active? Like if I create a direct connection on another port and bridge the two firewalls and the ports... Maybe it's not possible, I just wanted a last try to rescue me from the switch solution :) I think I will just go for two new switches on WAN.

If I were a programmer, I would think it was easy to constantly ping my CARP IP. If it is not active (no ping reply), then make this backup firewall primary and make the CARP IP active, until I hear from the master via the sync interface or a separate line, then deactivate it. Why is it so hard to do :) Why not communicate signals like this on a separate cable, or just use the sync interface, as it already carries constant internal traffic? I don't get it. I wouldn't even mind if it took a minute instead of seconds, in case it needed to be sure the master is really down.
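
The idea above can be written down as a small decision function. This is only a sketch of the logic as described in this post, not how CARP or OPNsense actually works; `next_role`, its parameters and the three-miss threshold are all made up for this illustration, and the actual pinging and interface switching are left out.

```python
def next_role(role, vip_replied, master_seen, misses, max_misses=3):
    """Decide the backup node's role for one polling round.

    role:        "backup" or "master" (this node's current role)
    vip_replied: True if the CARP IP answered the last ping
    master_seen: True if the master was heard on the sync/separate link
    misses:      consecutive unanswered pings so far
    Returns (new_role, new_miss_count).
    """
    if role == "backup":
        misses = 0 if vip_replied else misses + 1
        # Take over only after several misses and no sign of the master.
        if misses >= max_misses and not master_seen:
            return "master", 0
        return "backup", misses
    # This node took over earlier: step down once the master is heard again.
    if master_seen:
        return "backup", 0
    return "master", 0


# Hypothetical polling rounds: (vip_replied, master_seen)
role, misses = "backup", 0
for vip_replied, master_seen in [(True, True), (False, False), (False, False),
                                 (False, False), (False, True)]:
    role, misses = next_role(role, vip_replied, master_seen, misses)
    print(role, misses)
```

One thing the sketch makes visible: the decision depends entirely on the extra link being reliable. If that link fails while both boxes are up, both could end up claiming the IP at the same time.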
#13
High availability / Re: CARP on WAN behaving weirdly...
November 30, 2024, 12:17:21 AM
Ah, I see. Is it as simple as this if I choose to use the VLAN method on my stacked LAN switches? From ChatGPT  ;D

Ports for ISP Lines:

Connect one ISP line to port 24 on Switch 1.
Connect the second ISP line to port 24 on Switch 2.

Ports for Firewall WAN Interfaces:

Connect port 25 on Switch 1 to Firewall 1's WAN interface.
Connect port 25 on Switch 2 to Firewall 2's WAN interface.

VLAN Configuration:

Create a dedicated VLAN for your WAN traffic (e.g., VLAN 10).
Assign ports 24 and 25 on both switches to VLAN 10.
Configure these ports as untagged (access mode) for VLAN 10 since ISP lines typically do not tag traffic.

Stacked Switch Behavior:

Ensure the switches are correctly stacked and function as a single logical unit, so VLAN 10 spans both switches seamlessly.

In this case, I can basically run the setup and IPs I already have.
#14
High availability / Re: CARP on WAN behaving weirdly...
November 29, 2024, 10:57:09 PM
I also found this in my email, from when I asked them for some details (I anonymized the IPs using ChatGPT):

"Link network: 203.0.120.184/29
We use 203.0.120.185 (HSRP), 203.0.120.186 (e01), and 203.0.120.187 (e02).
You should use 203.0.120.188 (HSRP/VRRP or equivalent), 203.0.120.189, and 203.0.120.190.

We route 198.51.101.0/24 behind 203.0.120.188 with tracking of interface line protocol, meaning the route will only be active internally and in BGP if the port to you is up. The network will therefore not be visible on the internet until you have connected the links."
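
The addressing in that email can be sanity-checked with Python's standard `ipaddress` module. A small sketch using the (anonymized) addresses quoted above; the variable names are made up for this illustration.

```python
import ipaddress

# Link network from the ISP's email (anonymized addresses).
link = ipaddress.ip_network("203.0.120.184/29")
hosts = [str(h) for h in link.hosts()]

# Addresses as split up in the email.
isp_side = {"203.0.120.185", "203.0.120.186", "203.0.120.187"}
customer_side = {"203.0.120.188", "203.0.120.189", "203.0.120.190"}

print("network:  ", link.network_address)
print("broadcast:", link.broadcast_address)
print("usable:   ", hosts)

# The six usable addresses are exactly the ISP's three plus the customer's three.
print("fully allocated:", isp_side | customer_side == set(hosts))

# The routed network that sits behind 203.0.120.188.
routed = ipaddress.ip_network("198.51.101.0/24")
print("routed addresses:", routed.num_addresses)
```

So the /29 leaves no spare usable address: three go to the ISP's HSRP setup and three to the customer side (one virtual IP plus one per firewall).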
#15
High availability / Re: CARP on WAN behaving weirdly...
November 29, 2024, 10:46:05 PM
I know they already do BGP for me. They asked me if I wanted to set it up myself or if they should take care of it, since I have a /24 that I bought and they route it for me somehow. But maybe that's in a different context than what you are talking about, I'm not sure. It's Greek to me :)