Hello OPNsense community,
I have a two-node OPNsense HA setup with 10 virtual IPs (CARP).
Node 1 is the master and node 2 is the backup.
When node 1 (master) is switched off, all virtual IPs move to the backup node, which works just fine.
When node 1 is switched on again and becomes master, not all of the virtual IPs move back to node 1: sometimes 2 of them stay on node 2 (the former master, now backup). Could this be due to connections still open on node 2?
Is there any way to debug this? Does anyone know this behavior and could point me in the direction of the root cause?
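One way to debug this is to capture the CARP state of every VHID on both nodes at the same moment and compare them. The sketch below parses the "carp:" lines that FreeBSD's ifconfig prints for CARP addresses. It assumes the FreeBSD 13 line shape "carp: MASTER vhid 10 advbase 1 advskew 0", and the helper names are hypothetical. It flags VHIDs with zero or two masters, and VHIDs whose master is not the node with the lower advbase/advskew (the node that should win when preemption is enabled).

```python
import re

# Assumed FreeBSD 13 ifconfig line shape: "\tcarp: MASTER vhid 10 advbase 1 advskew 0"
CARP_LINE = re.compile(r"carp: (\w+) vhid (\d+) advbase (\d+) advskew (\d+)")


def carp_states(ifconfig_output: str) -> dict[int, tuple[str, int, int]]:
    """Map each VHID to (state, advbase, advskew) as reported by one node."""
    states = {}
    for match in CARP_LINE.finditer(ifconfig_output):
        state, vhid, advbase, advskew = match.groups()
        states[int(vhid)] = (state, int(advbase), int(advskew))
    return states


def misplaced_vhids(node1: str, node2: str) -> list[tuple[int, str]]:
    """Compare two ifconfig dumps and describe every VHID that looks wrong."""
    s1, s2 = carp_states(node1), carp_states(node2)
    problems = []
    for vhid in sorted(s1.keys() | s2.keys()):
        a, b = s1.get(vhid), s2.get(vhid)
        if a is None or b is None:
            problems.append((vhid, "missing on one node"))
            continue
        masters = [name for name, s in (("node1", a), ("node2", b)) if s[0] == "MASTER"]
        if len(masters) != 1:
            problems.append((vhid, f"{len(masters)} masters"))
            continue
        # With preemption on, the node with the lower (advbase, advskew) should be MASTER.
        if a[1:] < b[1:]:
            preferred = "node1"
        elif b[1:] < a[1:]:
            preferred = "node2"
        else:
            problems.append((vhid, "equal advbase/advskew on both nodes"))
            continue
        if masters[0] != preferred:
            problems.append((vhid, f"MASTER on {masters[0]}, expected {preferred}"))
    return problems
```

Feed it the ifconfig output of both nodes taken right after node 1 comes back; a non-empty result names the VHIDs that stayed on the wrong node and hints at why.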
I have the same situation occurring randomly.
Unfortunately my post didn't get any answer. If you search, you will find many other similar posts in the forum, all unanswered :(
It seems that nobody is able to explain this. But you're not alone; many people report the same thing.
I guess OPNsense is not as reliable as I thought...
Screenshot of HA settings of both nodes please
In the meantime, the problem has been solved.
It turns out it was a mistake in my config: the advskew settings were not set properly.
Besides, I found a lot of blog posts and community posts on failover with CARP, VLAN and LAGG.
Maybe this is worth investigating if you still have problems with virtual IPs and failover.
@Sonderbar: advskew? Interesting.
What are the correct values in your setup?
I have 0 on the master server and 100 on the backup server.
What are your values?
Thanks for the advice. I'll dig into "Failover with CARP, VLAN" again, but I don't use LAGG.
I had 1 on BOTH nodes. Now I have 1 on the master and 101 on the backup for all of my virtual IPs, and everything works well. In my case this was the root cause.
I use failover with virtual IPs, CARP, VLANs and LAGG, so I thought LAGG was the issue. It turned out it wasn't. Lucky me, because there seem to be no real solutions if you have problems with LAGG and CARP.
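Equal advskew values misbehave because of how FreeBSD's CARP, which OPNsense uses, elects a master. Each node advertises every advbase + advskew/256 seconds and the more frequent advertiser wins. With preemption enabled, a BACKUP that hears a slower master takes over ("preempting a slower master" in dmesg). The kernel also adds its global demotion counter (the "demoted by ... to ..." value in dmesg) to advskew before advertising, capped at 240. The following is a simplified model of that comparison, not the kernel code. The (advbase, advskew, demotion) tuples are illustrative, and demotion is assumed to be non-negative.

```python
from fractions import Fraction
from typing import Optional

CARP_MAXSKEW = 240  # FreeBSD caps the advertised skew at this value


def effective_skew(advskew: int, demotion: int = 0) -> int:
    """Skew actually advertised: configured advskew plus the demotion counter, capped."""
    return min(advskew + demotion, CARP_MAXSKEW)


def adv_interval(advbase: int, advskew: int, demotion: int = 0) -> Fraction:
    """Advertisement interval in seconds: advbase + skew/256 (smaller wins)."""
    return advbase + Fraction(effective_skew(advskew, demotion), 256)


def expected_master(nodes: dict[str, tuple[int, int, int]]) -> Optional[str]:
    """nodes maps a name to (advbase, advskew, demotion).

    Returns the node expected to hold MASTER with preemption enabled, or None
    when the two fastest nodes tie (no deterministic winner in this model).
    """
    ranked = sorted(nodes, key=lambda name: adv_interval(*nodes[name]))
    best, runner_up = ranked[0], ranked[1]
    if adv_interval(*nodes[best]) == adv_interval(*nodes[runner_up]):
        return None
    return best


print(expected_master({"node1": (1, 1, 0), "node2": (1, 101, 0)}))    # node1
print(expected_master({"node1": (1, 1, 0), "node2": (1, 1, 0)}))      # None (tie)
print(expected_master({"node1": (1, 1, 240), "node2": (1, 101, 0)}))  # node2
```

The third case corresponds to a node that is still demoted during boot (interface down, pfsync bulk update running): it advertises more slowly than the backup and cannot preempt until the counter returns to 0.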
Quote from: mimugmail on February 25, 2024, 05:46:22 PM
Screenshot of HA settings of both nodes please
On the BACKUP, my HA settings are:
Disable preempt: no
Disconnect dialup interfaces: no
Synchronize states: checked
Synchronize interface: PFSync
Synchronize Peer IP: 10.0.0.1
Synchronize Config to IP: none
Remote system username: none
Remote system password: none
Services to synchronize: dashboard, DHCPD, Virtual IPs, firewall rules, aliases, NAT
----------------
On the MASTER server, my HA settings are:
Disable preempt: no
Disconnect dialup interfaces: no
Synchronize states: checked
Synchronize interface: PFSync
Synchronize Peer IP: 10.0.0.2
Synchronize Config to IP: 10.0.0.2
Remote system username: set
Remote system password: set
Services to synchronize: dashboard, DHCPD, Virtual IPs, firewall rules, aliases, NAT
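A master/backup inversion in these settings is easy to make and hard to spot by eye. A small consistency check: each node's pfsync peer should be the other node's pfsync address, and only the master should push its configuration over XMLRPC. The dictionary keys below are hypothetical names for the GUI fields, filled in by hand. The pfsync addresses (10.0.0.1 on the master, 10.0.0.2 on the backup) are the ones used in this thread.

```python
def check_ha_pair(master: dict, backup: dict) -> list[str]:
    """Return human-readable problems; an empty list means the pair looks consistent.

    Hypothetical keys, typed in from the GUI:
      pfsync_ip       address of the node on the pfsync VLAN
      sync_peer_ip    "Synchronize Peer IP"
      config_sync_to  "Synchronize Config to IP" (None if empty)
    """
    problems = []
    if master["sync_peer_ip"] != backup["pfsync_ip"]:
        problems.append("master's pfsync peer is not the backup's pfsync address")
    if backup["sync_peer_ip"] != master["pfsync_ip"]:
        problems.append("backup's pfsync peer is not the master's pfsync address")
    if not master.get("config_sync_to"):
        problems.append("master does not push its config to anyone")
    if backup.get("config_sync_to"):
        problems.append("backup also pushes config; only the master should")
    return problems


master = {"pfsync_ip": "10.0.0.1", "sync_peer_ip": "10.0.0.2", "config_sync_to": "10.0.0.2"}
backup = {"pfsync_ip": "10.0.0.2", "sync_peer_ip": "10.0.0.1", "config_sync_to": None}
print(check_ha_pair(master, backup))  # []
```

Swapping the two arguments reports two problems, both about the direction of the config sync, which is exactly the kind of inversion that is otherwise easy to overlook.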
---------------------------------
VIP settings on master:
192.168.10.1/24 10 (freq. 1/1) SRV CARP Virtual SRV IP
192.168.20.1/24 20 (freq. 1/1) IoT CARP Virtual IoT IP
192.168.40.1/24 40 (freq. 1/1) Management CARP Virtual Management IP
192.168.30.1/24 30 (freq. 1/1) GUEST CARP Virtual Guest IP
192.168.42.250/24 1 (freq. 1/1) WAN CARP Virtual WAN IP
VIP settings on backup:
192.168.10.1/24 10 (freq. 1/101) SRV CARP Virtual SRV IP
192.168.20.1/24 20 (freq. 1/101) IoT CARP Virtual IoT IP
192.168.40.1/24 40 (freq. 1/101) Management CARP Virtual Management IP
192.168.30.1/24 30 (freq. 1/101) GUEST CARP Virtual Guest IP
192.168.42.250/24 1 (freq. 1/101) WAN CARP Virtual WAN IP
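With several VIPs per node, it is easy to miss one whose skew is wrong. The sketch below parses lines copied from the Virtual IPs list in the shape shown above (address, VHID, "(freq. advbase/advskew)", interface, ...). It flags any VHID where the intended master does not advertise strictly faster than the backup. The line format is inferred from this post rather than from a documented export, and the helper names are hypothetical.

```python
import re

# Shape inferred from this thread: "192.168.10.1/24 10 (freq. 1/1) SRV CARP Virtual SRV IP"
VIP_LINE = re.compile(r"^(\S+)\s+(\d+)\s+\(freq\.\s*(\d+)/(\d+)\)\s+(\S+)")


def parse_vips(text: str) -> dict[int, tuple[str, str, int, int]]:
    """Map VHID -> (address, interface, advbase, advskew)."""
    vips = {}
    for line in text.splitlines():
        m = VIP_LINE.match(line.strip())
        if m:
            addr, vhid, advbase, advskew, iface = m.groups()
            vips[int(vhid)] = (addr, iface, int(advbase), int(advskew))
    return vips


def skew_problems(master_text: str, backup_text: str) -> list[str]:
    """List VHIDs where the master would not win the CARP election."""
    master, backup = parse_vips(master_text), parse_vips(backup_text)
    problems = []
    for vhid in sorted(master.keys() | backup.keys()):
        if vhid not in master or vhid not in backup:
            problems.append(f"VHID {vhid}: present on one node only")
            continue
        addr, iface, m_base, m_skew = master[vhid]
        _, _, b_base, b_skew = backup[vhid]
        # Interval is advbase + advskew/256; compare in 1/256 s units to stay integral.
        if m_base * 256 + m_skew >= b_base * 256 + b_skew:
            problems.append(
                f"VHID {vhid} ({addr} on {iface}): master {m_base}/{m_skew} "
                f"is not faster than backup {b_base}/{b_skew}"
            )
    return problems
```

Run against the two lists above it returns an empty list; with 1/1 on both nodes (the situation described earlier in the thread) every VHID would be reported.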
XMLRPC Sync is working fine.
For an unknown reason, I quite often (2 to 4 times a week) get CARP errors. When that happens, the backup server becomes master.
But randomly, each server becomes master for some IPs and backup for others.
On these servers, 10.0.0.1 and 10.0.0.2 (the pfsync link) are on a dedicated VLAN.
I use DHCPv4, CARP and Monit, nothing else (no VPN, no LAGG).
Your master is 10.0.0.2 and your backup 10.0.0.1? You sync from backup to master? Looks weird...
Sorry, my mistake: I inverted master and backup. I corrected it in the previous post.
Of course, sync goes from master (10.0.0.1) to backup (10.0.0.2).
PFSync works fine. I always edit the master config and run the sync manually. All settings are replicated correctly. No problem with sync.
As described, the only problem is that the CARP VIPs do not all end up in the same state.
Sorry for the typo.
Output of dmesg -a please
This is the master server. Would you like the backup's output as well?
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.2-RELEASE-p10 stable/24.1-n254984-f7b006edfa8 SMP amd64
FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)
VT(vga): resolution 640x480
CPU: Intel(R) Celeron(R) CPU J1900 @ 1.99GHz (2000.15-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x30678 Family=0x6 Model=0x37 Stepping=8
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x41d8e3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,RDRAND>
AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
AMD Features2=0x101<LAHF,Prefetch>
Structured Extended Features=0x2282<TSCADJ,SMEP,ERMS,NFPUSG>
VT-x: (disabled in BIOS) PAT,HLT,MTF,PAUSE,EPT,UG,VPID
TSC: P-state invariant, performance statistics
real memory = 4294967296 (4096 MB)
avail memory = 3992117248 (3807 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I >
WARNING: L1 data cache covers fewer APIC IDs than a core (0 < 1)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
Firmware Warning (ACPI): 32/64X length mismatch in FADT/Gpe0Block: 128/32 (20201113/tbfadt-748)
ioapic0 <Version 2.0> irqs 0-86
Launching APs: 3 2 1
wlan: mac acl policy registered
random: entropy device external interface
kbd1 at kbdmux0
WARNING: Device "spkr" is Giant locked and may be deleted before FreeBSD 14.0.
vtvga0: <VT VGA driver>
smbios0: <System Management BIOS> at iomem 0xf04d0-0xf04ee
smbios0: Version: 2.8, BCD Revision: 2.7
aesni0: No AES or SHA support.
acpi0: <ALASKA A M I >
acpi0: Power Button (fixed)
unknown: I/O range not supported
cpu0: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x77 on acpi0
atrtc0: Warning: Couldn't map I/O.
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 8 on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 450
Event timer "HPET1" frequency 14318180 Hz quality 440
Event timer "HPET2" frequency 14318180 Hz quality 440
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0xf080-0xf087 mem 0xd0000000-0xd03fffff,0xc0000000-0xcfffffff irq 16 at device 2.0 on pci0
vgapci0: Boot video device
ahci0: <AHCI SATA controller> port 0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem 0xd0911000-0xd09117ff irq 19 at device 19.0 on pci0
ahci0: AHCI v1.30 with 2 3Gbps ports, Port Multiplier not supported
ahcich1: <AHCI channel> at channel 1 on ahci0
xhci0: <Intel BayTrail USB 3.0 controller> mem 0xd0900000-0xd090ffff irq 20 at device 20.0 on pci0
xhci0: 32 bytes context size, 64-bit DMA
xhci0: Port routing mask set to 0xffffffff
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
pci0: <encrypt/decrypt> at device 26.0 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> irq 17 at device 28.1 on pci0
pci2: <ACPI PCI bus> on pcib2
pci2: <network> at device 0.0 (no driver attached)
pcib3: <ACPI PCI-PCI bridge> irq 18 at device 28.2 on pci0
pci3: <ACPI PCI bus> on pcib3
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0xd0704000-0xd0704fff,0xd0700000-0xd0703fff irq 18 at device 0.0 on pci3
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: Chip rev. 0x2c800000
re0: MAC rev. 0x00100000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Using defaults for TSO: 65518/35/2048
re0: Ethernet address: 00:e0:b4:18:38:fa
re0: netmap queues/slots: TX 1/256, RX 1/256
pcib4: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
pci4: <ACPI PCI bus> on pcib4
re1: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xd000-0xd0ff mem 0xd0604000-0xd0604fff,0xd0600000-0xd0603fff irq 19 at device 0.0 on pci4
re1: Using 1 MSI-X message
re1: ASPM disabled
re1: Chip rev. 0x2c800000
re1: MAC rev. 0x00100000
miibus1: <MII bus> on re1
rgephy1: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus1
rgephy1: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re1: Using defaults for TSO: 65518/35/2048
re1: Ethernet address: 00:e0:b4:18:38:fb
re1: netmap queues/slots: TX 1/256, RX 1/256
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
acpi_button0: <Power Button> on acpi0
acpi_button1: <Sleep Button> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
est0: <Enhanced SpeedStep Frequency Control> on cpu0
Timecounter "TSC" frequency 1999999565 Hz quality 1000
Timecounters tick every 1.000 msec
ada0 at ahcich1 bus 0 scbus0 target 0 lun 0
ada0: <SanDisk SSD i110 32GB i212000> ACS-2 ATA SATA 3.x device
ada0: Serial Number 150800165321
ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 30533MB (62533296 512 byte sectors)
Trying to mount root from ufs:/dev/gpt/rootfs [rw]...
ugen0.1: <Intel XHCI root HUB> at usbus0
uhub0 on usbus0
uhub0: <Intel XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
Mounting filesystems...
tunefs: soft updates remains unchanged as enabled
tunefs: issue TRIM to the disk remains unchanged as enabled
** /dev/gpt/rootfs
FILE SYSTEM CLEAN; SKIPPING CHECKS
clean, 6360321 free (7185 frags, 794142 blocks, 0.1% fragmentation)
uhub0: 7 ports with 7 removable, self powered
Setting hostuuid: ade0dae8-1876-4564-b093-52a37f1e0b77.
Setting hostid: 0x44c6914b.
Configuring vt: keymap blanktime.
Configuring crash dump device: /dev/null
.ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg /usr/local/lib/ipsec /usr/local/lib/perl5/5.36/mach/CORE
32-bit compatibility ldconfig path:
done.
>>> Invoking early script 'upgrade'
>>> Invoking early script 'configd'
Starting configd.
>>> Invoking early script 'templates'
Generating configuration: ERR
>>> Invoking early script 'backup'
>>> Invoking backup script 'captiveportal'
>>> Invoking backup script 'dhcpleases'
>>> Invoking backup script 'duid'
>>> Invoking backup script 'netflow'
>>> Invoking backup script 'rrd'
>>> Invoking early script 'carp'
CARP event system: OK
Launching the init system...done.
Initializing...........done.
re0: link state changed to UP
re1: link state changed to UP
Starting device manager...
bwn_pci0: <Broadcom BCM43224 802.11n Dual-Band Wireless> mem 0xd0800000-0xd0803fff irq 17 at device 0.0 on pci2
bhndb0: <PCI-BHND bridge> on bwn_pci0
bhndb0: Using MSI interrupts on bwn_pci0
bhnd0: <BCM43224 BCMA bus> on bhndb0
bhnd_chipc0: <Broadcom ChipCommon I/O Controller, rev 34> mem 0x18000000-0x18000fff,0x18100000-0x18100fff irq 0 at core 0 on bhnd0
bhnd_nvram0: <SPROM/OTP> mem 0x18000800-0x18000bff on bhnd_chipc0
bhnd_pmu0: <Broadcom ChipCommon PMU, rev 6> on bhnd_chipc0
gpio0: <Broadcom ChipCommon GPIO> mem 0x18000000-0x18000fff on bhnd_chipc0
bhnd_hostb0: <Broadcom PCIe-G1 Host-PCI bridge, rev 15> mem 0x18002000-0x18002fff,0x8000000-0xfffffff,0x8000000000000000-0xffffffffffffffff,0x18102000-0x18102fff,0x18103000-0x18103fff irq 2 at core 2 on bhnd0
bwn0: <Broadcom 802.11 MAC/PHY/Radio, rev 23> mem 0x18001000-0x18001fff,0x18101000-0x18101fff irq 1 at core 1 on bhnd0
bwn0: bwn_phy_n_attach: BWN_GPL_PHY not in kernel config; no PHY-N support
bwn0: failed
device_attach: bwn0 attach returned 6
bwn0: <Broadcom 802.11 MAC/PHY/Radio, rev 23> mem 0x18001000-0x18001fff,0x18101000-0x18101fff irq 1 at core 1 on bhnd0
bwn0: bwn_phy_n_attach: BWN_GPL_PHY not in kernel config; no PHY-N support
bwn0: failed
device_attach: bwn0 attach returned 6
bwn0: <Broadcom 802.11 MAC/PHY/Radio, rev 23> mem 0x18001000-0x18001fff,0x18101000-0x18101fff irq 1 at core 1 on bhnd0
bwn0: bwn_phy_n_attach: BWN_GPL_PHY not in kernel config; no PHY-N support
bwn0: failed
device_attach: bwn0 attach returned 6
bwn0: <Broadcom 802.11 MAC/PHY/Radio, rev 23> mem 0x18001000-0x18001fff,0x18101000-0x18101fff irq 1 at core 1 on bhnd0
bwn0: bwn_phy_n_attach: BWN_GPL_PHY not in kernel config; no PHY-N support
bwn0: failed
device_attach: bwn0 attach returned 6
bwn0: <Broadcom 802.11 MAC/PHY/Radio, rev 23> mem 0x18001000-0x18001fff,0x18101000-0x18101fff irq 1 at core 1 on bhnd0
bwn0: bwn_phy_n_attach: BWN_GPL_PHY not in kernel config; no PHY-N support
bwn0: failed
device_attach: bwn0 attach returned 6
bwn0: <Broadcom 802.11 MAC/PHY/Radio, rev 23> mem 0x18001000-0x18001fff,0x18101000-0x18101fff irq 1 at core 1 on bhnd0
bwn0: bwn_phy_n_attach: BWN_GPL_PHY not in kernel config; no PHY-N support
bwn0: failed
device_attach: bwn0 attach returned 6
ichsmb0: <Intel Baytrail SMBus controller> port 0xf000-0xf01f mem 0xd0910000-0xd091001f irq 18 at device 31.3 on pci0
smbus0: <System Management Bus> on ichsmb0
done.
Configuring login behaviour...done.
Configuring loopback interface...
lo0: link state changed to UP
done.
Configuring kernel modules...done.
Setting up extended sysctls...done.
Setting timezone: Indian/Reunion
Writing firmware settings: FreeBSD OPNsense
Writing trust files...done.
Scanning /usr/share/certs/blacklisted for certificates...
Scanning /usr/share/certs/trusted for certificates...
Scanning /usr/local/share/certs for certificates...
Writing trust bundles...done.
Setting hostname: fw-main.mydomain.com
Generating /etc/resolv.conf...done.
Generating /etc/hosts...done.
Configuring system logging...done.
Configuring firewall.......
pflog0: permanently promiscuous mode enabled
done.
Configuring hardware interfaces...done.
Configuring loopback interface...done.
Configuring LAGG interfaces...done.
Configuring VLAN interfaces...
re0: link state changed to DOWN
vlan0: changing name to 'vlan0.10'
vlan1: changing name to 'vlan0.20'
vlan2: changing name to 'vlan0.30'
vlan3: changing name to 'vlan0.40'
vlan4: changing name to 'vlan0.90'
done.
Configuring GUEST interface...
re0: promiscuous mode enabled
vlan0.30: promiscuous mode enabled
carp: demoted by 240 to 240 (interface down)
carp: demoted by 240 to 480 (pfsync bulk start)
done.
Configuring IoT interface...
vlan0.20: promiscuous mode enabled
carp: demoted by 240 to 720 (interface down)
done.
Configuring Management interface...
vlan0.40: promiscuous mode enabled
carp: demoted by 240 to 960 (interface down)
done.
Configuring PFSync interface...done.
Configuring SRV interface...
[fib_algo] inet.0 (bsearch4#16) rebuild_fd_flm: switching algo to radix4_lockless
vlan0.10: promiscuous mode enabled
carp: demoted by 240 to 1200 (interface down)
done.
Configuring WAN interface...
re1: link state changed to DOWN
re1: promiscuous mode enabled
carp: demoted by 240 to 1440 (interface down)
done.
Generating /etc/resolv.conf...done.
Generating /etc/hosts...done.
Configuring firewall.......done.
Starting web GUI...done.
Setting up routes...done.
Starting DHCPv4 service...done.
Configuring firewall.
re0: link state changed to UP
carp: 20@vlan0.20: INIT -> BACKUP (initialization complete)
carp: demoted by -240 to 1200 (interface up)
vlan0.20: link state changed to UP
carp: 40@vlan0.40: INIT -> BACKUP (initialization complete)
carp: demoted by -240 to 960 (interface up)
vlan0.40: link state changed to UP
carp: 10@vlan0.10: INIT -> BACKUP (initialization complete)
carp: demoted by -240 to 720 (interface up)
vlan0.10: link state changed to UP
vlan0.90: link state changed to UP
carp: 30@vlan0.30: INIT -> BACKUP (initialization complete)
carp: demoted by -240 to 480 (interface up)
vlan0.30: link state changed to UP
......done.
Setting up gateway monitors...
carp: 1@re1: INIT -> BACKUP (initialization complete)
carp: demoted by -240 to 240 (interface up)
re1: link state changed to UP
done.
Syncing OpenVPN settings...done.
Starting NTP service...done.
Generating RRD graphs...done.
>>> Invoking start script 'newwanip'
>>> Invoking start script 'freebsd'
carp: 20@vlan0.20: BACKUP -> INIT (hardware interface up)
vlan0.20: promiscuous mode disabled
Starting monit.
vlan0.20: promiscuous mode enabled
carp: 20@vlan0.20: INIT -> BACKUP (initialization complete)
Starting Monit 5.33.0 daemon with http interface at /var/run/monit.sock
>>> Invoking start script 'syslog'
>>> Invoking start script 'carp'
carp: demoted by -240 to 0 (pfsync bulk done)
carp: 30@vlan0.30: BACKUP -> MASTER (master timed out)
>>> Invoking start script 'cron'
Starting Cron:
carp: 10@vlan0.10: BACKUP -> MASTER (preempting a slower master)
carp: 40@vlan0.40: BACKUP -> MASTER (preempting a slower master)
carp: 20@vlan0.20: BACKUP -> MASTER (preempting a slower master)
OK
carp: 1@re1: BACKUP -> MASTER (preempting a slower master)
>>> Invoking start script 'openvpn'
>>> Invoking start script 'sysctl'
carp: 40@vlan0.40: MASTER -> INIT (hardware interface up)
vlan0.40: promiscuous mode disabled
vlan0.40: promiscuous mode enabled
carp: 40@vlan0.40: INIT -> BACKUP (initialization complete)
carp: demoted by 240 to 240 (pfsync bulk start)
carp: demoted by -240 to 0 (pfsync bulk done)
Service `sysctl' has been restarted.
>>> Invoking start script 'beep'
Root file system: /dev/gpt/rootfs
Sat Feb 24 21:03:20 +04 2024
*** fw-main.mydomain.com: OPNsense 24.1.2_1 ***
GUEST (vlan0.30) -> v4: 192.168.30.254/24
IoT (vlan0.20) -> v4: 192.168.20.254/24
Management (vlan0.40) -> v4: 192.168.40.254/24
PFSync (vlan0.90) -> v4: 10.0.0.1/24
SRV (vlan0.10) -> v4: 192.168.10.254/24
WAN (re1) -> v4: 192.168.42.254/24
HTTPS: SHA256 57 39 D5 86 3A 7A C1 71 F0 9C 31 BF 71 8A BA D1
2C 1A 47 5A DD AE 7D 31 C6 CD 99 52 2E D5 15 E3
carp: 10@vlan0.10: MASTER -> INIT (hardware interface up)
vlan0.10: promiscuous mode disabled
vlan0.10: promiscuous mode enabled
carp: 40@vlan0.40: BACKUP -> MASTER (preempting a slower master)
carp: 10@vlan0.10: INIT -> BACKUP (initialization complete)
carp: demoted by 240 to 240 (pfsync bulk start)
carp: demoted by -240 to 0 (pfsync bulk done)
carp: 10@vlan0.10: BACKUP -> MASTER (preempting a slower master)
carp: 30@vlan0.30: MASTER -> INIT (hardware interface up)
vlan0.30: promiscuous mode disabled
vlan0.30: promiscuous mode enabled
carp: 30@vlan0.30: INIT -> BACKUP (initialization complete)
carp: demoted by 240 to 240 (pfsync bulk start)
carp: demoted by -240 to 0 (pfsync bulk done)
carp: 1@re1: MASTER -> INIT (hardware interface up)
re1: promiscuous mode disabled
re1: promiscuous mode enabled
carp: 1@re1: INIT -> BACKUP (initialization complete)
carp: demoted by 240 to 240 (pfsync bulk start)
carp: demoted by -240 to 0 (pfsync bulk done)
carp: 30@vlan0.30: BACKUP -> MASTER (master timed out)
carp: 1@re1: BACKUP -> MASTER (preempting a slower master)
carp: 10@vlan0.10: MASTER -> INIT (hardware interface up)
carp: 10@vlan0.10: INIT -> BACKUP (initialization complete)
carp: 20@vlan0.20: MASTER -> INIT (hardware interface up)
carp: 20@vlan0.20: INIT -> BACKUP (initialization complete)
carp: 40@vlan0.40: MASTER -> INIT (hardware interface up)
carp: 40@vlan0.40: INIT -> BACKUP (initialization complete)
carp: 30@vlan0.30: MASTER -> INIT (hardware interface up)
carp: 30@vlan0.30: INIT -> BACKUP (initialization complete)
carp: 1@re1: MASTER -> INIT (hardware interface up)
carp: 1@re1: INIT -> BACKUP (initialization complete)
carp: 10@vlan0.10: BACKUP -> MASTER (preempting a slower master)
carp: 40@vlan0.40: BACKUP -> MASTER (preempting a slower master)
carp: 30@vlan0.30: BACKUP -> MASTER (preempting a slower master)
carp: 20@vlan0.20: BACKUP -> MASTER (preempting a slower master)
carp: 1@re1: BACKUP -> MASTER (preempting a slower master)
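Reading a boot log like this by hand is tedious. The sketch below replays the "carp:" lines of dmesg -a. It also works on system log lines, since it searches anywhere in a line. It reports the last state seen for each vhid@interface, the last demotion total, and how often each VHID dropped back to INIT, which is what the "hardware interface up" resets look like. It relies only on the message shapes visible in this log, and replay_carp is a hypothetical helper.

```python
import re
from collections import Counter

# Message shapes taken from the boot log above.
TRANSITION = re.compile(r"carp: (\d+)@(\S+): (\w+) -> (\w+) \((.*)\)")
DEMOTION = re.compile(r"carp: demoted by (-?\d+) to (-?\d+) \((.*)\)")


def replay_carp(log: str) -> tuple[dict[str, str], int, Counter]:
    """Return (last state per vhid@iface, last demotion total, INIT drops per vhid@iface)."""
    state: dict[str, str] = {}
    init_drops: Counter = Counter()
    demotion = 0
    for line in log.splitlines():
        t = TRANSITION.search(line)
        if t:
            vhid, iface, old, new, _reason = t.groups()
            key = f"{vhid}@{iface}"
            state[key] = new
            if new == "INIT" and old != "INIT":
                init_drops[key] += 1  # e.g. "MASTER -> INIT (hardware interface up)"
            continue
        d = DEMOTION.search(line)
        if d:
            demotion = int(d.group(2))  # the number after "to" is the new total
    return state, demotion, init_drops
```

Applied to the log above, it should show every VHID ending in MASTER with the demotion counter back at 0, but each VHID dropping back to INIT twice after boot, which is the pattern of repeated interface resets.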
There are too many interface resets, can you disable spanning tree portfast on the port so it comes up faster? Maybe also think about replacing the Realtek NICs; they behave really sloppily.
OK, I will try to disable "spanning tree portfast".
Can you please tell me where I can find this setting? Is it in the server BIOS, in OPNsense, or on the switch connected to the port?
Thank you
Quote from: mimugmail on February 28, 2024, 02:53:04 PM
disable spanning tree portfast on the port so it comes up faster?
Huh?
Without spanning-tree portfast, the port takes roughly 30 seconds to come up.
With spanning-tree portfast it comes up faster, because the switch does not wait to check for a bridge and STP on the other end.
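Here are some rough numbers, assuming classic 802.1D STP with its default 15 s forward delay (RSTP edge ports behave differently). A new port spends one forward delay listening and one learning, so it needs about 2 × 15 = 30 s to reach forwarding. A CARP node in BACKUP treats the master as gone after about 3 × advbase + advskew/256 seconds. A node whose switch port is still blocking therefore hears no advertisements and can promote itself long before the port forwards. This fits the "master timed out" transition in the boot log above. A small sketch of that comparison:

```python
def stp_time_to_forwarding(forward_delay_s: float = 15.0) -> float:
    """Classic 802.1D STP: a new port spends one forward delay listening and one learning."""
    return 2 * forward_delay_s


def carp_master_down(advbase: int, advskew: int) -> float:
    """Seconds a CARP BACKUP waits without hearing advertisements before taking MASTER."""
    return 3 * advbase + advskew / 256


blocked = stp_time_to_forwarding()                # 30.0 with the 802.1D default
timeout = carp_master_down(advbase=1, advskew=1)  # master's VIP settings in this thread
print(f"port blocked ~{blocked:.0f} s, CARP master-down after ~{timeout:.2f} s")
# port blocked ~30 s, CARP master-down after ~3.00 s
```

Even the backup's slower timer (advskew 101, about 3.39 s) is an order of magnitude shorter than the STP delay, which is why letting the port forward immediately matters here.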
1 - Can you please point me to the lines in my dmesg that show the excessive interface resets? I guess it's the last part of the log, but I'm not sure.
2 - Is "FastLink" the Netgear name for PortFast?
On the main LAN switch connected to the firewall I have:
Setting: Switching -> STP -> CST Port Configuration. This is what I get on all ports:
STP status: enable
fast link : disable <-----------***
BPDU forwarding : enable
auto edge : enable
port state: forwarding
path cost: 20000
priority : 128
external port path cost: 20000
***Is "fast link" the setting you were talking about? It is disabled by default on all ports.
Should I enable it on the LAN port of my firewall? I think it will put the port straight into forwarding mode and bypass STP.
The other Ethernet port of my firewall is connected to my ISP fiber box.
Thank you:)
Please enable it :)
I enabled it.
How can I see whether it gets better?
Are dmesg lines like the following the interface resets you were talking about?
carp: 30@vlan0.30: INIT -> BACKUP (initialization complete)
Or should I look for "interface down" in OPNsense under System: Log Files: General?
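One way to check whether enabling fast link helped is to count CARP re-initialisations per day in the system log and compare before and after the change. The sketch assumes log lines that start with an ISO timestamp and contain the kernel carp message, as in entries from System: Log Files: General. It counts every "INIT -> ..." transition. reinits_per_day is a hypothetical helper, and a reboot will of course add one entry per VHID.

```python
import re
from collections import Counter

# Assumed line shape:
# "2024-02-28T21:31:06 Notice kernel <6>carp: 1@re1: INIT -> BACKUP (initialization complete)"
REINIT = re.compile(r"^(\d{4}-\d{2}-\d{2})T\S+ .*carp: \d+@\S+: INIT -> ")


def reinits_per_day(log_lines) -> Counter:
    """Count CARP re-initialisations (INIT -> anything) per calendar day."""
    per_day = Counter()
    for line in log_lines:
        m = REINIT.match(line)
        if m:
            per_day[m.group(1)] += 1
    return per_day
```

A steady run of days with zero counts (outside planned reboots) is a reasonable sign that the link flaps are gone.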
Just to make sure, do you use an untagged VLAN between both OPNsense boxes? I once saw this kind of master/backup mix-up on trunks that could communicate via their native VLAN ID. Once I used only tagged VLANs and left the (untagged) VLAN parent interfaces unassigned in OPNsense, this behavior stopped.
It's just a guess, in addition to all the other troubleshooting done here so far.
No. Each of my servers has only 2 physical Ethernet ports:
- a WAN port connected to my ISP box.
- a LAN port connected to the main switch. This port is tagged for all existing VLANs.
The pfsync link connecting the 2 firewalls is one of those VLANs (90).
My switch has the following configuration on the LAN port of my firewalls (same for backup and master):
- PVID: 1
- VLAN tag: 10,20,30,40,90 (= all VLANs except 1)
In OPNsense I disabled the LAN interface and created CARP VIPs for all VLANs except 90. And obviously VLAN 1 does not exist in my OPNsense config.
VLAN 1 would be your LAN port, if it had an IP.
@mimugmail: you are a magician!
I don't want to conclude too early, but I can't wait any longer to tell you that it's been 48h without any of these entries in my OPNsense logs:
2024-02-28T21:31:06 Notice kernel <6>carp: 1@re1: INIT -> BACKUP (initialization complete)
2024-02-28T21:31:04 Notice kernel <6>carp: 30@vlan0.30: INIT -> BACKUP (initialization complete)
I guess that's the "interface reset" you were talking about.
So if I understand correctly, it's a huge stability improvement that may well help with our issue.
I'll keep you posted.
Pew pew 8)