Messages - tt-ah

#1
Sorry for the late response, I did not get an e-mail notification.

The backup node has a default route, which makes the upstream NTP servers reachable.
The only issue is that the NTP service uses the CARP VIP of the WAN interface when trying to reach those NTP servers.
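
To confirm which source address the service actually uses, a packet capture on WAN should show it. A minimal sketch, assuming igb1 is the WAN interface (interface name and filter are only an example):

# capture outgoing NTP requests on WAN and check the source address
tcpdump -ni igb1 udp port 123
# if the source shown is the CARP VIP rather than the node's own WAN address,
# the replies will be delivered to whichever node currently holds MASTER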

I am not sure whether I can set up the master node as NTP server for the backup node as a workaround, because the NTP service is replicated via Config Sync. That means the master node would also use itself as its NTP server. Could this cause problems during normal operation?

I have another 2-node CARP setup (which is our internal NTP server) where this exact configuration works as intended. There the backup node uses its own IP on the WAN interface, instead of the CARP VIP, to communicate with the upstream servers.
#2
Hi everyone!

I have an HA cluster with 2 nodes running 23.1.11.

I am trying to upgrade, but I see two issues that make me hesitate:

I noticed the time being out of sync on the backup node, with Network Time > Status showing Unreach/Pending for all configured NTP servers.
While troubleshooting this I noticed that the network time service on the backup node uses the VIP of the WAN interface as the source for its requests to the NTP servers, instead of its own address on WAN. This seems wrong to me, since the replies will go to the master node.
No other service on the backup node behaves like this to my knowledge. DNS and other outgoing connections, such as backups to GitLab/Nextcloud, work fine.

If I SSH into the host, ntpdate uses the node's own IP and can therefore sync the time: after stopping the network time service, running ntpdate *ntp-server* successfully sets the current time.
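
One possible workaround would be to restrict which local addresses ntpd binds to. This is only a sketch using standard ntp.org ntpd directives; whether and where OPNsense lets you persist such lines in its generated Network Time configuration is an assumption I have not verified:

# only bind ntpd to the node's own WAN address (placeholder) and loopback
interface ignore wildcard
interface listen <wan-address>
interface listen lo0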

CARP is running on WAN and on several VLANs on LAN. Additionally there is an alias-IP on WAN used to SNAT some local subnets. I have state-sync and config-sync for network time (and more) configured. Config sync is disabled for static routes, but none are configured.

I cannot tell exactly when this started; I only know with certainty that it was not an issue until I started upgrading the cluster step by step from 21.x onwards.

So far I have tried:

  • restarting the network time service
  • rebooting the node
  • an outbound NAT rule to rewrite the traffic to the NTP servers to the node's own IP
  • searching for similar issues

Can someone tell me what I could look for? It feels like a bug to me but I see no reports of this behaviour from anyone else.
#3
I know this is an old topic, but it is my own and still has no replies.

I am writing here again because this issue just happened once more after I triggered a configuration sync on the main firewall. Again the backup firewall took over MASTER status on WAN while the main firewall also stayed MASTER.

The main-firewall is on 23.1.6,
the backup-firewall is on 23.1.11.
(I am currently trying to patch to 23.7, but this issue is interrupting me.)

During and shortly after the config sync I see several events such as the following on the main-firewall. opt1 (igb3) is the SYNC interface between the main- and backup-firewall and is a direct connection:

2023-11-21T21:13:48 Notice opnsense /usr/local/etc/rc.linkup: ROUTING: entering configure using 'opt1'
2023-11-21T21:13:48 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for opt1(igb3)
2023-11-21T21:13:43 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for opt1(igb3)


On the backup-firewall I see the following:

2023-11-21T20:33:14 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Resyncing OpenVPN instances for interface VIP_WAN (xx.yy.xx.yy).
2023-11-21T20:33:14 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "VIP_WAN (xx.yy.xx.yy) (2@igb1)" has resumed the state "MASTER" for vhid 2
2023-11-21T20:33:13 Notice kernel <6>carp: 2@igb1: BACKUP -> MASTER (master timed out)
2023-11-21T20:33:13 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Resyncing OpenVPN instances for interface VIP_WAN (xx.yy.xx.yy).
2023-11-21T20:33:13 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "VIP_WAN (xx.yy.xx.yy) (2@igb1)" has resumed the state "BACKUP" for vhid 2
2023-11-21T20:33:12 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Resyncing OpenVPN instances for interface VIP_WAN (xx.yy.xx.yy).
2023-11-21T20:33:12 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "VIP_WAN (xx.yy.xx.yy) (2@igb1)" has resumed the state "INIT" for vhid 2
2023-11-21T20:33:12 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Resyncing OpenVPN instances for interface VIP_WAN (xx.yy.xx.yy).
2023-11-21T20:33:12 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "VIP_WAN (xx.yy.xx.yy) (2@igb1)" has resumed the state "BACKUP" for vhid 2
2023-11-21T20:33:11 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Resyncing OpenVPN instances for interface VIP_WAN (xx.yy.xx.yy).
2023-11-21T20:33:11 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "VIP_WAN (xx.yy.xx.yy) (2@igb1)" has resumed the state "INIT" for vhid 2
2023-11-21T20:33:11 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Resyncing OpenVPN instances for interface VIP_WAN (xx.yy.xx.yy).
2023-11-21T20:33:11 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "VIP_WAN (xx.yy.xx.yy) (2@igb1)" has resumed the state "BACKUP" for vhid 2
2023-11-21T20:33:10 Notice configctl event @ 1700595190.27 exec: system event config_changed
2023-11-21T20:33:10 Notice configctl event @ 1700595190.27 msg: Nov 21 20:33:10 sws-tue-gw2.domain.de config[28268]: config-event: new_config /conf/backup/config-1700595190.2471.xml
2023-11-21T20:33:10 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Resyncing OpenVPN instances for interface VIP_WAN (xx.yy.xx.yy).
2023-11-21T20:33:10 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "VIP_WAN (xx.yy.xx.yy) (2@igb1)" has resumed the state "INIT" for vhid 2
2023-11-21T20:33:10 Notice configctl event @ 1700595190.27 exec: system event config_changed
2023-11-21T20:33:10 Notice configctl event @ 1700595190.27 msg: Nov 21 20:33:10 sws-tue-gw2.domain.de config[28268]: config-event: new_config /conf/backup/config-1700595190.2471.xml
2023-11-21T20:33:10 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Resyncing OpenVPN instances for interface VIP_WAN (xx.yy.xx.yy).
2023-11-21T20:33:10 Notice opnsense /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "VIP_WAN (xx.yy.xx.yy) (2@igb1)" has resumed the state "INIT" for vhid 2
2023-11-21T20:33:10 Notice opnsense /xmlrpc.php: plugins_configure monitor (execute task : dpinger_configure_do())
2023-11-21T20:33:10 Notice opnsense /xmlrpc.php: plugins_configure monitor ()
2023-11-21T20:33:10 Notice opnsense /xmlrpc.php: ROUTING: keeping current inet default gateway 'xx.yy.xx.zz'
2023-11-21T20:33:10 Notice opnsense /xmlrpc.php: ROUTING: configuring inet default gateway on wan
2023-11-21T20:33:10 Notice opnsense /xmlrpc.php: ROUTING: entering configure using defaults
2023-11-21T20:33:10 Notice kernel <6>carp: 2@igb1: INIT -> BACKUP (initialization complete)
2023-11-21T20:33:10 Notice kernel <6>carp: 2@igb1: BACKUP -> INIT (hardware interface up)
2023-11-21T20:33:10 Notice kernel <6>carp: 2@igb1: INIT -> BACKUP (initialization complete)
2023-11-21T20:33:10 Notice kernel <6>carp: 2@igb1: BACKUP -> INIT (hardware interface up)
2023-11-21T20:33:10 Notice kernel <6>carp: 2@igb1: INIT -> BACKUP (initialization complete)
2023-11-21T20:33:10 Notice kernel <6>carp: 2@igb1: BACKUP -> INIT (hardware interface up)


I fixed it by rebooting the backup-firewall.
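
Before the reboot it might have been worth checking the CARP state and demotion counter on both nodes from the shell; a minimal sketch (igb1 taken from the logs above):

# show per-vhid CARP state (MASTER/BACKUP, advbase/advskew) on the WAN NIC
ifconfig igb1 | grep carp
# show the global demotion counter; a non-zero value means this node is
# advertising with a penalty and is less likely to hold or reclaim MASTER
sysctl net.inet.carp.demotion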

I have the following questions:

  • What is wrong with my configuration that causes (a) the split-brain and (b) WAN being the only affected interface? I want the MASTER state to always be concentrated on a single node.
  • What causes this failover in the first place when I trigger a configuration sync? Before the updates from 22.7 onward this was not an issue, and my configuration has not changed since.
#4
Hi everyone!

Last week I saw some weird behaviour on an HA cluster that I would like to consult you about.

Environment:
It is a 2-node cluster. The main-firewall (IP: .11) ran on 22.7 and the backup-firewall (IP: .12) on 23.1.
In total there are 8 CARP instances: 1 on WAN (VIP: .10), 7 on VLANs on LAN.
The VIP on WAN is used for a site-to-site IPsec tunnel.
In addition there is an alias (IP: .14) on WAN.
WAN and LAN are connected to a switch stack. The sync-interface is directly connected between the two firewalls.
The ISP gateway (IP: .9) is connected to the same switch and is monitored by both firewalls as the default gateway.
During this saga nothing in terms of link-status events was logged on the switch.

Config:
net.inet.carp.preempt: 1
otherwise see screenshots

What happened:
At 17:27:45 (see attached log extract) the backup firewall started flapping between INIT and BACKUP on WAN. After a few seconds it became MASTER, but it only took the MASTER role on WAN; all other interfaces stayed BACKUP. Meanwhile the main-firewall still had MASTER status on all interfaces. It stayed that way until I rebooted the backup-firewall; then everything was back to normal and the backup node had BACKUP state on all interfaces. dmesg on the backup-firewall was clean concerning the NICs.
The IPsec tunnel was not affected at all and seemed to be held by the main-firewall.

This happened while the ISP or their carrier had some IP-hash-based routing issues between 16:57 and 18:32.
This definitely affected the cluster in terms of external availability, but internally everything should have been fine. There are no logs indicating loss of connectivity to the default gateway. (Some of the cluster's IPs were reachable from outside while others were not; which ones varied depending on the external source IP.)


Attached are the logs of the backup-firewall during the timeframe when it occurred. During the state switching there are log entries concerning OpenVPN on the backup-firewall, even though OpenVPN is not configured on these systems (only installed). The log shows the alias IP (.14) disappearing on WAN - could this cause CARP to "freak out" in this way? I don't know why it would disappear - it is still configured as it was on both systems.

On the main-firewall only this was logged (apart from filterlogs):


May 17 17:27:14 main-firewall configd.py[244]: [6d22285b-e647-45f5-a0a2-b7f33c75f747] system status
...
May 17 17:27:19 main-firewall configd.py[244]: [3636d608-c56b-4e4b-97cd-d4a1f49f07b8] request pfsync info
...
May 17 17:27:20 main-firewall configd.py[244]: [b7530887-f020-441a-9544-2847c8758fe9] system status
...
May 17 17:27:39 main-firewall configd.py[244]: [50cf99f7-c26e-46ed-a62c-a28676368395] system status
...
May 17 17:27:43 main-firewall configd.py[244]: [a86b37a2-e610-4420-9e69-86cc469980da] Syncing firewall
...
May 17 17:27:47 main-firewall opnsense[18207]: /usr/local/etc/rc.filter_synchronize: Filter sync successfully completed with https://192.168.254.2/xmlrpc.php.
...
May 17 17:27:50 sws-tue-gw1 devd[506]: Processing event '!system=DEVFS subsystem=CDEV type=DESTROY cdev=pts/0'



Over the last few weeks I was doing updates from 20.x to 23.1, always putting the backup-firewall one step ahead of the main-firewall. This issue occurred about two hours after the last update.

Right now I cannot explain what might have caused the issues with CARP here.
It happened right after a filter sync, but the logs don't show any dropped CARP advertisements. Also, in my understanding, during a failover all interfaces should be transferred instead of only one. Obviously the main-firewall should not have stayed MASTER even if there had been a valid cause for the failover itself. I don't see how the ISP/upstream issues could have caused this, even though it happened during that time.
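
To rule dropped advertisements in or out, one could capture the CARP traffic on WAN on both nodes while triggering a sync. A minimal sketch (igb1 as WAN is taken from the logs; CARP advertisements use IP protocol 112):

# watch CARP advertisements on the WAN NIC while reproducing the sync
tcpdump -ni igb1 ip proto 112
# the MASTER should send one advertisement per vhid per interval; a gap of a
# few seconds seen on the BACKUP would explain why it promoted itself to MASTER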

Do you have an idea what might have caused this issue? I would be very grateful.

If I missed something you need in terms of configuration or description please let me know.


#5
Hi everyone!

I just caused a couple of unintended failovers on our CARP cluster at a client site. It has been running rock solid for a good 500 days. CARP worked fine, so the client did not notice the failovers at all. Nevertheless I would like to clear up this issue, since I still have further configuration to do.

Setup is as follows:

  • 2x OPNsense 20.7.8_4-amd64 (I know)
  • 4 NICS
  • 3 VLANs on LAN (igb0)
  • CARP on all VLANs and WAN-Port (igb1)

Scenario:
The client site is getting a new Wi-Fi setup which requires 4 more VLANs on the LAN interface (igb0). After adding each VLAN I got "logged out" of the web interface, had to log back in, and got cookie-related errors. On the fourth occurrence I noticed that the cluster was actually failing over every time I added a VLAN, since I had been logging back in on the secondary node.

Upon checking the logs, I saw that the interface goes down for about 5 seconds after adding a VLAN to it.

I am not used to anything of the sort on other network hardware when making such changes, and I did not find any known issues, bugs, or reports of this effect.
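
As a comparison, creating a VLAN manually from the shell would show whether the parent link flap comes from the NIC/driver itself or only from the way the GUI reconfigures the interface; a minimal sketch with FreeBSD ifconfig (tag 903 and igb0 taken from this post, the test interface name is arbitrary):

# create a throwaway VLAN on igb0 and watch whether the parent link resets
ifconfig vlan903 create vlan 903 vlandev igb0
ifconfig igb0 | grep status      # should still report: status: active
# remove the test interface again
ifconfig vlan903 destroy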

I am now hesitant to configure the cluster further (VLANs on Backup-Node, IPs on those new VLANs, CARP on those new VLANs, DHCP on those new VLANs).

Logs (after adding VLAN 903 on LAN (igb0)):
Jun 30 08:53:08 sws-tue-gw1 devd[98176]: Processing event '!system=IFNET subsystem=vlan4 type=ATTACH'
Jun 30 08:53:08 sws-tue-gw1 devd[98176]: Processing event '!system=ETHERNET subsystem=vlan4 type=IFATTACH'
Jun 30 08:53:08 sws-tue-gw1 devd[98176]: Executing '/usr/libexec/hyperv/hyperv_vfattach $'vlan4' 0'
Jun 30 08:53:08 sws-tue-gw1 dhcpd[62818]: failover peer dhcp_opt2: network down
Jun 30 08:53:08 sws-tue-gw1 devd[98176]: Processing event '!system=IFNET subsystem=igb0 type=LINK_DOWN'
Jun 30 08:53:08 sws-tue-gw1 charon[37623]: 15[KNL] interface igb0_vlan903 appeared
Jun 30 08:53:08 sws-tue-gw1 charon[37623]: 15[KNL] xxx.xxx.xxx.xxx disappeared from igb1
Jun 30 08:53:08 sws-tue-gw1 devd[98176]: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup stop $'igb0''
Jun 30 08:53:08 sws-tue-gw1 charon[37623]: 15[KNL] interface igb0_vlan903 activated
Jun 30 08:53:08 sws-tue-gw1 configd.py[96964]: [14c7ff7b-1237-4d5f-ac39-021f62c205ba] Linkup stopping igb0
Jun 30 08:53:09 sws-tue-gw1 opnsense[36836]: /usr/local/etc/rc.linkup: Hotplug event detected for LAN(lan) but ignoring since interface is configured with static IP (0.0.0.0 ::)
Jun 30 08:53:09 sws-tue-gw1 devd[98176]: Processing event '!system=CARP subsystem=4@igb0_vlan990 type=INIT'
Jun 30 08:53:09 sws-tue-gw1 devd[98176]: Processing event '!system=IFNET subsystem=igb0_vlan990 type=LINK_DOWN'
Jun 30 08:53:09 sws-tue-gw1 devd[98176]: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup stop $'igb0_vlan990''
Jun 30 08:53:09 sws-tue-gw1 configd.py[96964]: [09c46065-3f53-4592-a74f-aa78cc4eac34] Linkup stopping igb0_vlan990
Jun 30 08:53:09 sws-tue-gw1 configctl[28232]: event @ 1656571988.80 msg: Jun 30 08:53:08 sws-tue-gw1.xxxxxxxxxx.xx config[53108]: config-event: new_config /conf/backup/config-1656571988.7957.xml
Jun 30 08:53:09 sws-tue-gw1 configctl[28232]: event @ 1656571988.80 exec: system event config_changed
Jun 30 08:53:09 sws-tue-gw1 configd.py[96964]: [5a25d4db-8863-4201-89a2-14bf1a5d8614] trigger config changed event
Jun 30 08:53:09 sws-tue-gw1 opnsense[17071]: /usr/local/etc/rc.linkup: Hotplug event detected for LAN_990_mgmt(opt6) but ignoring since interface is configured with static IP (10.199.90.252 ::)
Jun 30 08:53:09 sws-tue-gw1 devd[98176]: Processing event '!system=IFNET subsystem=igb0_vlan903 type=LINK_DOWN'
Jun 30 08:53:09 sws-tue-gw1 devd[98176]: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup stop $'igb0_vlan903''
Jun 30 08:53:09 sws-tue-gw1 configd.py[96964]: [d15cb733-9914-4d71-8b08-9ddc3b7d4ae7] Linkup stopping igb0_vlan903
Jun 30 08:53:09 sws-tue-gw1 devd[98176]: Processing event '!system=CARP subsystem=3@igb0_vlan902 type=INIT'
Jun 30 08:53:09 sws-tue-gw1 devd[98176]: Processing event '!system=IFNET subsystem=igb0_vlan902 type=LINK_DOWN'
Jun 30 08:53:09 sws-tue-gw1 devd[98176]: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup stop $'igb0_vlan902''
Jun 30 08:53:09 sws-tue-gw1 configd.py[96964]: [d081e87b-e10b-4f6b-9ae3-2680b0b43e4d] Linkup stopping igb0_vlan902
Jun 30 08:53:09 sws-tue-gw1 filterlog[96237]: 81,,,0,igb0_vlan903,match,pass,out,6,0x00,0x00000,1,ip,0,36,::,ff02::16,HBH,PADN,RTALERT,0x0000,
Jun 30 08:53:09 sws-tue-gw1 filterlog[96237]: 81,,,0,igb0_vlan903,match,pass,out,6,0x00,0x00000,255,ipv6-icmp,58,32,::,ff02::1:ff00:346d,
Jun 30 08:53:09 sws-tue-gw1 opnsense[43024]: /usr/local/etc/rc.linkup: Hotplug event detected for LAN_902_printers(opt4) but ignoring since interface is configured with static IP (10.200.28.252 ::)
Jun 30 08:53:10 sws-tue-gw1 devd[98176]: Processing event '!system=CARP subsystem=1@igb0_vlan901 type=INIT'
Jun 30 08:53:10 sws-tue-gw1 devd[98176]: Processing event '!system=IFNET subsystem=igb0_vlan901 type=LINK_DOWN'
Jun 30 08:53:10 sws-tue-gw1 devd[98176]: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup stop $'igb0_vlan901''
Jun 30 08:53:10 sws-tue-gw1 configd.py[96964]: [76e761f3-41d9-4be1-afff-7ad5e66f82d3] Linkup stopping igb0_vlan901
Jun 30 08:53:10 sws-tue-gw1 opnsense[52486]: /usr/local/etc/rc.linkup: Hotplug event detected for LAN_901_intra(opt2) but ignoring since interface is configured with static IP (10.200.32.252 ::)
Jun 30 08:53:10 sws-tue-gw1 devd[98176]: Processing event '!system=CARP subsystem=2@igb1 type=BACKUP'
Jun 30 08:53:10 sws-tue-gw1 devd[98176]: Executing '/usr/local/opnsense/service/configd_ctl.py interface carp $'2@igb1' $'BACKUP''
Jun 30 08:53:10 sws-tue-gw1 configd.py[96964]: [ce3a5ff4-b1a0-423e-b5c3-6e28c29d5021] Carp event on subsystem 2@igb1 for type BACKUP
Jun 30 08:53:10 sws-tue-gw1 opnsense[50023]: /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member "xxx.xxx.xxx.xxx - VIP_WAN (2@igb1)" has resumed the state "BACKUP" for vhid 2
Jun 30 08:53:13 sws-tue-gw1 devd[98176]: Processing event '!system=IFNET subsystem=igb0 type=LINK_UP'
Jun 30 08:53:13 sws-tue-gw1 devd[98176]: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup start $'igb0''
Jun 30 08:53:13 sws-tue-gw1 configd.py[96964]: [3dfd282b-bf24-4d4d-80d0-46a82d86a05f] Linkup starting igb0
Jun 30 08:53:13 sws-tue-gw1 opnsense[5859]: /usr/local/etc/rc.linkup: Hotplug event detected for LAN(lan) but ignoring since interface is configured with static IP (0.0.0.0 ::)
Jun 30 08:53:13 sws-tue-gw1 configd.py[96964]: [2c62e4a1-2680-43be-8746-1f9d64434060] New IPv4 on igb0
Jun 30 08:53:13 sws-tue-gw1 opnsense[51169]: /usr/local/etc/rc.newwanip: IPv4 renewal is starting on 'igb0'
Jun 30 08:53:13 sws-tue-gw1 opnsense[51169]: /usr/local/etc/rc.newwanip: On (IP address: ) (interface: LAN[lan]) (real interface: igb0).
Jun 30 08:53:13 sws-tue-gw1 opnsense[51169]: /usr/local/etc/rc.newwanip: Failed to detect IP for LAN[lan]
Jun 30 08:53:13 sws-tue-gw1 devd[98176]: Processing event '!system=CARP subsystem=4@igb0_vlan990 type=BACKUP'
Jun 30 08:53:13 sws-tue-gw1 devd[98176]: Executing '/usr/local/opnsense/service/configd_ctl.py interface carp $'4@igb0_vlan990' $'BACKUP''


#6
I had it on both systems. But even if it is only configured on the slave, the master still gets promoted back to 0 despite the inactive interface:

Jun 28 14:56:00   kernel: carp: demoted by -240 to 0 (vhid removed)
Jun 28 14:55:59   kernel: carp: 1@igb0: MASTER -> BACKUP (more frequent advertisement received)
Jun 28 14:55:59   kernel: carp: demoted by 240 to 240 (interface down)

So the slave will not take over either.
#7
I have tested the patch; the problem with the CARP interface no longer occurs.  8)

However, it is now again the case that when one interface fails, only that one interface is handed over to the slave. The second interface stays with the master :/

That still worked before the patch. The tunable net.inet.carp.preempt is set to "1"...

Does anyone have an idea?


Jun 28 14:58:05   kernel: carp: 1@igb0: BACKUP -> MASTER (preempting a slower master)
Jun 28 14:58:04   opnsense: /usr/local/etc/rc.carpmaster: Starting OpenVPN server instance on xxx.xxx.xxx.xxx - WAN_CARP because of transition to CARP master.
Jun 28 14:58:04   opnsense: /usr/local/etc/rc.carpmaster: Carp cluster member "xxx.xxx.xxx.xxx - WAN_CARP (2@igb1)" has resumed the state "MASTER" for vhid 2
Jun 28 14:58:04   kernel: carp: 2@igb1: BACKUP -> MASTER (preempting a slower master)
Jun 28 14:58:04   kernel: carp: demoted by -240 to 0 (pfsync bulk fail)
Jun 28 14:57:04   opnsense: /usr/local/etc/rc.carpbackup: Carp cluster member "192.168.123.1 - LAN_CARP (1@igb0)" has resumed the state "BACKUP" for vhid 1
Jun 28 14:57:04   opnsense: /usr/local/etc/rc.carpbackup: Carp cluster member "xxx.xxx.xxx.xxx - WAN_CARP (2@igb1)" has resumed the state "BACKUP" for vhid 2
Jun 28 14:56:58   kernel: carp: 1@igb0: MASTER -> BACKUP (more frequent advertisement received)
Jun 28 14:56:58   kernel: carp: demoted by 240 to 240 (pfsync bulk start)
Jun 28 14:56:58   kernel: carp: 2@igb1: INIT -> BACKUP (initialization complete)
Jun 28 14:56:01   opnsense: /usr/local/etc/rc.carpmaster: Carp cluster member "192.168.123.1 - LAN_CARP (1@igb0)" has resumed the state "MASTER" for vhid 1
Jun 28 14:56:00   kernel: carp: 1@igb0: BACKUP -> MASTER (preempting a slower master)
Jun 28 14:56:00   opnsense: /usr/local/etc/rc.carpbackup: Carp cluster member "192.168.123.1 - LAN_CARP (1@igb0)" has resumed the state "BACKUP" for vhid 1
Jun 28 14:56:00   kernel: carp: demoted by -240 to 0 (vhid removed)
Jun 28 14:55:59   kernel: carp: 1@igb0: MASTER -> BACKUP (more frequent advertisement received)
Jun 28 14:55:59   kernel: carp: demoted by 240 to 240 (interface down)
Jun 28 14:55:59   kernel: carp: 2@igb1: MASTER -> INIT (hardware interface down)
#8
Right, there was something like that  :-[

Cool that you were able to fix it. Looking forward to the update  :)
#9
Thanks for the info, clystron!

I had also found bug #6892, but had ticked it off since it was marked as resolved. The new topic on GitHub is very interesting though! So I will wait for 18.7 after all  ::)
#10
The difference in my case would be that promiscuous mode actually gets disabled and is not re-enabled after the interface comes back UP.

(see the log in an earlier post)
#11
Okay, thanks to both of you for the update.

The driver updates look like a shot in the dark to me as well  :(
At the moment I still suspect some kind of misconfiguration, since the systems we use are widely deployed and so far I have read rather little about this problem...

I will keep at it and post here if I manage to find anything.
#12
Hello everyone.

Ruffy, did you find a solution to this problem, or did you stick with using the new hardware?

I am using two APU2c4 boards with OPNsense 18.1.9, running CARP on the LAN and WAN interfaces. With a power-off or pulling the power cable everything works wonderfully: the failover is fast and runs correctly, and switching back also works without problems.

However, when I pull an Ethernet cable or shut down the switch port and then reconnect it, the failover itself works, but afterwards I see the same behaviour. The interface that lost its link no longer shows any CARP status, and the second host takes over as master there, while the master for the second interface stays on the first host - even though net.inet.carp.preempt=1 is set. For debugging I have additionally set net.inet.carp.senderr_demotion_factor=0, which did not help either, and state sync on the backup node does not help. 18.1.4 was my first version, and the problem was already present there. The problem can then only be fixed by rebooting the affected gateway or by temporarily turning CARP off and on again.
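
For completeness, the CARP-related sysctls on the affected node can be dumped in one go to see where the demotion counter ends up; a minimal sketch (just the command, output not shown):

# list all CARP sysctls (allow, preempt, demotion counter, demotion factors, ...)
sysctl net.inet.carp
# the demotion counter should drop back to 0 once the link is up again; a value
# that stays high makes this node advertise as less preferred, so it will not preempt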

Attached is an excerpt from the log. I am puzzled why promiscuous mode gets disabled on the affected interface. I also do not understand why I get promoted back to 0 despite an interface that is not working (since no CARP is running on it).
Read from bottom to top:

#carp inactive on LAN port, carp master on WAN port (although net.inet.carp.preempt=1 is set)
Jun 12 11:24:35    opnsense: /usr/local/etc/rc.newwanip: Interface 'opt3' is disabled or empty, nothing to do.
Jun 12 11:24:35    opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'ovpns1'
Jun 12 11:24:34    kernel: ovpns1: link state changed to UP
Jun 12 11:24:29    kernel: ovpns1: link state changed to DOWN
Jun 12 11:24:29    opnsense: /usr/local/etc/rc.carpmaster: Starting OpenVPN server instance on xxx.xxx.xxx.xxx - WAN_CARP because of transition to CARP master.
Jun 12 11:24:29    opnsense: /usr/local/etc/rc.carpmaster: Carp cluster member "xxx.xxx.xxx.xxx - WAN_CARP (2@igb1)" has resumed the state "MASTER" for vhid 2


Jun 12 11:24:28    kernel: carp: 2@igb1: BACKUP -> MASTER (preempting a slower master)
Jun 12 11:24:28    kernel: carp: demoted by -240 to 0 (pfsync bulk fail)


Jun 12 11:23:45    opnsense: /usr/local/etc/rc.carpbackup: Carp cluster member "xxx.xxx.xxx.xxx - WAN_CARP (2@igb1)" has resumed the state "BACKUP" for vhid 2
Jun 12 11:23:38    opnsense: /usr/local/etc/rc.newwanip: Interface 'opt3' is disabled or empty, nothing to do.
Jun 12 11:23:38    opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'ovpns1'
Jun 12 11:23:37    kernel: ovpns1: link state changed to UP
Jun 12 11:23:32    kernel: ovpns1: link state changed to DOWN
Jun 12 11:23:32    opnsense: /usr/local/etc/rc.carpmaster: Starting OpenVPN server instance on xxx.xxx.xxx.xxx - WAN_CARP because of transition to CARP master.

Jun 12 11:23:32    opnsense: /usr/local/etc/rc.carpmaster: Carp cluster member "xxx.xxx.xxx.xxx - WAN_CARP (2@igb1)" has resumed the state "MASTER" for vhid 2
Jun 12 11:23:31    opnsense: /usr/local/etc/rc.carpmaster: Carp cluster member "192.168.123.1 - LAN_CARP (1@igb0)" has resumed the state "MASTER" for vhid 1

Jun 12 11:23:28    opnsense: /usr/local/etc/rc.newwanip: Resyncing OpenVPN instances for interface LAN.
Jun 12 11:23:24    opnsense: /usr/local/etc/rc.newwanip: ROUTING: skipping IPv6 default route
Jun 12 11:23:24    opnsense: /usr/local/etc/rc.newwanip: ROUTING: skipping IPv4 default route
Jun 12 11:23:24    opnsense: /usr/local/etc/rc.newwanip: ROUTING: no IPv6 default gateway set, assuming wan
Jun 12 11:23:24    opnsense: /usr/local/etc/rc.newwanip: ROUTING: IPv4 default gateway set to wan
Jun 12 11:23:24    opnsense: /usr/local/etc/rc.newwanip: ROUTING: entering configure using 'lan'
Jun 12 11:23:22    kernel: ifa_maintain_loopback_route: deletion failed for interface igb1: 3

Jun 12 11:23:22    kernel: carp: 2@igb1: MASTER -> BACKUP (more frequent advertisement received)
Jun 12 11:23:22    kernel: carp: demoted by 240 to 240 (pfsync bulk start)


Jun 12 11:23:22    kernel: igb0: promiscuous mode disabled
Jun 12 11:23:22    kernel: carp: 1@igb0: MASTER -> INIT (hardware interface up)
Jun 12 11:23:22    kernel: ifa_maintain_loopback_route: deletion failed for interface igb0: 3
Jun 12 11:23:22    kernel: ifa_maintain_loopback_route: deletion failed for interface igb0: 3
Jun 12 11:23:22    opnsense: /usr/local/etc/rc.newwanip: The command '/sbin/ifconfig igb0 '192.168.123.1'/'24' alias vhid '1'' returned exit code '1', the output was 'ifconfig: ioctl (SIOCAIFADDR): Protocol not available'

Jun 12 11:23:22    opnsense: /usr/local/etc/rc.newwanip: On (IP address: 192.168.123.2) (interface: LAN[lan]) (real interface: igb0).
Jun 12 11:23:22    opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'igb0'
Jun 12 11:23:22    kernel: carp: 2@igb1: BACKUP -> MASTER (preempting a slower master)
Jun 12 11:23:21    kernel: ifa_maintain_loopback_route: insertion failed for interface igb0: 17
Jun 12 11:23:21    kernel: carp: 1@igb0: BACKUP -> MASTER (preempting a slower master)
Jun 12 11:23:21    opnsense: /usr/local/etc/rc.linkup: Hotplug event detected for LAN(lan) but ignoring since interface is configured with static IP (192.168.123.2 ::)
Jun 12 11:23:21    opnsense: /usr/local/etc/rc.carpbackup: Carp cluster member "192.168.123.1 - LAN_CARP (1@igb0)" has resumed the state "BACKUP" for vhid 1

Jun 12 11:23:20    kernel: igb0: link state changed to UP
Jun 12 11:23:20    kernel: carp: demoted by -240 to 0 (interface up)
Jun 12 11:23:20    kernel: carp: 1@igb0: INIT -> BACKUP (initialization complete)
#enable port on switch#


Jun 12 11:22:50    opnsense: /usr/local/etc/rc.carpbackup: Carp cluster member "xxx.xxx.xxx.xxx - WAN_CARP (2@igb1)" has resumed the state "BACKUP" for vhid 2
Jun 12 11:22:50    opnsense: /usr/local/etc/rc.linkup: Hotplug event detected for LAN(lan) but ignoring since interface is configured with static IP (192.168.123.2 ::)
Jun 12 11:22:49    kernel: ifa_maintain_loopback_route: deletion failed for interface igb1: 3
Jun 12 11:22:49    kernel: carp: 2@igb1: MASTER -> BACKUP (more frequent advertisement received)
Jun 12 11:22:49    kernel: igb0: link state changed to DOWN
Jun 12 11:22:49    kernel: carp: demoted by 240 to 240 (interface down)
Jun 12 11:22:49    kernel: carp: 1@igb0: MASTER -> INIT (hardware interface down)   
#disable Port on Switch#




###Edit: OPNsense version added###
###Edit2: Temporary fix added###