Show posts

Messages - tofuSCHNITZEL

#1
I realised the issue.
It was the virtualisation all along. Since the interface is in a bridge and this bridge is assigned to a virtual NIC, the NIC never went down when the cable on the hypervisor was physically disconnected. So the master OPNsense never got the "interface down" event and was never demoted, and I always ended up with a "split brain" situation where both the primary and the secondary became master on the LAN interfaces. Since both could still see each other on the WAN, the WAN stayed master/backup.
I solved it by assigning the NICs on the 10G card directly to the OPNsense VMs via PCIe passthrough.
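
To make the reasoning above easier to follow, here is a minimal Python sketch of the election logic. It is a simplified toy model, not the actual CARP implementation: the `Node` class, the `masters` function and the flat demotion penalty of 240 are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Assumed penalty that a node adds to its advskew once it notices a dead link.
IFDOWN_DEMOTION = 240


@dataclass
class Node:
    name: str
    advskew: int
    # Interfaces on which this node itself has seen a link-down event.
    links_seen_down: frozenset = frozenset()

    def effective_skew(self) -> int:
        # Simplified: one noticed link-down demotes the node on all interfaces.
        return self.advskew + (IFDOWN_DEMOTION if self.links_seen_down else 0)


def masters(iface, nodes, path_ok):
    """Return the names of the nodes that act as master on `iface`.

    `path_ok` says whether advertisements can travel between the nodes.
    """
    result = []
    for node in nodes:
        if iface in node.links_seen_down:
            continue  # a node that knows its link is dead is not master there
        if not path_ok:
            result.append(node.name)  # hears no peer, so it takes over
            continue
        candidates = [n for n in nodes if iface not in n.links_seen_down]
        best = min(n.effective_skew() for n in candidates)
        if node.effective_skew() == best:
            result.append(node.name)
    return result


if __name__ == "__main__":
    secondary = Node("secondary", advskew=100)
    # Passthrough NIC: the primary sees the LAN link go down.
    aware = Node("primary", advskew=0, links_seen_down=frozenset({"lan"}))
    # Bridged virtual NIC: the cable is pulled but the primary never notices.
    blind = Node("primary", advskew=0)
    for label, primary in (("passthrough", aware), ("bridged", blind)):
        print(label,
              "LAN:", masters("lan", [primary, secondary], path_ok=False),
              "WAN:", masters("wan", [primary, secondary], path_ok=True))
```

In this model the passthrough case hands both LAN and WAN to the secondary, while the bridged case leaves both nodes as master on the LAN and the primary as master on the WAN, which is the behaviour described in this post.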
#2
Quote from: patient0 on March 19, 2025, 06:10:02 PM
Never quite understood why that setting is 'behind' the advanced mode, for half of all the interfaces - all interfaces on one instance - I gotta enter a value in that field.

Could you maybe elaborate? Why would I need to add a skew? On the secondary the skew is set automatically (+100) via the HA config sync feature...?
#3
Quote from: patient0 on March 19, 2025, 12:22:31 PM
I'm on 25.1.3 and it's still available, you gotta enable the 'advanced mode', as before, to see it.

Yes, you are right! I forgot about the "advanced" switch at the top.

Quote from: patient0 on March 19, 2025, 12:22:31 PM
Fair enough, has to be unique on the interface/broadcast domain.
Just out of curiosity: You have more than 256 interfaces CARP-ed? And each VLAN has its own interface in the VM (I assume not)? What type of switch/bridge do you use on Proxmox, Linux bridge or OpenvSwitch?

Currently active, no, but planned. And of course I need a CARP VIP in every VLAN, otherwise the HA setup does not make much sense.
No, the VLANs are added in OPNsense on the main "interface". The bridge in Proxmox that contains this interface is set to be VLAN aware (Linux bridge), but we had our fair share of headaches with multicast "bleeding" between VLANs etc.
#4
Yes, I can confirm it's gone with 25.1.3 (even with advanced mode on), which is weird, because it's still there in the source code?
https://github.com/opnsense/core/blob/master/src/opnsense/mvc/app/models/OPNsense/Interfaces/Vip.xml
#5
Quote from: patient0 on March 19, 2025, 10:23:26 AM
The VHIDs have to be unique, one VHID per CARP interface. You're setting the same VHID for multiple interfaces.

I have more than 256 VLANs, so I cannot use a different VHID for every VLAN. Also, technically they are "unique" per interface, since the VLANs don't "see" each other... I also tried a unique VHID each for WAN and LAN and then the same one for every VLAN, but this did not help with the "together failover" either.

The advskew is 0 for every VIP on the primary and 100 for every VIP on the secondary.
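
For illustration, a small Python sketch of why the VHID range is so small and what reusing VHIDs across VLANs could look like. It assumes the usual CARP convention that the virtual MAC for IPv4 is `00:00:5e:00:01:` followed by the VHID as one byte, which limits the VHID to 1-255. The `vhid_for_vlan` mapping is a hypothetical numbering scheme, not something OPNsense provides, and it is only safe if every VLAN really is a separate broadcast domain.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the virtual MAC address that corresponds to a VHID."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be in the range 1-255")
    # The VHID fills the last byte of the address.
    return f"00:00:5e:00:01:{vhid:02x}"


def vhid_for_vlan(vlan_id: int) -> int:
    """Hypothetical scheme: wrap VLAN IDs into the 1-255 VHID range."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in the range 1-4094")
    return (vlan_id - 1) % 255 + 1


if __name__ == "__main__":
    for vlan in (1, 255, 256, 300):
        vhid = vhid_for_vlan(vlan)
        print(f"VLAN {vlan}: VHID {vhid}, MAC {carp_virtual_mac(vhid)}")
```

With this scheme VLAN 1 and VLAN 256 end up with the same VHID and therefore the same virtual MAC, so any leak between the two VLANs (for example multicast "bleeding" on a bridge) would make the two CARP groups visible to each other.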
#6
Please send pictures of your config, especially the interface assignments and the CARP VIP config.
Also check whether the switch in between is blocking multicast traffic.
#7
High availability / Re: short CARP Question
March 19, 2025, 09:20:09 AM
Quote from: c-mu on February 25, 2025, 09:30:02 AM
It happened that one or more VLANs had the status slave, while the rest were still master.

This was the case for me when the VLAN was not created on the switch(es) the two firewalls are connected to. These switches drop frames with VLAN tags unknown to them, so the firewalls could not "see" each other (the CARP multicast) on these VLANs, and both became master.
#8
We use OPNsense 25.1.3-amd64 installed on two Proxmox hypervisors. All the interface numberings and assignments are exactly the same on both machines (the primary/secondary config is generated, so the interface numbering must be the same on both machines).
I removed some of the CARP VIPs yesterday because the VLANs are not in use currently, but I kept the interfaces active on both machines... but this should not matter, right?
#9
We face exactly the same problem.
If we unplug the LAN from our primary, it does a failover to the secondary on the LAN and all the assigned VLAN interfaces, but not on the WAN interface. For the WAN the primary stays master, so internet traffic for the clients is interrupted.
I had "Disable preemptive" unchecked on the primary and checked on the secondary... this option disappeared from the GUI with the update to 25.1, so I had to remove it from the config on the secondary manually...
#10
High availability / KeaDHCP HA Warnings
March 13, 2025, 11:43:01 PM
Hi,
I have an OPNsense HA setup with Kea in HA mode serving around 150 interfaces (VLANs) with DHCP.
The Kea agents communicate over a dedicated (direct) 10G cable that runs between the two hypervisors hosting the two OPNsense instances (it is also used for pfsync).

For a couple of days now we have been getting frequent warnings in the Kea log on the primary with the following content:

WARN [kea-dhcp4.ha-hooks.0x1c8631017b00] HA_LEASE_UPDATE_CONFLICT SensePrimary: lease update [hwtype=1 00:1d:c1:0b:91:2a], cid=[01:00:1d:c1:0b:91:2a], tid=0x34523322 sent to SenseSecondary (http://192.168.105.3:8001) returned conflict status code: ResourceBusy: IP address:172.17.90.229 could not be updated. (error code 4)

On the secondary this is logged:

WARN [kea-dhcp4.lease-cmds-hooks.0x34bf72de2000] LEASE_CMDS_UPDATE4_CONFLICT lease4-update command failed due to conflict (parameters: { "client-id": "01:00:1d:c1:0b:91:2a", "expire": 1742341137, "force-create": true, "fqdn-fwd": false, "fqdn-rev": false, "hostname": "axcdante-0b912a", "hw-address": "00:1d:c1:0b:91:2a", "ip-address": "172.17.90.229", "origin": "ha-partner", "state": 0, "subnet-id": 84, "valid-lft": 1800 }, reason: ResourceBusy: IP address:172.17.90.229 could not be updated.)

These warnings appear approx. every 5-10 mins. I have already restarted Kea on both machines multiple times. If I call http://192.168.105.3:8001 (this is the agent listening on the secondary) via curl, I get an answer immediately, so I don't know why the "ResourceBusy" error would occur...

Currently I have version OPNsense 25.1.3-amd64 installed.

And now Kea even terminated, probably because of too many resource busy failures..? (see attached screenshot)
I deleted the dhcp4 lease CSV file on both nodes and restarted the service. When tailing the files on both nodes I can see new IP leases being added and immediately appearing on the other node as well, so clearly the communication is working. I am still getting the resource busy warnings in the log...
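
One way to narrow this down might be to check whether the conflicts always involve the same few clients or are spread over many leases. The Python sketch below counts `HA_LEASE_UPDATE_CONFLICT` warnings per IP/MAC pair. The regular expression is written against the single log line quoted above, so it is an assumption that other lines follow the same layout, and the function name `count_conflicts` is just illustrative.

```python
import re
from collections import Counter

# Pattern derived from the one warning quoted in this post (assumed layout).
CONFLICT_RE = re.compile(
    r"HA_LEASE_UPDATE_CONFLICT .*?\[hwtype=\d+ (?P<mac>[0-9a-f:]+)\].*?"
    r"IP address:(?P<ip>[0-9.]+) could not be updated"
)


def count_conflicts(lines):
    """Count lease update conflicts per (IP address, MAC address) pair."""
    counts = Counter()
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            counts[(match.group("ip"), match.group("mac"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_conflicts.py < kea-log-file
    for (ip, mac), n in count_conflicts(sys.stdin).most_common():
        print(f"{n:5d}  {ip}  {mac}")
```

If only a handful of addresses show up again and again, that would point at those specific leases or clients rather than at the link between the two nodes.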

Any ideas?