Kea DHCP duplicating responses on CARP/VLAN interfaces: Race condition

Started by badyusuke, April 29, 2026, 02:53:56 PM

April 29, 2026, 02:53:56 PM Last Edit: April 29, 2026, 06:41:57 PM by badyusuke Reason: Fix title
Hello community,

I am experiencing a persistent issue with the Kea DHCP implementation on an OPNsense HA Cluster. When using the default configuration (Raw Sockets + Multi-threading), the Kea service seems to "hear double" on interfaces where CARP is active, leading to duplicated logs and, more critically, conflicting IP offers to the same client.

Environment Setup:

Version: OPNsense 25.7.11_9-amd64.
Setup: Two nodes in HA (Master: Physical / Backup: Virtual).
Networking: High Availability with CARP acting as gateways across multiple VLANs.
DHCP: Kea DHCP configured in HA (Hot-Standby).

The Problem:
When Kea is configured with the default Socket Type: Raw, the service seems to "double-process" incoming broadcast packets on interfaces that have both a physical IP and a CARP VIP.

Because Kea uses multi-threading by default (4 threads), two (or more) different threads capture the same DHCPDISCOVER or DHCPREQUEST at the exact same millisecond. This creates a race condition: each thread checks the lease database, sees an available IP, and sends a separate DHCPOFFER or DHCPACK. In some instances, they even offer different IPs to the same client for the same transaction.

Evidence (Logs):
Notice the different thread IDs (e.g., ...9808 and ...a008) processing the same Transaction ID (tid=0xcb0e85d2) with the same logged timestamp:


2026-04-29T08:46:41-03:00 kea-dhcp4 INFO [kea-dhcp4.packets.0x3fb7b6ac9808] DHCP4_PACKET_RECEIVED [hwtype=1 74:86:e2:xx:xx:43], cid=[01:74:86:e2:xx:xx:43], tid=0xcb0e85d2: DHCPREQUEST received from 0.0.0.0 to 255.255.255.255 on interface vlan031
2026-04-29T08:46:41-03:00 kea-dhcp4 INFO [kea-dhcp4.packets.0x3fb7b6aca008] DHCP4_PACKET_RECEIVED [hwtype=1 74:86:e2:xx:xx:43], cid=[01:74:86:e2:xx:xx:43], tid=0xcb0e85d2: DHCPREQUEST received from 0.0.0.0 to 255.255.255.255 on interface vlan031

2026-04-29T08:46:41-03:00 kea-dhcp4 INFO [kea-dhcp4.leases.0x3fb7b6ac9808] DHCP4_LEASE_OFFER [hwtype=1 74:86:e2:xx:xx:43], tid=0xcb0e85d2: lease 10.x.x.11 will be offered
2026-04-29T08:46:41-03:00 kea-dhcp4 INFO [kea-dhcp4.leases.0x3fb7b6aca008] DHCP4_LEASE_OFFER [hwtype=1 74:86:e2:xx:xx:43], tid=0xcb0e85d2: lease 10.x.x.10 will be offered
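
For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch. It assumes the log line layout shown in the excerpt above (timestamp first, thread ID inside the logger name, then tid=...); the function name find_duplicate_receptions is made up for illustration and is not part of Kea or OPNsense. Grouping by timestamp as well as tid is a deliberate choice, because a client retransmission legitimately reuses the same transaction ID at a later time.

```python
import re
from collections import defaultdict

# Thread ID (from the logger name) and transaction ID of a
# DHCP4_PACKET_RECEIVED line, in the layout shown in the excerpt above.
RECEIVED = re.compile(
    r"\[kea-dhcp4\.packets\.(?P<thread>0x[0-9a-f]+)\] DHCP4_PACKET_RECEIVED "
    r".*?tid=(?P<tid>0x[0-9a-f]+)"
)


def find_duplicate_receptions(lines):
    """Return {(timestamp, tid): [thread, ...]} for packets logged more than once.

    Hypothetical helper: only entries with the same timestamp AND the same
    transaction ID are grouped together.
    """
    seen = defaultdict(list)
    for line in lines:
        match = RECEIVED.search(line)
        if match:
            timestamp = line.split(" ", 1)[0]
            seen[(timestamp, match["tid"])].append(match["thread"])
    return {key: threads for key, threads in seen.items() if len(threads) > 1}
```

Fed the two DHCP4_PACKET_RECEIVED lines above, this reports one transaction handled by two different thread IDs. A thread ID that appears twice in the same list would indicate the same thread handling the packet twice.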

Troubleshooting already performed:

Network Mask Check: Validated that there are no subnet overlaps on CARP VIPs.

Socket Type Change: When switching to Socket Type: UDP, the duplication stops and logs become clean. However, many clients (especially those initiating discovery from 0.0.0.0) fail to receive IPs in UDP mode.

Analysis:

Binding: sockstat -4 -l shows Kea binding to both the Physical IP (e.g., 10.x.x.2:67) and the CARP VIP (e.g., 10.x.x.1:67).

USER     COMMAND    PID   FD  PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
0        kea-dhcp4  47445 18  udp4   10.x.x.2:67           *:*
0        kea-dhcp4  47445 20  udp4   10.x.x.1:67           *:*

It appears that with Raw Sockets, the BPF (Berkeley Packet Filter) delivers the packet to all listening threads. Since both the Physical IP and the CARP VIP are on the same interface, Kea seems to be binding in a way that triggers this double processing.
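
To run the same check across many VLANs, the sockstat output can be grouped by port. This is only a sketch: it assumes the seven-column layout of the sockstat -4 -l output pasted above, and the helper name sockets_by_port is hypothetical.

```python
from collections import defaultdict


def sockets_by_port(sockstat_text, command="kea-dhcp4"):
    """Group the local addresses bound by `command`, keyed by port.

    Assumed columns (as in the output above):
    USER COMMAND PID FD PROTO LOCAL-ADDRESS FOREIGN-ADDRESS
    """
    by_port = defaultdict(list)
    for line in sockstat_text.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and other processes.
        if len(fields) < 7 or fields[1] != command:
            continue
        address, _, port = fields[5].rpartition(":")
        by_port[port].append(address)
    return dict(by_port)
```

Calling sockets_by_port(output).get("67", []) lists every address Kea holds on port 67. Comparing that list with the addresses configured on each interface shows which interfaces carry more than one bound address.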

Questions:

Persistent Thread Limit: Is there a way to limit Kea threads to 1 through the GUI or a tunable?

Deny Service Binding: I noticed the "Deny service binding" option in the CARP settings.

If I enable this for the CARP, would it prevent Kea from listening on the Virtual IP, thus solving the duplication while keeping Raw Sockets?

What are the side effects for other services like ntpd or unbound if they rely on the CARP to serve clients?

Best Practice: Is this a known limitation of Kea on FreeBSD when CARP is involved, and what is the recommended way to use Kea in an HA setup without duplicate processing?

Any insights would be greatly appreciated!

Hello,

I would try the KEA mailing list first, to see what the recommended configuration is.

If they say it's a bug, then I would try the KEA GitLab.

Here is an example of an issue that went this way: https://github.com/opnsense/ports/issues/267
Hardware:
DEC740

I am following up on my Kea issue in a High Availability (HA/CARP) cluster. After extensive debugging and manual configuration testing, I've identified a fundamental integration issue between Kea's Raw Socket implementation and FreeBSD's BPF behavior in the presence of CARP VIPs.

The Sequential Duplication Proof (Single Thread)

To rule out race conditions between multiple threads, I manually edited kea-dhcp4.conf. I set "thread-pool-size": 1 to force a single-threaded execution. Despite this, the log shows that the exact same thread (0x2bb5ec85c008) processes the same Transaction ID (tid=0x4ca805b) twice in a row, sequentially, within the same millisecond.

Log Evidence:

2026-04-29T11:32:25-03:00 kea-dhcp4 INFO [kea-dhcp4.dhcp4.0x2bb5ec85c008] DHCP4_QUERY_LABEL received query: [hwtype=1 86:eb:0d:xx:xx:af], tid=0x4ca805b
2026-04-29T11:32:25-03:00 kea-dhcp4 INFO [kea-dhcp4.packets.0x2bb5ec85c008] DHCP4_PACKET_SEND [tid=0x4ca805b]: trying to send packet DHCPACK from 10.x.x.2:67 to 10.x.x.32:68 on interface vlan0151

-- (Immediately followed by the exact same process by the SAME thread) --

2026-04-29T11:32:25-03:00 kea-dhcp4 INFO [kea-dhcp4.dhcp4.0x2bb5ec85c008] DHCP4_QUERY_LABEL received query: [hwtype=1 86:eb:0d:xx:xx:af], tid=0x4ca805b
2026-04-29T11:32:25-03:00 kea-dhcp4 INFO [kea-dhcp4.packets.0x2bb5ec85c008] DHCP4_PACKET_SEND [tid=0x4ca805b]: trying to send packet DHCPACK from 10.x.x.2:67 to 10.x.x.32:68 on interface vlan0151
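
For reference, the single-thread override used for this test can be sketched as follows. Only "thread-pool-size": 1 comes from the post above; the enclosing "multi-threading" map and the "enable-multi-threading" key are Kea's multi-threading settings as I understand them from the Kea manual, and the helper name force_single_thread is made up. The sketch works on an already-parsed config dict, because a real kea-dhcp4.conf may contain comments that a strict JSON parser rejects.

```python
import copy


def force_single_thread(config):
    """Return a copy of a Kea config dict with the DHCPv4 thread pool set to 1.

    Hypothetical helper; the key names are assumed from the Kea manual's
    multi-threading section.
    """
    updated = copy.deepcopy(config)
    mt = updated.setdefault("Dhcp4", {}).setdefault("multi-threading", {})
    mt["enable-multi-threading"] = True
    mt["thread-pool-size"] = 1
    return updated
```

As the log above shows, this removes the cross-thread race but not the duplicate reception itself.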

"Deny Service Binding" is ignored by Raw Sockets

In an attempt to stop Kea from "hearing double", I enabled the "Deny service binding" option on the CARP VIP configuration. However, technical analysis via sockstat confirms that this has no effect on Kea when using Raw Sockets.

Sockstat output:

root kea-dhcp4 58683 54 udp4 10.x.x.2:67 *:* # Physical IP Socket
root kea-dhcp4 58683 56 udp4 10.x.x.1:67 *:* # CARP VIP Socket (Should have been denied)

The Definitive Root Cause (Kea Documentation)

I found the explanation in the official Kea 3.0.2 Documentation (Section 9.2.4 - Interface Configuration). It explicitly warns about this exact behavior:

Quote: "Caution should be taken when configuring the server to open multiple raw sockets on the interface with several IPv4 addresses assigned. If the directly connected client sends the message to the broadcast address, all sockets on this link will receive this message and multiple responses will be sent to the client."

This confirms that by configuring Kea using only the interface name (e.g., vlan04), OPNsense triggers this "double-hearing" behavior because Kea detects both the physical IP and the CARP VIP. 

The Documented Solution vs. Kea Implementation on OPNsense

The Kea documentation provides a clear fix: 

Quote: "To use a single address on such an interface, the 'interface-name/address' notation should be used."

Currently, OPNsense only generates the configuration listing the interface names:

"interfaces": [ "vlan01", "vlan02" ]

To be stable in HA/CARP environments, the service should implement (or allow) the notation:

"interfaces": [ "vlan01/10.x.x.2", "vlan02/10.x.y.2" ]

(binding only to the Physical IP).

Feature Requests for OPNsense Developers

Based on the evidence above and the official Kea documentation, I would like to propose the following enhancements to the Kea implementation on OPNsense to better support CARP/HA deployments on FreeBSD:

User-configurable Thread Pool Size:
Exposing the thread-pool-size parameter in the Web GUI would allow administrators to limit Kea to a single thread when necessary. While Kea is highly performant, in environments where BPF duplication occurs, forcing a single thread (size: 1) effectively mitigates simultaneous race conditions where different threads might offer conflicting leases for the same client request.

Implementation of Explicit Interface Binding (interface/address):
According to Kea documentation (Section 9.2.4), the current method of binding to the interface name alone causes every open Raw/BPF socket to receive a copy of broadcast packets. I suggest updating the plugin logic to allow (or automatically implement) the "interface-name/address" notation (e.g., "vlan01/10.x.x.2"). Binding specifically to the Physical IP of the node, while still defining the CARP VIP as the gateway in the subnet options, appears to be the only documented way to prevent the sequential double-processing of packets in a CARP setup while maintaining the mandatory Raw Socket mode for FreeBSD.
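
To make the proposal concrete, here is a rough Python sketch of the rewrite step a config generator could perform. It is an illustration only, not how the OPNsense plugin works: the function name pin_interfaces and the mapping of interface names to physical addresses are assumptions, and the addresses used with it would have to come from the node's own interface configuration.

```python
def pin_interfaces(interfaces, physical_addresses):
    """Rewrite interface names to Kea's 'interface-name/address' notation.

    `physical_addresses` is a hypothetical {interface: address} mapping that
    holds the node's own (non-CARP) IPv4 address per interface. Interfaces
    without an entry are left as plain names.
    """
    pinned = []
    for name in interfaces:
        address = physical_addresses.get(name)
        pinned.append(f"{name}/{address}" if address else name)
    return pinned
```

Leaving unmapped interfaces as plain names keeps the current behaviour as the fallback, so pinning only applies where an address has been chosen explicitly.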

These adjustments would make the Kea integration significantly more robust for High Availability clusters, preventing log exhaustion and potential IP conflicts.

No, binding to specific IP addresses would degrade stability in a different way.

If an IP address does not exist on any interface at startup, KEA fails to start completely.

When binding to just interfaces (without specifying IPs), an interface that is not up only produces a warning: KEA starts anyway and even retries the bind a few times.
https://github.com/opnsense/core/issues/10072

Here is a different issue that also requires per-address binding, and where I show that KEA fails completely when the IP address is missing. Now combine that with CARP and all of the startup race conditions that come with it. DHCP is such a vital service that it should never fail on start.
https://github.com/opnsense/core/issues/10226
Hardware:
DEC740

April 29, 2026, 07:55:57 PM #4 Last Edit: April 29, 2026, 09:16:22 PM by badyusuke Reason: Informations to avoid Kea fails on bind IP address/sockets
Thank you for the detailed explanation and for the GitHub references. I completely understand the design priority: startup resilience is paramount, and a service that fails to start because an IP is missing is indeed a major stability risk.

It seems we are looking at a side effect of that resilience in HA/CARP environments. While binding to the interface name ensures Kea always starts, the Kea 3.0.2 documentation (Section 9.2.4) explains why this leads to the duplication I'm seeing:

Quote: 'Caution should be taken when configuring the server to open multiple raw sockets on the interface with several IPv4 addresses assigned... all sockets on this link will receive this message and multiple responses will be sent to the client. ... the configuration with multiple IPv4 addresses assigned should not be used when the directly connected clients are operating on that link.'

This matches my findings: even with thread-pool-size: 1, Kea processes the same request twice because it has two active BPF sockets (one for the Physical IP and one for the CARP). This leads to lease inconsistencies (potential IP conflicts) and log exhaustion.

To help improve the Kea implementation for complex HA setups without risking current startup stability, could we perhaps consider a 'middle ground' solution for Kea on OPNsense? Would it be possible to add an optional checkbox in the GUI for 'Strict HA Binding'? This would allow advanced users to manually enable the 'interface/address' notation. This way, the default behavior remains 'safe' for everyone, but administrators of HA clusters could opt-in to the binding method recommended by the Kea manual to prevent sequential duplication.

In my view, offering an 'opt-in' method for address-specific binding would greatly enhance Kea's reliability for production-grade HA environments while respecting OPNsense's startup stability goals. I'm happy to perform more tests if needed!

Update Note:

Section 9.2.4 of the Kea manual provides built-in parameters to handle startup resilience:

Quote: The service-sockets-require-all option makes Kea require all sockets to be successfully bound. If any opening fails, Kea interrupts the initialization and exits with a non-zero status. (Default is false).

"The port can be unavailable only temporary. In this case, retrying the opening may resolve the problem. Kea provides two options to specify the retrying: service-sockets-max-retries and service-sockets-retry-wait-time."

By setting service-sockets-max-retries to a non-zero value and keeping service-sockets-require-all as false, Kea will retry opening missing sockets for a defined period and will not fail to start even if some sockets remain unopened.
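
A sketch of what such an interfaces-config section might look like, built in Python and printed as JSON. The three service-sockets-* parameter names come from the manual section quoted above; the retry count, the wait time (which I understand to be in milliseconds), the example address, and the helper name interfaces_config are illustrative assumptions, not tested recommendations.

```python
import json


def interfaces_config(interfaces, max_retries=5, retry_wait_ms=5000):
    """Build a Kea 'interfaces-config' map with socket retry settings.

    Hypothetical helper; the default values are examples only.
    """
    return {
        "interfaces": list(interfaces),
        "dhcp-socket-type": "raw",
        "service-sockets-require-all": False,
        "service-sockets-max-retries": max_retries,
        "service-sockets-retry-wait-time": retry_wait_ms,
    }


# Example address is made up for illustration.
fragment = {"Dhcp4": {"interfaces-config": interfaces_config(["vlan01/10.0.1.2"])}}
print(json.dumps(fragment, indent=2))
```

Whether these retries also cover an address that is missing at startup, rather than an interface that is down, would need to be verified on a test node.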

Quote: The service-sockets-require-all option... (Default is false).

It is already false in our default setup; if an IP address is unavailable, KEA still fails. It only avoids failing when an interface is down, since that case is retried.

I wonder how big the problem you describe actually is, because yours is the first report. Lots of big HA setups out there seem to run this just fine, even if there is potential for a race condition in lease assignment.

If you want to discuss it, please open a GitHub ticket, and if you are AI-assisted, please disclose it.

EDIT:

You can also use only a CARP IP as the sole IP on an interface, without attaching more IP addresses to it; then you won't have multi-IP interfaces either. That solution is quite common, e.g. for WAN interfaces in HA that only have one free IP available, which both firewalls should share.
Hardware:
DEC740

Quote from: Monviech (Cedrik) on April 29, 2026, 06:54:47 PM
IP address not existing on startup on any interface prevents KEA from starting completely.

Linux has a sysctl tunable that allows services to bind to IP addresses that do not exist yet. Does FreeBSD have something similar?
Weird guy who likes everything Linux and *BSD on PC/Laptop/Tablet/Mobile and funny little ARM based boards :)