DHCPD is stuck in recover state

Started by Vladiss, January 28, 2021, 02:31:34 PM

Hello people,

I have CARP enabled in my lab setup. It works just fine except for the DHCP server on LAN, which just won't start. The OPNsense version is 20.7.8. The nodes are configured as follows:

Primary: 192.168.1.10/24
Secondary: 192.168.1.20/24
Virtual IP: 192.168.1.1/24
DHCP scope: 192.168.1.100-199

Primary dhcpd.conf:

option domain-name "localdomain";
option ldap-server code 95 = text;
option arch code 93 = unsigned integer 16; # RFC4578
option pac-webui code 252 = text;

default-lease-time 7200;
max-lease-time 86400;
log-facility local7;
one-lease-per-client true;
deny duplicates;
ping-check true;
update-conflict-detection false;
authoritative;
failover peer "dhcp_lan" {
  primary;
  address 192.168.1.10;
  port 519;
  peer address 192.168.1.20;
  peer port 520;
  max-response-delay 10;
  max-unacked-updates 10;
  split 128;
  mclt 600;

  load balance max seconds 3;
}


subnet 192.168.1.0 netmask 255.255.255.0 {
  pool {
    deny dynamic bootp clients;
    failover peer "dhcp_lan";
    range 192.168.1.100 192.168.1.199;
  }

  option routers 192.168.1.1;
 
}


Secondary dhcpd.conf:

option domain-name "localdomain";
option ldap-server code 95 = text;
option arch code 93 = unsigned integer 16; # RFC4578
option pac-webui code 252 = text;

default-lease-time 7200;
max-lease-time 86400;
log-facility local7;
one-lease-per-client true;
deny duplicates;
ping-check true;
update-conflict-detection false;
authoritative;
failover peer "dhcp_lan" {
  secondary;
  address 192.168.1.20;
  port 520;
  peer address 192.168.1.10;
  peer port 519;
  max-response-delay 10;
  max-unacked-updates 10;

  load balance max seconds 3;
}


subnet 192.168.1.0 netmask 255.255.255.0 {
  pool {
    deny dynamic bootp clients;
    failover peer "dhcp_lan";
    range 192.168.1.100 192.168.1.199;
  }

  option routers 192.168.1.1;

}


After starting the dhcpd service, both nodes enter the recover state. The log entries say:
dhcpd[99045] failover peer dhcp_lan: I move from startup to recover

The DHCPv4/Leases tab on both nodes says:
My State: recover
Peer State: unknown-state


Please help. Thanks!
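
Since both nodes report the peer as unknown-state, one thing worth ruling out is whether the failover ports are reachable between the nodes at all. ISC dhcpd failover runs over TCP, so a plain connect test is enough for that. A minimal sketch; the function name failover_port_reachable is made up for illustration, and the addresses and ports in the comment are the ones from the configs above:

```python
import socket


def failover_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example, run on the primary to test the secondary's listener:
#   failover_port_reachable("192.168.1.20", 520)
# and on the secondary to test the primary's listener:
#   failover_port_reachable("192.168.1.10", 519)
```

A False result only says the TCP connection could not be opened (firewall rule, dhcpd not listening, routing). A True result does not prove that the failover handshake itself succeeds.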

I have a similar problem.

After searching for a solution, I discovered that DHCP failover supposedly also needs port 647 UDP/TCP.
Is this true?

This port is not opened by the automatically generated rules.

February 05, 2021, 11:47:25 AM #2 Last Edit: February 05, 2021, 01:19:47 PM by ednt
I just tested it and it failed.

I also saw no traffic on port 647 or 847.

Then I checked /var/dhcp/etc/dhcpd.conf
and found out that ISC DHCP uses different ports for failover.

In my case 519 and 520.
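
To read the ports out of the generated config instead of guessing, something like the following could work. It is only a sketch: failover_ports is a hypothetical helper, and the regular expressions assume the simple layout shown in the configs earlier in this thread (no nested braces inside the failover peer block).

```python
import re


def failover_ports(conf_text: str) -> dict:
    """Map each failover peer name to its local port and peer port."""
    result = {}
    # Match only declarations of the form: failover peer "name" { ... }
    # References inside a pool end in ';' and are therefore skipped.
    blocks = re.findall(r'failover peer "([^"]+)"\s*\{(.*?)\}', conf_text, re.S)
    for name, body in blocks:
        local = re.search(r'^\s*port\s+(\d+);', body, re.M)
        peer = re.search(r'^\s*peer port\s+(\d+);', body, re.M)
        result[name] = {
            "port": int(local.group(1)) if local else None,
            "peer_port": int(peer.group(1)) if peer else None,
        }
    return result


# Example:
#   with open("/var/dhcp/etc/dhcpd.conf") as f:
#       print(failover_ports(f.read()))
```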

Ok, the ports were automatically enabled. (519,520)

In our case it was a problem with the outbound NAT rules:

The firewall should not use the VIP address as its source when the destination is in the same network.
So in the outbound NAT rule, enable "invert destination" and set the destination network to the network the rule applies to.
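
One way to read the rule change described above, as a small model. This is not how pf evaluates rules, just an illustration of the logic; translated_source and its parameters are invented for the example, and the addresses are the ones from the first post. The assumption is that an outbound NAT rule which rewrites the source to the VIP also catches the failover connection to the peer, so the peer presumably sees traffic from the VIP instead of from the node address it has configured as peer address.

```python
import ipaddress


def translated_source(src: str, dst: str, rule_net: str, vip: str,
                      invert_destination: bool = True) -> str:
    """Return the source address a packet leaves with under one NAT rule.

    Simplified model: without 'invert destination' the rule matches any
    destination; with it, only destinations outside rule_net match.
    """
    net = ipaddress.ip_network(rule_net)
    dst_in_net = ipaddress.ip_address(dst) in net
    matches = (not dst_in_net) if invert_destination else True
    # A matching rule rewrites the source to the VIP, otherwise it is kept.
    return vip if matches else src


# Primary talking to the secondary on the same LAN:
#   translated_source("192.168.1.10", "192.168.1.20",
#                     "192.168.1.0/24", "192.168.1.1")
# keeps the node's own address, while invert_destination=False
# would rewrite it to the VIP.
```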

Well, that's definitely not the case with my setup: I disabled NAT completely, and the OPNsense boxes work just as a gateway between LANs. So I'm still confused :(

Quote from: ednt on February 05, 2021, 01:21:54 PM
Ok, the ports were automatically enabled. (519,520)

In our case it was a problem with the outbound NAT rules:

The firewall should not use the VIP address as its source when the destination is in the same network.
So in the outbound NAT rule, enable "invert destination" and set the destination network to the network the rule applies to.

Browsing the logs and comparing the configs, I think our setup might have the same issue. Sadly, I do not understand your solution with the outbound NAT rules. The last sentence in particular makes no sense to me.