[22.1.6] DHCP leases not registered in Unbound

Started by Totoooool, April 18, 2022, 08:32:50 PM

Previous topic - Next topic
Hello, since a recent upgrade (22.1.5 ~ 22.1.6) my unbound is unable to deserve the new devices which are added in DHCP leases, my option mark is well checked, but it's impossible to get anything from unbound. (i try to restart unbound/dhcp services, reboot opnsense, without any changes)
There arent any error/warn logs in unbound  :-\


If i try a command (ubuntu) like :

dig +search <hostname_in_dhcp_lease> @<OPNSENSE_IP>


i get

->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 55137
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1


Do you have any advice to help me to debug/fix this issue ?
it worked before theses updates.

Check your /var/dhcpd/var/db/dhcpd.leases -- if it's not there it can't possibly be known to the system.


Cheers,
Franco

Hello Franco, thanks for your help,
i've checked this file, and I can see the presence of my lease, it's here
:-[

But is it expired or not. Pasting the entry here would certainly help reduce guesswork.


Cheers,
Franco

April 20, 2022, 02:54:09 AM #4 Last Edit: April 20, 2022, 02:56:57 AM by FullyBorked
Quote from: franco on April 19, 2022, 02:01:29 PM
But is it expired or not. Pasting the entry here would certainly help reduce guesswork.


Cheers,
Franco

I'm experiencing the same issue also just upgraded to 22.1.6.  I do find an entry in the DHCP lease file for the missing DNS entry.  It shows active and that the lease will expire tomorrow.  However I'm unable to resolve it with nslookup from the cli of the firewall or from any other device even after restarting the unbound service, as the OP stated.  I've had this issue for some time and posted about it a long time ago, the difference then was restarting the unbound service brought the new entries in, unlike this time, I can't seem to get unbound to bring them in.  I can provide evidence but don't want to post publicly.

To reiterate, make sure that the entry exists in

# cat /var/dhcpd/var/db/dhcpd.leases

and is valid (as in not already expired). Then make sure the entry exists in

# cat /var/unbound/dhcpleases.conf

When doing a restart of Unbound there should be no issue that valid entries do not end up in this file if they exist in the former.

Then, redo the testing to resolve the DHCP lease. If it doesn't work but it exists in both files the test was wrong. Sometimes domain mismatches can make this happen...


Cheers,
Franco

Quote from: franco on April 20, 2022, 04:36:30 PM
To reiterate, make sure that the entry exists in

# cat /var/dhcpd/var/db/dhcpd.leases

and is valid (as in not already expired). Then make sure the entry exists in

# cat /var/unbound/dhcpleases.conf

When doing a restart of Unbound there should be no issue that valid entries do not end up in this file if they exist in the former.

Then, redo the testing to resolve the DHCP lease. If it doesn't work but it exists in both files the test was wrong. Sometimes domain mismatches can make this happen...


Cheers,
Franco

I confirmed that the entry exists and is not expired in the

# cat /var/dhcpd/var/db/dhcpd.leases

I however do not see this entry in

# cat /var/unbound/dhcpleases.conf

Even after restarting the service the entry does not show up in the file. I also question if restarting the service should even be necessary?  I can provide a screenshot if there is a less public way for me to do so.  Just don't want to post up a list of hosts names and IP's on a public forum if it can be avoided. 

The lease should show up in dhcpleases.conf once Unbound was restarted. The process to generate the file is:

# pgrep -f unbound_dhcpd.py

(should be up and running by showing a PID)

This process will also log to the Unbound log:

# opnsense-log resolver

Here's how to restart it and watch the log file:

# configctl -d unbound restart && opnsense-log -f resolver

Any errors here related to the host in question?


Cheers,
Franco

Quote from: franco on April 20, 2022, 05:03:08 PM
The lease should show up in dhcpleases.conf once Unbound was restarted. The process to generate the file is:

# pgrep -f unbound_dhcpd.py

(should be up and running by showing a PID)

This process will also log to the Unbound log:

# opnsense-log resolver

Here's how to restart it and watch the log file:

# configctl -d unbound restart && opnsense-log -f resolver

Any errors here related to the host in question?


Cheers,
Franco

Restarted with # configctl -d unbound restart && opnsense-log -f resolver no errors were printed related to the host in question, logging level is set to "2" would think that would be enough to show these errors. But can adjust if needed.

I don't know if this is related, but thought it was to note since you mentioned a python script.  If I run a health check I show the following missing dependencies.

Checking all packages: .......... done
py37-markupsafe has a missing dependency: python37
py37-markupsafe has a missing dependency: py37-setuptools
py37-markupsafe is missing a required shared library: libpython3.7m.so.1.0

Another follow up, rebooted the firewall this afternoon and the missing entry is now in DNS.  So something is different between a reboot and a service restart. 

Thanks for your help @FullyBorked, i confirm that i've got exactly the same behaviour
- entry is present and valid in /var/dhcpd/var/db/dhcpd.leases
- no entry in /var/unbound/dhcpleases.conf

Even after dhcp/unbound restart, there aren't any errors in logs.
I've tried to reboot the whole device, but i've got the same behaviour, and no resolution !

But this morning, i've reboot again, and this time, i've got the resolution, something seems to be wrong since the last update of unbound  :(

@franco is there anything else I can do to help identify and resolve this?  Devices that I don't keep online all the time are frustrating as they seem to need a firewall reboot to be resolvable via DNS. 

I have the same issue here. OPNsense version is 22.1.7

(maybe related: https://forum.opnsense.org/index.php?topic=24931.15 and https://www.reddit.com/r/opnsense/comments/ue91lf/2216_unbound_dhcp_lease_registration_not_working)

What I found so far:
- the DHCP leases in /var/var/dhcpd/var/db/dhcpd.leases are monitored by the unbound dhcpd watcher script /usr/local/opnsense/scripts/dns/unbound_dhcpd.py
- the dhcpd watcher script transfers lease changes to var/unbound/dhcpleases.conf
- if the watcher script is not running, var/unbound/dhcpleases.conf is either stale or empty and name resolution for DHCP clients fails

Please log in to console and check if the script is running
ps aux | grep unbound
or (like franco stated above)
pgrep -f unbound_dhcpd.py

The output should look like

root    43084   0.0  0.2 25880 15192  -  Ss   11:49      0:03.51 /usr/local/bin/python3 /usr/local/opnsense/scripts/dns/unbound_dhcpd.py --domain intern (python3.8)
unbound 51803   0.0  0.6 95052 45708  -  Ss   11:49      0:01.93 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
root    42113   0.0  0.0 12740  2356  0  S+   11:54      0:00.00 grep unbound


If  unbound restarts, the script is restarted as well. Unbound restarts are triggered by config changes, changes of blocklists, LAN and WAN interfaces ups/downs, reboot...

In my case the script was not running. I tested a lot and even set up a brand new machine (Intel Celeron J3160, 1.60GHz, 4 cores, 4 threads) running a fresh OPNsense 22.1.7.

What I have found so far:
- if unbound DNS blocklists are disabled, everything works fine and the dhcp watcher scripts is running
- if unbound DNS blocklists are enabled and every blocklist is checked, the unbound start process takes ~1 minute, the watcher script is started during that process but terminates at some point
- if unbound blocklist are enabled an roughly half the blocklists are checked, everything works fine and the dhcp watcher scripts is running

It looks like the the longer the unbound start process takes, the higher the probability that the watcher script terminates.

Can anybody reproduce this?

P.S.
I was monitoring the start process of unbound frequently with "ps aux|grep unbound". One time I got this:

root@router:~ # ps aux | grep unbound
root  61170   0.0  0.0 12892  2404  -  Ss   13:14     0:00.00 /usr/local/bin/flock -n -E 0 -o /tmp/unbound_start.lock /usr/local/opnsense/scripts/unbound/start.sh
root  61287   0.0  0.2 25880 14940  -  R    13:14     0:00.26 /usr/local/bin/python3 /usr/local/opnsense/scripts/dns/unbound_dhcpd.py --domain intern (python3.8)
root  61592   0.0  0.0 13500  2856  -  S    13:14     0:00.00 /bin/sh /usr/local/opnsense/scripts/unbound/start.sh
root  61729   0.0  0.3 36324 23248  -  R    13:14     0:00.26 /usr/local/sbin/unbound-checkconf /var/unbound/unbound.conf
root  61761   0.0  0.2 25880 14936  -  Rs   13:14     0:00.03 /usr/local/bin/python3 /usr/local/opnsense/scripts/dns/unbound_dhcpd.py --domain intern (python3.8)
root  62336   0.0  0.0 12740  2356  0  S+   13:14     0:00.00 grep unbound


2 x "unbound_dhcpd.py" ??

I can confirm the watcher script doesn't appear to be running

ps aux | grep unbound
unbound       15633   0.0  3.8  389068  309596  -  Ss   02:00       0:42.63 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
root          20113   0.0  0.0   12740    1964  0  S+   11:31       0:00.00 grep unbound


I can confirm that disabling blacklists in unbound indeed allows the watcher service to start and run

ps aux | grep unbound
root           6264   0.0  0.2   25880   14568  -  Ss   11:34       0:00.31 /usr/local/bin/python3 /usr/local/opnsense/scripts/dns/unbound_dhcpd.py --domain techservtn.com (python3.8)
unbound        8438   0.0  0.5  110412   42400  -  Ss   11:34       0:00.11 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
root          41524   0.0  0.0   12740    1964  0  S+   11:34       0:00.00 grep unbound


I'm running on a i3-9300 and 8gigs of ram and an SSD.  I shouldn't be hardware constrained at all.  So maybe one of the blocklists is the problem?  I only have 5 block lists enabled.