Hello, since a recent upgrade (22.1.5 → 22.1.6) my Unbound is unable to serve the new devices that are added to the DHCP leases. My DHCP registration option is checked, but it's impossible to get anything from Unbound. (I tried restarting the Unbound/DHCP services and rebooting OPNsense, without any change.)
There aren't any error/warning logs in Unbound :-\
If I try a command (from Ubuntu) like:
dig +search <hostname_in_dhcp_lease> @<OPNSENSE_IP>
I get:
->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 55137
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
Do you have any advice to help me debug/fix this issue? It worked before these updates.
Check your /var/dhcpd/var/db/dhcpd.leases -- if it's not there it can't possibly be known to the system.
Cheers,
Franco
Hello Franco, thanks for your help,
I've checked this file, and my lease is present in it.
:-[
But is it expired or not? Pasting the entry here would certainly help reduce the guesswork.
Cheers,
Franco
Quote from: franco on April 19, 2022, 02:01:29 PM
But is it expired or not? Pasting the entry here would certainly help reduce the guesswork.
Cheers,
Franco
I'm experiencing the same issue; I also just upgraded to 22.1.6. I do find an entry in the DHCP lease file for the missing DNS entry; it shows as active and the lease expires tomorrow. However, I'm unable to resolve it with nslookup from the CLI of the firewall or from any other device, even after restarting the Unbound service, as the OP stated. I've had this issue for some time and posted about it a long time ago; the difference then was that restarting the Unbound service brought the new entries in. This time I can't seem to get Unbound to bring them in. I can provide evidence but don't want to post it publicly.
To reiterate, make sure that the entry exists in
# cat /var/dhcpd/var/db/dhcpd.leases
and is valid (as in not already expired). Then make sure the entry exists in
# cat /var/unbound/dhcpleases.conf
After a restart of Unbound, valid entries that exist in the former should reliably end up in this file.
Then redo the test of resolving the DHCP lease. If it doesn't work but the entry exists in both files, the test itself was wrong. Sometimes domain mismatches can cause this...
Cheers,
Franco
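The two file checks above can also be scripted. Here's a minimal sketch of parsing the lease file for a hostname while filtering out inactive or expired entries. The lease text, hostname "mydevice", and addresses below are made up for illustration; on the firewall you would read /var/dhcpd/var/db/dhcpd.leases instead. Note that dhcpd.leases timestamps are in UTC.

```python
# Sketch: find active, non-expired leases in dhcpd.leases-style text.
# Field layout follows the ISC dhcpd.leases format; timestamps are UTC.
import re
from datetime import datetime, timezone

def active_leases(text):
    """Return {hostname: ip} for leases whose binding state is active
    and whose 'ends' timestamp has not passed yet."""
    leases = {}
    for ip, body in re.findall(r'lease ([\d.]+) \{(.*?)\}', text, re.S):
        host = re.search(r'client-hostname "([^"]+)"', body)
        state = re.search(r'binding state (\w+)', body)
        ends = re.search(r'ends \d (\S+ \S+);', body)
        if not (host and state and state.group(1) == 'active'):
            continue
        if ends:
            expiry = datetime.strptime(ends.group(1), '%Y/%m/%d %H:%M:%S')
            if expiry.replace(tzinfo=timezone.utc) < datetime.now(timezone.utc):
                continue  # lease already expired
        leases[host.group(1)] = ip
    return leases

# Made-up example entry, shaped like a real dhcpd.leases block:
sample = '''
lease 192.168.1.50 {
  starts 2 2022/04/19 12:00:00;
  ends 2 2099/04/26 12:00:00;
  binding state active;
  client-hostname "mydevice";
}
'''
print(active_leases(sample))  # {'mydevice': '192.168.1.50'}
```

If a hostname shows up here but not in dhcpleases.conf, the problem is between dhcpd and the watcher, not in Unbound itself.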
Quote from: franco on April 20, 2022, 04:36:30 PM
To reiterate, make sure that the entry exists in
# cat /var/dhcpd/var/db/dhcpd.leases
and is valid (as in not already expired). Then make sure the entry exists in
# cat /var/unbound/dhcpleases.conf
After a restart of Unbound, valid entries that exist in the former should reliably end up in this file.
Then redo the test of resolving the DHCP lease. If it doesn't work but the entry exists in both files, the test itself was wrong. Sometimes domain mismatches can cause this...
Cheers,
Franco
I confirmed that the entry exists and is not expired in the
# cat /var/dhcpd/var/db/dhcpd.leases
I however do not see this entry in
# cat /var/unbound/dhcpleases.conf
Even after restarting the service, the entry does not show up in the file. I also question whether restarting the service should even be necessary. I can provide a screenshot if there is a less public way to do so; I just don't want to post a list of hostnames and IPs on a public forum if it can be avoided.
The lease should show up in dhcpleases.conf once Unbound is restarted. The process that generates the file is:
# pgrep -f unbound_dhcpd.py
(should be up and running by showing a PID)
This process will also log to the Unbound log:
# opnsense-log resolver
Here's how to restart it and watch the log file:
# configctl -d unbound restart && opnsense-log -f resolver
Any errors here related to the host in question?
Cheers,
Franco
Quote from: franco on April 20, 2022, 05:03:08 PM
The lease should show up in dhcpleases.conf once Unbound is restarted. The process that generates the file is:
# pgrep -f unbound_dhcpd.py
(should be up and running by showing a PID)
This process will also log to the Unbound log:
# opnsense-log resolver
Here's how to restart it and watch the log file:
# configctl -d unbound restart && opnsense-log -f resolver
Any errors here related to the host in question?
Cheers,
Franco
Restarted with
# configctl -d unbound restart && opnsense-log -f resolver
No errors were printed related to the host in question. The logging level is set to 2; I would think that would be enough to show these errors, but I can adjust it if needed.
I don't know if this is related, but I thought it was worth noting since you mentioned a Python script. If I run a health check, it shows the following missing dependencies:
Checking all packages: .......... done
py37-markupsafe has a missing dependency: python37
py37-markupsafe has a missing dependency: py37-setuptools
py37-markupsafe is missing a required shared library: libpython3.7m.so.1.0
Another follow-up: I rebooted the firewall this afternoon, and the missing entry is now in DNS. So something is different between a reboot and a service restart.
Thanks for your help @FullyBorked, I confirm that I get exactly the same behaviour:
- the entry is present and valid in /var/dhcpd/var/db/dhcpd.leases
- there is no entry in /var/unbound/dhcpleases.conf
Even after a DHCP/Unbound restart, there aren't any errors in the logs.
I tried rebooting the whole device, but I got the same behaviour, and no resolution!
But this morning I rebooted again, and this time I got the resolution. Something seems to have gone wrong since the last update of Unbound :(
@franco is there anything else I can do to help identify and resolve this? Devices that I don't keep online all the time are frustrating, as they seem to need a firewall reboot before they are resolvable via DNS.
I have the same issue here. OPNsense version is 22.1.7
(maybe related: https://forum.opnsense.org/index.php?topic=24931.15 and https://www.reddit.com/r/opnsense/comments/ue91lf/2216_unbound_dhcp_lease_registration_not_working)
What I found so far:
- the DHCP leases in /var/dhcpd/var/db/dhcpd.leases are monitored by the Unbound dhcpd watcher script /usr/local/opnsense/scripts/dns/unbound_dhcpd.py
- the watcher script transfers lease changes to /var/unbound/dhcpleases.conf
- if the watcher script is not running, /var/unbound/dhcpleases.conf is either stale or empty, and name resolution for DHCP clients fails
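For reference, the entries the watcher writes to /var/unbound/dhcpleases.conf are Unbound local-data directives, roughly of this shape (hostname, domain, and address here are invented examples):

```
local-data: "mydevice.intern. IN A 192.168.1.50"
local-data-ptr: "192.168.1.50 mydevice.intern"
```

So if resolution fails, a quick grep for the hostname in that file tells you whether the watcher ever picked the lease up.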
Please log in to the console and check whether the script is running:
ps aux | grep unbound
or (as franco stated above):
pgrep -f unbound_dhcpd.py
The output should look like:
root 43084 0.0 0.2 25880 15192 - Ss 11:49 0:03.51 /usr/local/bin/python3 /usr/local/opnsense/scripts/dns/unbound_dhcpd.py --domain intern (python3.8)
unbound 51803 0.0 0.6 95052 45708 - Ss 11:49 0:01.93 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
root 42113 0.0 0.0 12740 2356 0 S+ 11:54 0:00.00 grep unbound
If Unbound restarts, the script is restarted as well. Unbound restarts are triggered by config changes, changes to blocklists, LAN and WAN interface ups/downs, reboots, ...
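If you want to check this from a script rather than by eyeballing ps output, here's a small sketch; nothing OPNsense-specific is assumed beyond the process name shown above, and the sample output below mimics the listings pasted in this thread:

```python
# Sketch: decide from `ps aux` text whether the unbound_dhcpd.py
# watcher is running. Feed in the output of e.g. `ps aux | grep unbound`;
# the line containing "grep" itself is ignored.
def watcher_running(ps_output: str) -> bool:
    """True if a real unbound_dhcpd.py process (not the grep) is listed."""
    return any(
        "unbound_dhcpd.py" in line and "grep" not in line
        for line in ps_output.splitlines()
    )

# Made-up sample shaped like the output above:
sample = """\
root  43084  0.0 0.2 25880 15192 - Ss 11:49 0:03.51 /usr/local/bin/python3 /usr/local/opnsense/scripts/dns/unbound_dhcpd.py --domain intern
root  42113  0.0 0.0 12740  2356 0 S+ 11:54 0:00.00 grep unbound
"""
print(watcher_running(sample))  # True
```

The same check could feed a cron job or Monit alert so the dead watcher is noticed immediately instead of when resolution fails.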
In my case the script was not running. I tested a lot and even set up a brand-new machine (Intel Celeron J3160, 1.60 GHz, 4 cores, 4 threads) running a fresh OPNsense 22.1.7.
What I have found so far:
- if Unbound DNS blocklists are disabled, everything works fine and the DHCP watcher script is running
- if Unbound DNS blocklists are enabled and every blocklist is checked, the Unbound start process takes ~1 minute; the watcher script is started during that process but terminates at some point
- if Unbound blocklists are enabled and roughly half of the blocklists are checked, everything works fine and the DHCP watcher script is running
It looks like the longer the Unbound start process takes, the higher the probability that the watcher script terminates.
Can anybody reproduce this?
P.S.
I was frequently monitoring the start process of Unbound with "ps aux | grep unbound". One time I got this:
root@router:~ # ps aux | grep unbound
root 61170 0.0 0.0 12892 2404 - Ss 13:14 0:00.00 /usr/local/bin/flock -n -E 0 -o /tmp/unbound_start.lock /usr/local/opnsense/scripts/unbound/start.sh
root 61287 0.0 0.2 25880 14940 - R 13:14 0:00.26 /usr/local/bin/python3 /usr/local/opnsense/scripts/dns/unbound_dhcpd.py --domain intern (python3.8)
root 61592 0.0 0.0 13500 2856 - S 13:14 0:00.00 /bin/sh /usr/local/opnsense/scripts/unbound/start.sh
root 61729 0.0 0.3 36324 23248 - R 13:14 0:00.26 /usr/local/sbin/unbound-checkconf /var/unbound/unbound.conf
root 61761 0.0 0.2 25880 14936 - Rs 13:14 0:00.03 /usr/local/bin/python3 /usr/local/opnsense/scripts/dns/unbound_dhcpd.py --domain intern (python3.8)
root 62336 0.0 0.0 12740 2356 0 S+ 13:14 0:00.00 grep unbound
2 x "unbound_dhcpd.py" ??
I can confirm the watcher script doesn't appear to be running
ps aux | grep unbound
unbound 15633 0.0 3.8 389068 309596 - Ss 02:00 0:42.63 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
root 20113 0.0 0.0 12740 1964 0 S+ 11:31 0:00.00 grep unbound
I can confirm that disabling blocklists in Unbound indeed allows the watcher service to start and run:
ps aux | grep unbound
root 6264 0.0 0.2 25880 14568 - Ss 11:34 0:00.31 /usr/local/bin/python3 /usr/local/opnsense/scripts/dns/unbound_dhcpd.py --domain techservtn.com (python3.8)
unbound 8438 0.0 0.5 110412 42400 - Ss 11:34 0:00.11 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
root 41524 0.0 0.0 12740 1964 0 S+ 11:34 0:00.00 grep unbound
I'm running an i3-9300 with 8 GB of RAM and an SSD, so I shouldn't be hardware-constrained at all. Maybe one of the blocklists is the problem? I only have 5 blocklists enabled.
I've messed with this some. I initially thought maybe it was just due to one of the lists being bad, but enabling one list at a time I can't reproduce the issue. So I'm now starting to think Unbound is hitting some maximum file size with the blocklist, or some kind of timeout parsing a large file. After adding a few larger lists the service won't continue to run, but with just one larger list, say the AdGuard list, the service runs without issue.
The quick fix is to reduce the number of enabled blocklists. Not sure what the longer-term fix is, or whether this is a bug or by design; maybe for performance reasons there is a cap on the number of entries in the blocklist?
I'm also adding monitoring in Monit so I'll know if the Python script process dies.
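A sketch of what such a Monit check could look like; the check name and the recovery command are my assumptions, not verified against this setup, and on OPNsense Monit is normally configured through Services > Monit in the GUI rather than by editing files directly:

```
# Hypothetical Monit check: run a recovery command when the
# unbound_dhcpd.py watcher process disappears.
check process unbound_dhcpd matching "unbound_dhcpd.py"
  if does not exist then exec "/usr/local/sbin/configctl unbound restart"
```

Since the watcher is supervised by the Unbound start process, restarting Unbound (rather than the script alone) is the safer recovery action here.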
Nice to have this cross checked. Thanks @FullyBorked.
I'd rather consider it a bug, because it severely limits the functionality of DNS blocklists.
Do you know if there is any (Python) log to check for debug information?
Cheers, Marko