Unbound forgets hostname override after a few days. How to debug?

Started by tangofan, January 07, 2025, 03:20:17 AM

I am using OPNsense 24.10.11_2 on bare metal as my router, with ISC (DHCPv4) as my DHCP server and built-in Unbound as my DNS server. For my NAS I have configured a static IP address at 192.168.101.20 with hostname diskstation in ISC. The corresponding subnet 192.168.101.0/24 is a VLAN on the LAN port of my router.

In Unbound's General settings I checked "Register DHCP Static Mappings" and "Do not register system A/AAAA records" and in Unbound's Overrides I configured the following entry:

host:   diskstation
domain: <mydomain.net>
Type:   A (IPv4 address)
Value:  192.168.101.20

That works fine for the most part, except every couple of days Unbound suddenly starts resolving the host name "diskstation" to my WAN address on my Windows 10 client, as if the override didn't exist. In other words, when I ping "diskstation" I get my WAN address in return, but I can still ping 192.168.101.20 and receive a response from my NAS. But the override is still there, at least in the OPNsense UI.

When I disable (+save) and then re-enable (+save) the Unbound override in the OPNsense UI, the problem goes away and, once again, the hostname "diskstation" resolves to 192.168.101.20, at least until the next time Unbound goes a bit senile. Still, this is a bit of a nuisance, so I'm wondering what information from my network and/or OPNsense I should gather the next time this happens, so the issue can be tracked down properly (and hopefully resolved in the future).

Thanks in advance for any guidance on this.

January 07, 2025, 06:20:13 AM #1 Last Edit: January 07, 2025, 06:52:43 AM by EricPerl Reason: Ran out of battery
Following this because I have an outstanding issue that might be related.

Did you mean WLAN address?
The output of ping on Windows can be weird when DNS resolution fails.
I've seen such cases where the machine IP is displayed...

If you suspect DNS issues, use nslookup diskstation and then nslookup diskstation <your DNS server> and see what they return.

Side note: why bother with "Register DHCP static mappings" if you're using host overrides?
I sometimes override the hostname in the DHCP reservation (when I can't change it on the host), in which case it makes more sense.

Quote from: EricPerl on January 07, 2025, 06:20:13 AM
Following this because I have an outstanding issue that might be related.

Did you mean WLAN address?
Eric, thanks so much for your response.

No, I did mean the WAN address, i.e. the address that my internet provider assigns to the WAN interface of OPNsense and through which my network is reachable over the internet.
WLAN = Wi-Fi, correct? So that would be just like a LAN address. Or perhaps I'm getting my acronyms mixed up?

Quote
The output of ping on Windows can be weird when DNS resolution fails.
I've seen such cases where the machine IP is displayed...

If you suspect DNS issues, use nslookup diskstation and then nslookup diskstation <your DNS server> and see what they return.
I used nslookup in the past, when this problem occurred, and it did return my WAN address. After I "fixed" this as described above, it returned my LAN address once again (and also updated my local DNS cache to the LAN address).

Quote
Side note: why bother with "Register DHCP static mappings" if you're using host overrides?
I sometimes override the hostname in the DHCP reservation (when I can't change it on the host) so it makes more sense.
This is an artifact of my configuration attempts. Initially I just used DHCP static mappings, but diskstation didn't resolve to its LAN address until I added the override. Perhaps I need to uncheck "Do not register system A/AAAA records" for that to work... At the time I was just happy that I got things to work. But you have a point, and I should revisit that at some point to clean it up.

The short version of what I tried to say is to use the appropriate tool to diagnose.
If you have a DNS issue, use nslookup. Again, Windows ping can get weird if a name is supplied and the DNS server is unreachable or returns an error.

nslookup host performs the lookup using the default DNS server.
nslookup host dnsserverIP performs the lookup using the specified DNS server.
If the results differ, your default DNS is likely not the specified server...

Either of these returning the OPN WAN IP is odd.
The fact that fixing something in Unbound gets you back to a good state would suggest Unbound was to blame.
But you need to dig while the system is in a bad state... You might have to enable query logs to make progress.

You might want to be a little more precise with your description. For example, "LAN address" is vague. Which machine's LAN address? OPN's? The looked up host's? The machine executing the lookup?

A host override in Unbound (name to IP mapping) works better if the IP is stable (static IP on host or DHCP reservation).
But you don't need to bother Unbound with DHCP reservations (MAC/clientID to IP mapping) which seem to require Unbound to restart anyway.
That setting is useful if Unbound entirely relies on DHCP reservations.

Quote from: EricPerl on January 07, 2025, 11:08:17 PM
The short version of what I tried to say is to use the appropriate tool to diagnose.
If you have a DNS issue, use nslookup. Again, Windows ping can get weird if a name is supplied and the DNS server is unreachable or returns an error.

nslookup host performs the lookup using the default DNS server.
nslookup host dnsserverIP performs the lookup using the specified DNS server.
If the results differ, your default DNS is likely not the specified server...

Either of these returning the OPN WAN IP is odd.
The fact that fixing something in Unbound gets you back to a good state would suggest Unbound was to blame.
But you need to dig while the system is in a bad state... You might have to enable query logs to make progress.
Thanks again for your reply. You make an excellent point. I'll change the shares on my other computers to access the NAS by its IP address (instead of its hostname), so my setup is not affected when this problem happens again. That way I'll have some time to troubleshoot the issue when it recurs.

Is there a particular thing I should do right now BEFORE the problem happens again, e.g. increase the log level of some log?

Quote
You might want to be a little more precise with your description. For example, "LAN address" is vague. Which machine's LAN address? OPN's? The looked up host's? The machine executing the lookup?
I thought I was being precise, but it seems I was mistaken. This was only ever about the IP address that the default name server in my network (OPNsense) returns for the hostname of my NAS ("diskstation").
Quote
A host override in Unbound (name to IP mapping) works better if the IP is stable (static IP on host or DHCP reservation).
As mentioned in my original post, my NAS does have a static IPv4 address (192.168.101.20) configured in ISC.
Quote
But you don't need to bother Unbound with DHCP reservations (MAC/clientID to IP mapping) which seem to require Unbound to restart anyway.
That setting is useful if Unbound entirely relies on DHCP reservations.
I'm not sure what you mean by that. I'm not doing any DHCP reservations in Unbound, only in ISC. The only thing I'm doing in Unbound is the Host Override.

For example:
Quote
I used nslookup in the past, when this problem occurred, and it did return my WAN address. After I "fixed" this as described above, it returned my LAN address once again (and also updated my local DNS cache to the LAN address).
By "my WAN address", you probably mean "OPN's WAN address"
By "my LAN address", you probably mean "Diskstation's LAN address"
I'm making assumptions. I might be wrong and then we're talking past each other...

With regards to "Register DHCP Static Mappings" setting in ISC:
As long as you do a host override in ISC (with IP equal to the static mapping), I don't understand the benefit of turning this on, because ISC already has all the information it needs to handle the DNS request.

If you specified a hostname in the DHCP static mapping (to override the one the NAS specifies), then this setting would enable ISC to become aware of this hostname to IP mapping. As mentioned before, it's my understanding ISC needs to be restarted after static mappings are updated (if they are relevant to ISC).
That said, a domain name will likely be appended to the hostname: either OPN's domain name, or the domain name for the interface.

I have this unchecked: "Do not register system A/AAAA records". I have experimented enough with it to fully understand what that does, but I don't change defaults until I have to, and I didn't have to...

Quote from: EricPerl on January 08, 2025, 09:00:12 AM
For example:
Quote
I used nslookup in the past, when this problem occurred, and it did return my WAN address. After I "fixed" this as described above, it returned my LAN address once again (and also updated my local DNS cache to the LAN address).
By "my WAN address", you probably mean "OPN's WAN address"
By "my LAN address", you probably mean "Diskstation's LAN address"
I'm making assumptions. I might be wrong and then we're talking past each other...
Eric, thanks so much for bearing with me and responding again. Yes, you're absolutely right, that sentence slipped through the cracks and is totally confusing.

It should have properly said:

  • I ran "nslookup diskstation" on my Windows 10 PC in the past, when this problem occurred, and it did return my WAN address. After I "fixed" this as described above, running "nslookup diskstation" on the same PC returned the LAN address of my NAS (192.168.101.20) again, and "ping diskstation" then also used the NAS's LAN address, because the PC's local DNS cache had been updated.

You were also wondering why nslookup would ever return my WAN address, and I forgot to respond to that in my previous post. The reason is that for my domain <mydomain>.net I have configured a wildcard subdomain with Cloudflare, which provides the DNS resolution. This is what the A and CNAME records at Cloudflare for <mydomain>.net look like:
Type     Name            Content
A        ipv4            <ip-address>
CNAME    *               <mydomain>.net
CNAME    <mydomain>.net  ipv4.<mydomain>.net

(FWIW Cloudflare uses CNAME flattening to make the 2nd CNAME entry possible.)
If you're wondering why I use this odd configuration with the ipv4 subdomain: I don't have a static IPv4 address, so I use a DDNS service to update it dynamically, and that simply works better on a subdomain.
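To see why a name that has no record of its own still resolves to the WAN address, here is a small Python model of the records above. It is a simplified sketch, not how Cloudflare implements anything: it assumes RFC 4592-style wildcard matching (a wildcard answers for names that do not exist below their closest existing ancestor) and simply follows CNAMEs instead of flattening them. The domain `mydomain.net` and the WAN address `203.0.113.7` (a documentation address) are placeholders.

```python
# Placeholder zone mirroring the Cloudflare records above (hypothetical values).
RECORDS = {
    ("ipv4.mydomain.net", "A"): "203.0.113.7",       # kept current by DDNS
    ("mydomain.net", "CNAME"): "ipv4.mydomain.net",  # apex; flattened by Cloudflare in reality
    ("*.mydomain.net", "CNAME"): "mydomain.net",     # wildcard
}


def exists(name: str) -> bool:
    """In this model a name exists if it owns at least one record."""
    return any(owner == name for owner, _ in RECORDS)


def find(name: str, rtype: str):
    """Exact owner first; otherwise synthesize from *.<closest encloser> (simplified RFC 4592)."""
    if exists(name):
        return RECORDS.get((name, rtype))
    labels = name.split(".")
    for i in range(1, len(labels)):
        ancestor = ".".join(labels[i:])
        if exists(ancestor):  # closest existing ancestor decides which wildcard applies
            return RECORDS.get(("*." + ancestor, rtype))
    return None


def resolve(name: str, max_hops: int = 8):
    """Follow CNAMEs until an A record is found; returns (chain of names, address or None)."""
    chain = [name]
    for _ in range(max_hops):
        address = find(name, "A")
        if address is not None:
            return chain, address
        target = find(name, "CNAME")
        if target is None:
            return chain, None
        name = target
        chain.append(name)
    raise RuntimeError("CNAME chain too long")
```

In this model `resolve("diskstation.mydomain.net")` walks wildcard -> apex -> `ipv4.mydomain.net` and ends at the WAN address, which matches what nslookup showed whenever no local record for diskstation existed.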

Quote from: EricPerl on January 08, 2025, 09:00:12 AM
With regards to "Register DHCP Static Mappings" setting in ISC:
As long as you do a host override in ISC (with IP equal to the static mapping), I don't understand the benefit of turning this on, because ISC already has all the information it needs to handle the DNS request.
Please correct me if I'm wrong, but I thought that ISC is the DHCP server and Unbound is the DNS server, and that this flag is necessary so that the hostname and its static IP address from ISC get registered with Unbound, enabling Unbound to resolve the hostname to the associated static IP address.
(Spoiler: this setting is indeed necessary; see my experiment below.)
Quote
If you specified a hostname in the DHCP static mapping (to override the one the NAS specifies), then this setting would enable ISC to become aware of this hostname to IP mapping. As mentioned before, it's my understanding ISC needs to be restarted after static mappings are updated (if they are relevant to ISC).
This said, a domain name will likely be added to hostname. Either OPN's domain name, or the domain name for the interface.

I have this unchecked: "Do not register system A/AAAA records". I have experimented enough with it to fully understand what that does, but I don't change defaults until I have to, and I didn't have to...
OK, this motivated me to do a few experiments, this time using a Debian/Proxmox system (running Debian 12 [bookworm], NIC IP address 192.168.101.60) as my client to check the nslookup results.

I disabled the overrides in "Services: Unbound DNS: Overrides", activated the changes and then in "Services: Unbound DNS: General" I changed the settings to the following and then restarted the Unbound service:

Register ISC DHCP4 Leases: Checked
Register DHCP Static Mappings: Unchecked
Do not register system A/AAAA records: Unchecked

Then I ran nslookup on my Debian/Proxmox system:
root@prx-prod-01:~# nslookup diskstation
Server:         192.168.101.1
Address:        192.168.101.1#53

Non-authoritative answer:
Name:   diskstation.<mydomain>.net
Address: <my WAN IP address>

root@prx-prod-01:~#

So those settings didn't yield the desired result. Next attempt with "Register DHCP static mappings" enabled (followed of course by a restart of the Unbound service):
Register ISC DHCP4 Leases: Checked
Register DHCP Static Mappings: CHECKED
Do not register system A/AAAA records: Unchecked
Now this one works properly (remember the override is disabled):
root@prx-prod-01:~# nslookup diskstation
Server:         192.168.101.1
Address:        192.168.101.1#53

Name:   diskstation.<mydomain>.net
Address: 192.168.101.20

root@prx-prod-01:~#

It works!!! I'm not sure why this didn't work when I initially set it up, but I'm happy to see that it works now. Hopefully this will avoid the need for any future overrides. So it seems that activating the setting "Register DHCP Static Mappings" is indeed necessary.

I suppose we shall see whether the issue with "nslookup diskstation" resolving to my WAN address pops up again in the future. For the moment, though, I'm happy with the result, and I think that my overall configuration improved because of your feedback, @EricPerl. So thank you very much again for that.

Yes, ISC is DHCP (essentially ClientID/MAC -> IP mappings) and Unbound is DNS (name -> IP mappings). Some mappings static, others more dynamic.

Let me put it another way:
If you have a host override in Unbound (diskstation = 192.168.101.20), Unbound does NOT need any other data to do its job (lookup diskstation -> 192.168.101.20).
If the static IP was set on the host, there would actually be no data in ISC related to that host.

If you instead rely on the host data in ISC (ClientID/MAC -> IP+hostname[+domain]), then clearly Unbound needs to obtain that data to perform "lookup diskstation".
That's where "Register DHCP Static Mappings" comes into play. There's a separate setting for dynamic leases.
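A toy model of the distinction above, under loudly stated assumptions: each source is a plain name-to-IP dictionary, host overrides are always answered locally, static mappings and dynamic leases only when their checkbox is set, and anything not found locally is forwarded upstream. It ignores restarts/reloads, PTR records and "Do not register system A/AAAA records"; the class and field names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class UnboundLocalData:
    """Hypothetical model of which sources feed Unbound's local answers."""
    register_static_mappings: bool = False  # "Register DHCP Static Mappings"
    register_dhcp_leases: bool = False      # "Register ISC DHCP4 Leases"
    overrides: dict = field(default_factory=dict)        # Unbound host overrides: FQDN -> IP
    static_mappings: dict = field(default_factory=dict)  # ISC static mappings: FQDN -> IP
    dynamic_leases: dict = field(default_factory=dict)   # ISC dynamic leases: FQDN -> IP

    def lookup(self, fqdn: str):
        """Return local addresses for fqdn, or note that the query would go upstream."""
        sources = [self.overrides]  # overrides need no data from ISC at all
        if self.register_static_mappings:
            sources.append(self.static_mappings)
        if self.register_dhcp_leases:
            sources.append(self.dynamic_leases)
        local = {src[fqdn] for src in sources if fqdn in src}
        if not local:
            return "forwarded upstream"
        return sorted(local, key=ipaddress.ip_address)
```

With the NAS only present as an ISC static mapping, the model reproduces both experiments from this thread: leases registered but static mappings not gives "forwarded upstream" (and hence the wildcard WAN address), while enabling static-mapping registration, or adding a host override alone, gives 192.168.101.20.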

The existence of the *.mydomain.net record indeed explains why OPN's WAN IP was returned.
In the absence of a specific diskstation.mydomain.net -> 192.168.101.20 mapping in Unbound, the query goes upstream, where it matches the wildcard name, which resolves to your WAN IP.

You're not using a fully qualified domain name (FQDN); you just perform a lookup on the hostname alone.
That makes you a little more dependent on the app you're using and on another setting on the client machine: "search domains", a list of domains appended to the input when doing lookups, set via DHCP (at the interface level) or in the client's DNS resolver configuration.
IIRC "nslookup diskstation" will actually append one search domain at a time, in order, before making the request.
So I suspect your "search domains" starts with mydomain.net.
That's why all your lookups end up being for diskstation.mydomain.net (thus triggering the wildcard name when the FQDN is not found).
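The search-list order can be sketched for a glibc-style stub resolver such as the Debian client used in the experiments (resolv.conf `search` and `ndots`, with ndots defaulting to 1). This is a model under assumptions: Windows applies its own DNS suffix search rules, and nslookup's exact behaviour may differ. What matters is the order: for a bare hostname the search domains come first, and the first candidate that gets a positive answer wins, so with the wildcard in place `diskstation.mydomain.net.` always "succeeds" before the bare name is tried.

```python
def candidate_names(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """Order in which a glibc-style resolver tries names (resolv.conf 'search' and 'ndots')."""
    if name.endswith("."):
        return [name]  # trailing dot: already fully qualified, search list not applied
    absolute = name + "."
    expanded = [f"{name}.{domain.strip('.')}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as given first
    return expanded + [absolute]      # too few dots: search domains first
```

Querying with a trailing dot (for example `nslookup diskstation.mydomain.net.`) should bypass the search list, which makes it easier to see what the server itself answers for exactly that name.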

I don't know if your Unbound override was for diskstation or diskstation.mydomain.net
Now without the Unbound override, it gets data from ISC, which likely gets a domain name from the static mapping, or domain at the interface level, or OPN's domain...

To add some tidbits: when you do a static assignment via ISC DHCP and allow Unbound to use the static mappings, they still only get registered once you restart Unbound. Even worse, when you initially connected a machine via dynamic DHCP and then change it to a static assignment (which should be outside of the dynamic ranges), both the lease and the static assignment are registered, and you can get the wrong IP for the name. I am not even sure whether this situation is cleared when the lease is invalidated.

It is a pity that this can only be fixed by editing the generated files manually and restarting Unbound.
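If you suspect the lease-plus-static-mapping situation, one quick check is to look for names that carry more than one IPv4 address across the generated files. The sketch below assumes that /var/unbound/host_entries.conf and /var/unbound/dhcpleases.conf (the paths named later in this thread) contain Unbound `local-data:` lines of the form `local-data: "name. IN A address"`; that layout is an assumption, so look at the files first. The function name is hypothetical.

```python
import ipaddress
import re
from collections import defaultdict

# Assumed line shape: local-data: "diskstation.mydomain.net. IN A 192.168.101.20"
# (optional TTL and class; AAAA lines are deliberately not matched)
LOCAL_A = re.compile(
    r'local-data:\s*"(\S+)\s+(?:\d+\s+)?(?:IN\s+)?A\s+([0-9.]+)"', re.IGNORECASE
)


def conflicting_names(*texts: str) -> dict[str, list[str]]:
    """Return names that map to more than one IPv4 address across all given texts."""
    seen = defaultdict(set)
    for text in texts:
        for line in text.splitlines():
            match = LOCAL_A.search(line)
            if match:
                name = match.group(1).lower().rstrip(".")
                seen[name].add(match.group(2))
    return {
        name: sorted(ips, key=ipaddress.ip_address)
        for name, ips in seen.items()
        if len(ips) > 1
    }
```

Assuming Python is available on the firewall, it could be fed with `Path("/var/unbound/host_entries.conf").read_text()` and the same for `dhcpleases.conf` (using `pathlib.Path`); an empty result means no duplicate A records were found in that format.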
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

January 09, 2025, 12:50:38 AM #9 Last Edit: January 09, 2025, 07:04:29 AM by Patrick M. Hausen
Quote from: meyergru on January 09, 2025, 12:33:01 AM
When you do a static assignment via ISC DHCP and allow Unbound to use the static mappings, they still only get registered once you restart Unbound.

As far as I know, this was a user community request with fairly overwhelming support, the reasoning being that restarting Unbound, and hence interrupting DNS service, every time an admin added a static lease was unacceptable.

You cannot have both - until someone implements a true RFC 2136 based mechanism from DHCP to e.g. BIND.

Honestly I would not want Franco's or Ad's job in this particular area. ISC DHCPd was "the DHCP server" just like BIND was "the DNS server". There's an O'Reilly book titled "DNS and BIND".

Now we have a migration to Kea on the horizon, Unbound, Dnsmasq, and users of Dnsmasq requesting that its DHCP capabilities be leveraged in OPNsense ...

For crying out loud, can we settle on one product for each function? Apparently not.

I'd be happy with Kea and BIND. With full integration and all features exposed in the UI, of course.

Kind regards,
Patrick
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote
For crying out loud, can we settle on one product for each function? Apparently not.
LOL
Isn't this the Unix/Linux way?
Coming from a Windows background, I found that mind-boggling when I started a new job at a mostly Linux company.
Every single wiki article was an endless series of "if you're using X, do this, else if ..." (distro, shell, tool, versions and so on).

Quote from: Patrick M. Hausen on January 09, 2025, 12:50:38 AM
Quote from: meyergru on January 09, 2025, 12:33:01 AM
When you do a static assignment via ISC DHCP and allow Unbound to use the static mappings, they still only get registered once you restart Unbound.

This was as far as I know a user community request with somewhat overwhelming support. Reasoning that a restart of Unbound and hence an interruption of DNS service every time an admin added a static lease was unacceptable.

You cannot have both - until someone implements a true RFC 2136 based mechanism from DHCP to e.g. BIND.

I seriously doubt that this is necessary (or a problem): AFAIR, when a dynamic lease is issued, the name is registered immediately.

Static leases are kept in /var/unbound/host_entries.conf and dynamic leases in /var/unbound/dhcpleases.conf. Both files are simply included in unbound.conf with no special treatment. So either both or neither should require an Unbound restart (or better, a reload, as Unbound apparently has a seamless reload function).

@meyergru Thinking about it, I guess you are right. But the release notes for 24.7 contain this:

Quote
ISC DHCP will no longer reload DNS services on static mapping edits. This is for feature parity with Kea DHCP and avoiding cross-service complications. If you expect your static mappings to show up in a DNS service please restart it manually.

No idea what to make of this, then.

Back-porting of Kea idiosyncrasies, then? I guess that Unbound's setting "Register ISC DHCP4 Leases" indicates that even Kea dynamic leases are never registered, but I have not yet bothered, for other reasons.

Unfortunately I'm still having trouble with Unbound on my OPNsense installation. This time it suddenly became unable to resolve public hostnames after running fine since yesterday. So the symptom is different, but it's still Unbound acting up after running fine for a while (though this time the "while" only lasted about a day).

I ran the tests below from my Proxmox (Debian) box at 192.168.101.60 (hostname: prx-prod-01), with OPNsense as gateway/router and name server at 192.168.101.1. 192.168.101.0/24 is a VLAN on the LAN port of my OPNsense router.

root@prx-prod-01:~# nslookup opnsense.org
;; Got SERVFAIL reply from 192.168.101.1
Server:         192.168.101.1
Address:        192.168.101.1#53

** server can't find opnsense.org: SERVFAIL

root@prx-prod-01:~# nslookup opnsense.org 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
Name:   opnsense.org
Address: 178.162.131.118
Name:   opnsense.org
Address: 2001:1af8:4700:a1fa:3::2

root@prx-prod-01:~# nslookup google.com
;; Got SERVFAIL reply from 192.168.101.1
Server:         192.168.101.1
Address:        192.168.101.1#53

** server can't find google.com: SERVFAIL

root@prx-prod-01:~# nslookup google.com 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
Name:   google.com
Address: 142.251.46.206
Name:   google.com
Address: 2607:f8b0:4005:813::200e

root@prx-prod-01:~# ping 9.9.9.9
PING 9.9.9.9 (9.9.9.9) 56(84) bytes of data.
64 bytes from 9.9.9.9: icmp_seq=1 ttl=56 time=8.41 ms
64 bytes from 9.9.9.9: icmp_seq=2 ttl=56 time=11.1 ms
^C
--- 9.9.9.9 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 8.414/9.745/11.076/1.331 ms

Note: I pinged 9.9.9.9 above because I have configured DNS over TLS with Quad9, using the settings below. (I now realize that I should have used 9.9.9.9 as the DNS server in my nslookup query as well. Oh well, next time...)

Enabled  Domain   Address          Port   Hostname
  X               9.9.9.9           853   dns.quad9.net
  X               149.112.112.112   853   dns.quad9.net
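Since Unbound forwards to Quad9 over DNS over TLS, a plain `nslookup opnsense.org 9.9.9.9` only exercises port 53, not the path Unbound actually uses. One hedged way to test that path from a LAN client is to send a single query to 9.9.9.9 on port 853 yourself. The standard-library Python sketch below builds a minimal RFC 1035 query, sends it with the 2-byte length prefix that RFC 7858 specifies for DNS over TLS, verifies the certificate against `dns.quad9.net`, and returns the raw response. The function names are made up for illustration; it only reads the response code and does not parse answers.

```python
import secrets
import socket
import ssl
import struct
from typing import Optional


def build_query(name: str, qtype: int = 1, qid: Optional[int] = None) -> bytes:
    """Build a minimal DNS query (RFC 1035): one question, recursion desired, class IN."""
    if qid is None:
        qid = secrets.randbelow(0x10000)  # random 16-bit message ID
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # flags RD=1, QDCOUNT=1
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"  # length-prefixed labels, terminated by the root label
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE (1 = A), QCLASS IN


def rcode(response: bytes) -> int:
    """Response code from the header: 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN."""
    return struct.unpack("!H", response[2:4])[0] & 0x000F


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or fail if the peer closes the connection early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before the full message arrived")
        buf += chunk
    return buf


def query_dot(server_ip: str, tls_name: str, name: str, timeout: float = 5.0) -> bytes:
    """Send one query over DNS over TLS (RFC 7858) and return the raw DNS response."""
    context = ssl.create_default_context()  # verifies certificate chain and hostname
    with socket.create_connection((server_ip, 853), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=tls_name) as tls:
            message = build_query(name)
            tls.sendall(struct.pack("!H", len(message)) + message)  # 2-byte length prefix
            (length,) = struct.unpack("!H", _recv_exact(tls, 2))
            return _recv_exact(tls, length)
```

For example, `rcode(query_dot("9.9.9.9", "dns.quad9.net", "opnsense.org"))` should give 0 (NOERROR) when the upstream is healthy; a TLS error or timeout instead would point at the forwarding path rather than at Unbound's local data.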

After restarting the Unbound Service on my OPNsense box, things worked again:
root@prx-prod-01:~# nslookup google.com
Server:         192.168.101.1
Address:        192.168.101.1#53

Non-authoritative answer:
Name:   google.com
Address: 142.251.46.206
Name:   google.com
Address: 2607:f8b0:4005:812::200e

root@prx-prod-01:~#

I checked the logs at Services -> Unbound DNS -> Log File, but didn't notice anything regarding my queries, except for a few "dhcpd expired" notices for two Apple devices in my household, which do not have static IP addresses.

FWIW I have now enabled the following options in Services -> Unbound DNS -> Advanced (and then restarted the daemon):

Log Queries: Enabled
Log Replies: Enabled
Tag Queries and Replies: Enabled
Log SERVFAIL: Enabled
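With "Log SERVFAIL" enabled, Unbound should write one line per failed query that includes the query and a reason string, and the reason (for example, that the configured forwarders failed) is usually the most useful clue next time. The sketch below tallies those reasons from exported log lines. The matched format, `SERVFAIL <name type class>: reason`, is an assumption based on typical Unbound output, so adjust the regular expression to what your log actually shows; the sample lines in the test are made up.

```python
import re
from collections import Counter

# Assumed shape of a log-servfail line (verify against your own log):
#   ... error: SERVFAIL <opnsense.org. A IN>: <reason text>
SERVFAIL_LINE = re.compile(r"SERVFAIL <(?P<query>[^>]+)>:\s*(?P<reason>.+)")


def servfail_summary(lines):
    """Count SERVFAIL log lines per reason and collect the affected queries."""
    reasons = Counter()
    queries = set()
    for line in lines:
        match = SERVFAIL_LINE.search(line)
        if match:
            reasons[match.group("reason").strip()] += 1
            queries.add(match.group("query"))
    return reasons, queries
```

If most reasons point at the forwarders, that would shift suspicion away from Unbound's local data and toward the DoT upstreams or the path to them.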

Is there anything else I should be doing in anticipation of the next occurrence of this problem? If it happens again, what should I be checking for?