Ran into an interesting situation yesterday. Was researching switches and when I attempted to go to www.trendnet.com it didn't work. Unbound was returning NXDOMAIN. trendnet.com resolved, but since it redirects to www.trendnet.com I still couldn't get to the site.
After a bit of troubleshooting, here's what appears to have happened. One of the IPv4 resolvers of Quad9 wasn't resolving the domain. The other IPv4 resolver and both IPv6 resolvers worked correctly. Looking through the Unbound logs, the only reference I can find is where the problem resolver shows "nodata proof failed" for the domain.
I have Quad9 set up via DoT with DNSSEC support turned on. While digging through the Quad9 site, I found that they don't recommend enabling DNSSEC validation, as it can cause false BOGUS responses. https://docs.quad9.net/Quad9_For_Organizations/DNS_Forwarder_Best_Practices/#disable-dnssec-validation Turning off DNSSEC did allow Unbound to start returning an IP for the domain.
Today the oddness continues. Now the other IPv4 resolver isn't returning a result for the domain, but only on port 53. DoT returns a valid IP. delv shows that the NXDOMAIN is a valid result. I'm not sure how to get it to validate over DoT.
; <<>> DiG 9.18.18-0ubuntu2.1-Ubuntu <<>> @9.9.9.9 www.trendnet.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 11260
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;www.trendnet.com. IN A
;; AUTHORITY SECTION:
trendnet.com. 2257 IN SOA NS65.WORLDNIC.com. namehost.WORLDNIC.com. 123110920 10800 3600 604800 3600
;; Query time: 15 msec
;; SERVER: 9.9.9.9#53(9.9.9.9) (UDP)
;; WHEN: Sat Feb 17 09:41:11 EST 2024
;; MSG SIZE rcvd: 104
;; resolution failed: ncache nxdomain
; negative response, fully validated
; www.trendnet.com. 2426 IN \-ANY ;-$NXDOMAIN
; trendnet.com. SOA NS65.WORLDNIC.com. namehost.WORLDNIC.com. 123110920 10800 3600 604800 3600
; trendnet.com. RRSIG SOA ...
; trendnet.com. RRSIG NSEC ...
; trendnet.com. NSEC trendnet.com. A NS SOA MX TXT RRSIG NSEC DNSKEY CAA
; <<>> DiG 9.18.18-0ubuntu2.1-Ubuntu <<>> @9.9.9.9 www.trendnet.com +tls
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64722
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;www.trendnet.com. IN A
;; ANSWER SECTION:
www.trendnet.com. 6704 IN A 38.122.20.251
;; Query time: 35 msec
;; SERVER: 9.9.9.9#853(9.9.9.9) (TLS)
;; WHEN: Sat Feb 17 09:42:22 EST 2024
;; MSG SIZE rcvd: 61
I'm not sure what mechanism Unbound uses to choose from the results provided by upstream resolvers, especially when DNSSEC is enabled. I have the validation logging set to 2 but the general log at the default 1 and there's not anything indicating what the issue is. I've increased the logging to the maximum to see if I can determine the cause.
Has anyone else encountered something similar, or do you have any additional troubleshooting suggestions? Unfortunately, it's a bit of a moving target, as I don't know why or when Quad9 is going to return a valid result or not.
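For anyone who wants to repeat the comparison, the test matrix described above (each resolver, plain port 53 versus DoT) can be scripted. This is only a rough sketch, assuming `dig` 9.18 or newer is installed (needed for `+tls`). The function names are made up for illustration, and the resolver list only contains the two Quad9 IPv4 addresses that appear in this thread.

```python
import re
import subprocess

# The two Quad9 IPv4 resolvers mentioned in this thread; add others as needed.
RESOLVERS = ["9.9.9.9", "149.112.112.112"]


def build_dig_commands(name, resolvers=RESOLVERS):
    """Return (label, argv) pairs: one port-53 query and one DoT query per resolver."""
    commands = []
    for server in resolvers:
        base = ["dig", f"@{server}", name, "A", "+dnssec"]
        commands.append((f"{server} port 53", base))
        commands.append((f"{server} DoT 853", base + ["+tls"]))
    return commands


def extract_status(dig_output):
    """Pull the rcode (NOERROR, NXDOMAIN, ...) out of dig's header line, or None."""
    match = re.search(r"status: ([A-Z]+)", dig_output)
    return match.group(1) if match else None


def run_matrix(name):
    """Run every query and print one status line per resolver and transport."""
    for label, argv in build_dig_commands(name):
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=15)
        print(f"{label:28} {extract_status(proc.stdout)}")
```

Calling `run_matrix("www.trendnet.com")` prints one status per resolver and transport, which makes it easier to see which combination is returning NXDOMAIN at a given moment.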
Hi
I have experienced just about the exact same symptoms today - also with quad9.
I'll look into it tomorrow and get back to you about my results...
Quote from: holunde on February 19, 2024, 09:33:35 PM
> Hi
> I have experienced just about the exact same symptoms today - also with quad9.
> I'll look into it tomorrow and get back to you about my results...
I've already reached out to Quad9 and they're investigating the issue. My concern here is trying to understand what Unbound is doing and why it would be picking NXDOMAIN instead of a valid result when everything appears to have a proper DNSSEC response, yet picking the valid IP when DNSSEC is disabled.
I could be totally off target, but maybe this will at least give you something to rule out.
I remember vaguely that with Quad9 there is a field they return for NXDOMAIN responses to distinguish between a non-existent domain and one that is blocked by their filters, or maybe it was for another reason. I think it was AUTHORITY 1 and 0.
So, in both cases an NXDOMAIN is returned to Unbound.
Perhaps an additional checkbox to disable aggressive-nsec (enabled by default) would be useful..
https://unbound.docs.nlnetlabs.nl/en/latest/topics/privacy/aggressive-nsec.html
https://github.com/NLnetLabs/unbound/issues/824
I think so, Fright. At least it would make this easier to diagnose.
Quote from: cookiemonster on February 20, 2024, 03:19:58 PM
> I could be totally off target, but maybe this will at least give you something to rule out.
> I remember vaguely that with Quad9 there is a field they return for NXDOMAIN responses to distinguish between a non-existent domain and one that is blocked by their filters, or maybe it was for another reason. I think it was AUTHORITY 1 and 0.
> So, in both cases an NXDOMAIN is returned to Unbound.
Interesting. I'll have to take a look. Any recommendations for the best way to test for that?
Quote from: Fright on February 20, 2024, 05:56:17 PM
> Perhaps an additional checkbox to disable aggressive-nsec (enabled by default) would be useful..
> https://unbound.docs.nlnetlabs.nl/en/latest/topics/privacy/aggressive-nsec.html
> https://github.com/NLnetLabs/unbound/issues/824
That's interesting, but I don't think that it's due to aggressive-nsec as trendnet.com was correctly resolving. It's the subdomain that doesn't.
From what I can tell, Unbound still makes the resolver call but caches the NXDOMAIN result instead of the valid IP.
Quote from: CJ on February 21, 2024, 09:13:37 PM
> Quote from: cookiemonster on February 20, 2024, 03:19:58 PM
> > I could be totally off target, but maybe this will at least give you something to rule out.
> > I remember vaguely that with Quad9 there is a field they return for NXDOMAIN responses to distinguish between a non-existent domain and one that is blocked by their filters, or maybe it was for another reason. I think it was AUTHORITY 1 and 0.
> > So, in both cases an NXDOMAIN is returned to Unbound.
> Interesting. I'll have to take a look. Any recommendations for the best way to test for that?
yes, with dig. Found a link https://docs.quad9.net/FAQs/
Quote from: cookiemonster on February 21, 2024, 11:23:14 PM
> yes, with dig. Found a link https://docs.quad9.net/FAQs/
I'm assuming you're referring to this section. https://docs.quad9.net/FAQs/#identifying-a-quad9-block
I'll give that a try when I have some time and also see if I still have my notes from the original situation.
I should have mentioned that, but yes, that's it.
I just checked and it's not a blocking problem.
dig @9.9.9.9 A www.trendnet.com +dnssec | grep status
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11539
dig @149.112.112.112 A www.trendnet.com +dnssec | grep status
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24981
dig @9.9.9.9 www.trendnet.com | grep "status\|AUTHORITY:"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28429
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
dig @149.112.112.112 www.trendnet.com | grep "status\|AUTHORITY:"
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 38281
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
Both resolvers return valid DNSSEC, but one gives NXDOMAIN and the other actually resolves. And the NXDOMAIN isn't a block NXDOMAIN, as that would not have the ra flag or AUTHORITY: 1.
Quad9 asked me to run some commands in their original response but haven't replied since. I'm going to try poking them again to see what the current status is. But as I mentioned earlier, I'm curious why Unbound is picking the NXDOMAIN instead of the valid result as they both have valid DNSSEC. Simply unchecking DNSSEC causes it to return the IP instead of the NXDOMAIN.
It doesn't seem to be a timing issue as it's repeatable. Something in Unbound just seems to prefer NXDOMAIN when DNSSEC is enabled and there's a conflict.
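The block check discussed above (a Quad9 block shows up as NXDOMAIN with AUTHORITY: 0, while a regular NXDOMAIN carries the SOA in the authority section) can be automated against saved `dig` output. This is a minimal sketch that only encodes the AUTHORITY-count rule as described in this thread and the linked Quad9 FAQ. The function names are hypothetical, and the flags are parsed but left for manual inspection.

```python
import re


def parse_dig_header(output: str) -> dict:
    """Extract status, header flags and the AUTHORITY count from dig output."""
    status = re.search(r"status: ([A-Z]+)", output)
    flags = re.search(r"flags: ([a-z ]+);", output)
    authority = re.search(r"AUTHORITY: (\d+)", output)
    if not (status and flags and authority):
        raise ValueError("no dig header found in output")
    return {
        "status": status.group(1),
        "flags": flags.group(1).split(),
        "authority": int(authority.group(1)),
    }


def classify_nxdomain(output: str) -> str:
    """Apply the rule from this thread: NXDOMAIN with AUTHORITY: 0 suggests a block."""
    header = parse_dig_header(output)
    if header["status"] != "NXDOMAIN":
        return "not NXDOMAIN"
    if header["authority"] == 0:
        return "possible Quad9 block"
    return "ordinary NXDOMAIN"
```

Fed the two header lines from the 149.112.112.112 query above, `classify_nxdomain` returns "ordinary NXDOMAIN", which matches the conclusion that this is not a blocking problem.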
@CJ
Quote:
> That's interesting, but I don't think that it's due to aggressive-nsec as trendnet.com was correctly resolving.
Yep, trendnet.com was resolving correctly,
but it also returns an NSEC record that says the next record is trendnet.com, so it's actually completing the chain.
This is an error in the trendnet.com zone config that provokes NXDOMAIN for subdomains if aggressive-nsec is on.
Quote from: Fright on February 24, 2024, 06:59:56 PM
> @CJ
> > That's interesting, but I don't think that it's due to aggressive-nsec as trendnet.com was correctly resolving.
> Yep, trendnet.com was resolving correctly, but it also returns an NSEC record that says the next record is trendnet.com, so it's actually completing the chain.
> This is an error in the trendnet.com zone config that provokes NXDOMAIN for subdomains if aggressive-nsec is on.
Can you elaborate some? I'm not completely following you.
Have you tried with `aggressive-nsec: no`? :)
I think it should work as described in https://www.cloudflare.com/dns/dnssec/dnssec-complexities-and-considerations/
So if you dig up the NSEC record for trendnet.com, it should return something other than trendnet.com for the next record (an actual name, or some white lie if the zone owner is afraid of zone enumeration). Like:
https://digwebinterface.com/?hostnames=%0D%0Acloudflare.com&type=NSEC&showcommand=on&ns=resolver&useresolver=9.9.9.9&nameservers=
(returns \000.cloudflare.com. for the next record)
But for trendnet.com it returns:
https://digwebinterface.com/?hostnames=trendnet.com&type=NSEC&showcommand=on&ns=resolver&useresolver=9.9.9.9&nameservers=
(returns trendnet.com. for the next record)
which actually says that there are no records between trendnet.com and trendnet.com.
I think this is a zone config error.
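To make the NSEC argument above concrete, here is a small sketch of the coverage check a validator could apply when synthesising NXDOMAIN from a cached NSEC record. It uses the canonical name ordering (labels compared right to left, case-insensitively, byte by byte). It is an illustration only, not Unbound's actual code: the function names are made up, it assumes the queried name is already known to be inside the zone, and it only handles `\DDD` escapes in names.

```python
import re


def canonical_key(name: str) -> tuple:
    """Sort key for canonical DNS ordering: lowercase labels, rightmost label first."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    decoded = []
    for label in labels:
        # Turn presentation-format \DDD escapes (decimal) into the raw byte.
        raw = re.sub(rb"\\(\d{3})", lambda m: bytes([int(m.group(1))]),
                     label.encode("ascii"))
        decoded.append(raw.lower())
    return tuple(reversed(decoded))


def nsec_covers(owner: str, next_name: str, qname: str) -> bool:
    """True if the NSEC record owner -> next_name proves that qname does not exist."""
    o, n, q = canonical_key(owner), canonical_key(next_name), canonical_key(qname)
    if q == o:
        return False  # the owner name itself exists
    if o < n:
        return o < q < n  # ordinary gap between two names
    # Last NSEC in the zone wraps back to the apex, so it covers
    # everything after the owner and everything before next_name.
    return q > o or q < n


# NSEC records as reported in this thread
print(nsec_covers("trendnet.com.", "trendnet.com.", "www.trendnet.com."))
print(nsec_covers("cloudflare.com.", "\\000.cloudflare.com.", "www.cloudflare.com."))
```

With the self-pointing trendnet.com record the check comes out true for www.trendnet.com, so the cached NSEC alone is enough to answer NXDOMAIN. The cloudflare.com record only covers the tiny gap up to \000.cloudflare.com, so www.cloudflare.com is not covered by it.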
Quote from: Fright on February 26, 2024, 05:37:08 PM
> Have you tried with `aggressive-nsec: no`? :)
> I think it should work as described in https://www.cloudflare.com/dns/dnssec/dnssec-complexities-and-considerations/
> So if you dig up the NSEC record for trendnet.com, it should return something other than trendnet.com for the next record (an actual name, or some white lie if the zone owner is afraid of zone enumeration). Like:
> https://digwebinterface.com/?hostnames=%0D%0Acloudflare.com&type=NSEC&showcommand=on&ns=resolver&useresolver=9.9.9.9&nameservers=
> (returns \000.cloudflare.com. for the next record)
> But for trendnet.com it returns:
> https://digwebinterface.com/?hostnames=trendnet.com&type=NSEC&showcommand=on&ns=resolver&useresolver=9.9.9.9&nameservers=
> (returns trendnet.com. for the next record)
> which actually says that there are no records between trendnet.com and trendnet.com.
> I think this is a zone config error.
How would I turn aggressive nsec off? I don't see an option for it in the UI and there's no longer a custom option field for Unbound.
I think you can try
opnsense-patch 387fc59
and disable it via the GUI,
or place `aggressive-nsec: no` somewhere in /usr/local/opnsense/service/templates/OPNsense/Unbound/core/advanced.conf
and apply the Unbound settings.
Quote from: Fright on February 27, 2024, 04:05:59 PM
> I think you can try
> opnsense-patch 387fc59
> and disable it via the GUI,
> or place `aggressive-nsec: no` somewhere in /usr/local/opnsense/service/templates/OPNsense/Unbound/core/advanced.conf
> and apply the Unbound settings.
I'll see if I can give that a try this weekend.
I've also encountered multiple (and strange) resolution errors with Unbound, like the following:
```
2024-09-18T13:46:40 Error unbound [54445:2] error: SERVFAIL <somedomain.tld. A IN>: all servers for this domain failed, at zone somedomain.tld. upstream server timeout
2024-09-18T13:43:36 Error unbound [17415:1] error: SERVFAIL <xx.xx.xx.xx.in-addr.arpa. PTR IN>: all servers for this domain failed, at zone 64.92.188.in-addr.arpa. no server to query no addresses for nameservers
2024-09-18T13:43:36 Error unbound [17415:0] error: SERVFAIL <xx.xx.xx.xx.in-addr.arpa. PTR IN>: exceeded the maximum nameserver nxdomains
2024-09-18T13:43:30 Error unbound [17415:3] error: SERVFAIL <xx.xx.xx.xx.in-addr.arpa. PTR IN>: exceeded the maximum nameserver nxdomains
2024-09-18T13:33:04 Error unbound [17415:3] error: SERVFAIL <85.21.107.40.zen.spamhaus.org. A IN>: exceeded the maximum nameserver nxdomains
2024-09-18T13:32:26 Error unbound [17415:2] error: SERVFAIL <somedomain.tld. A IN>: all servers for this domain failed, at zone somedomain.tld. from 194.0.34.53 no server to query nameserver addresses not usable
```
After reading a lot of documentation, I came across someone saying that ISPs may tamper with DNS.
In my setup, I've got 3 internet providers, so I configured Unbound to use only WAN1, then only WAN2, then only WAN3.
While doing DNS requests, I noticed that the WAN1 provider was (probably) tampering with DNS, since both WAN2 and WAN3 produced good results but WAN1 didn't.
Hopefully this might help some other people.
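The per-WAN comparison above boils down to finding the uplink whose answer disagrees with the rest. A tiny sketch of that idea follows, assuming you have already collected one result (for example the rcode) per WAN by hand. The function name and the sample results are hypothetical, not measurements from this setup.

```python
from collections import Counter


def find_outliers(results: dict[str, str]) -> list[str]:
    """Return the uplinks whose result differs from the strict majority, if there is one."""
    counts = Counter(results.values())
    majority, votes = counts.most_common(1)[0]
    if votes <= len(results) / 2:
        return []  # no clear majority, so nothing can be singled out
    return sorted(name for name, value in results.items() if value != majority)


# Hypothetical results for the same query sent out over each WAN
sample = {"WAN1": "SERVFAIL", "WAN2": "NOERROR", "WAN3": "NOERROR"}
print(find_outliers(sample))
```

A disagreement like this only hints at tampering. A flaky upstream or a filtered port on one uplink would look the same, so it is worth repeating the test a few times.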
Turning off `Aggressive NSEC` under `Advanced` in the GUI made resolution with Quad9 DoT perfect. No more resolution failures.