Unbound, DNSSEC, and Resolution Weirdness

Started by CJ, February 17, 2024, 03:58:11 PM

Ran into an interesting situation yesterday.  Was researching switches and when I attempted to go to www.trendnet.com it didn't work.  Unbound was returning NXDOMAIN.  trendnet.com resolved, but since it redirects to www.trendnet.com I still couldn't get to the site.

After a bit of troubleshooting, here's what appears to have happened.  One of Quad9's IPv4 resolvers wasn't resolving the domain.  The other IPv4 resolver and both IPv6 resolvers worked correctly.  Looking through the Unbound logs, the only reference I can find is where the problem resolver shows "nodata proof failed" for the domain.

I have Quad9 set up via DoT with DNSSEC support turned on.  While digging through Quad9's site, I found that they don't recommend enabling DNSSEC validation on a forwarder, as it can cause false BOGUS responses: https://docs.quad9.net/Quad9_For_Organizations/DNS_Forwarder_Best_Practices/#disable-dnssec-validation  Turning off DNSSEC did allow Unbound to start returning an IP for the domain.

Today the oddness continues.  Now the other IPv4 resolver isn't returning a result for the domain, but only on port 53; DoT returns a valid IP.  Running delv shows that the NXDOMAIN validates as a legitimate negative response.  I'm not sure how to get it to validate over DoT.

; <<>> DiG 9.18.18-0ubuntu2.1-Ubuntu <<>> @9.9.9.9 www.trendnet.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 11260
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;www.trendnet.com. IN A

;; AUTHORITY SECTION:
trendnet.com. 2257 IN SOA NS65.WORLDNIC.com. namehost.WORLDNIC.com. 123110920 10800 3600 604800 3600

;; Query time: 15 msec
;; SERVER: 9.9.9.9#53(9.9.9.9) (UDP)
;; WHEN: Sat Feb 17 09:41:11 EST 2024
;; MSG SIZE  rcvd: 104


;; resolution failed: ncache nxdomain
; negative response, fully validated
; www.trendnet.com. 2426 IN \-ANY ;-$NXDOMAIN
; trendnet.com. SOA NS65.WORLDNIC.com. namehost.WORLDNIC.com. 123110920 10800 3600 604800 3600
; trendnet.com. RRSIG SOA ...
; trendnet.com. RRSIG NSEC ...
; trendnet.com. NSEC trendnet.com. A NS SOA MX TXT RRSIG NSEC DNSKEY CAA


; <<>> DiG 9.18.18-0ubuntu2.1-Ubuntu <<>> @9.9.9.9 www.trendnet.com +tls
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64722
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;www.trendnet.com. IN A

;; ANSWER SECTION:
www.trendnet.com. 6704 IN A 38.122.20.251

;; Query time: 35 msec
;; SERVER: 9.9.9.9#853(9.9.9.9) (TLS)
;; WHEN: Sat Feb 17 09:42:22 EST 2024
;; MSG SIZE  rcvd: 61
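To compare the port-53 and DoT answers above without eyeballing them, the header lines of dig's output can be pulled apart mechanically. A minimal Python sketch, assuming dig's default output layout as pasted above; `parse_dig_header` and `DigHeader` are hypothetical names, not part of dig or any library:

```python
import re
from dataclasses import dataclass

# Hypothetical helper: extracts the header fields that differ between the two answers above.
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")
FLAGS_RE = re.compile(
    r";; flags:([^;]*);\s*QUERY:\s*(\d+),\s*ANSWER:\s*(\d+),\s*AUTHORITY:\s*(\d+)"
)


@dataclass(frozen=True)
class DigHeader:
    status: str
    flags: frozenset
    answer: int
    authority: int


def parse_dig_header(text: str) -> DigHeader:
    """Parse the ->>HEADER<<- and ';; flags:' lines of dig's default output."""
    status = STATUS_RE.search(text)
    counts = FLAGS_RE.search(text)
    if status is None or counts is None:
        raise ValueError("no dig header found in input")
    return DigHeader(
        status=status.group(1),
        flags=frozenset(counts.group(1).split()),
        answer=int(counts.group(3)),
        authority=int(counts.group(4)),
    )


# Header lines copied from the two queries above (UDP on port 53 vs. TLS on 853).
UDP_OUTPUT = """\
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 11260
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
"""
TLS_OUTPUT = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64722
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
"""

if __name__ == "__main__":
    udp, tls = parse_dig_header(UDP_OUTPUT), parse_dig_header(TLS_OUTPUT)
    # Both carry the 'ad' flag: the resolver claims DNSSEC validation succeeded for each.
    print("UDP:", udp.status, sorted(udp.flags), "AUTHORITY", udp.authority)
    print("TLS:", tls.status, sorted(tls.flags), "AUTHORITY", tls.authority)
    if udp.status != tls.status:
        print("same resolver address, different answer depending on transport")
```

Both headers carry `ad`, which is what makes the discrepancy odd: each answer individually claims to be validated.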


I'm not sure what mechanism Unbound uses to choose among the results provided by upstream resolvers, especially when DNSSEC is enabled.  I have validation logging set to 2 and general logging at the default of 1, and nothing in the logs indicates what the issue is.  I've increased logging to the maximum to see if I can determine the cause.

Has anyone else encountered something similar, or do you have any additional troubleshooting suggestions?  Unfortunately, it's a bit of a moving target, as I don't know why or when Quad9 is going to return a valid result.
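On the question of how Unbound chooses among upstream resolvers: my understanding of Unbound's documented server selection is that it does not merge or vote on answers from several forwarders. It keeps a round-trip-time estimate per server and picks one at random among those whose RTT is within a fixed band of the fastest (I believe the documentation gives 400 ms). If that is right, each query is answered by a single upstream, so one misbehaving Quad9 node can intermittently "win". The toy Python model below illustrates only that selection rule; the band constant, the RTT values and the function names are illustrative assumptions, not values or APIs taken from Unbound:

```python
import random

# Assumption: Unbound's RTT selection band; treat as illustrative, not authoritative.
RTT_BAND_MS = 400.0


def eligible_servers(rtt_ms: dict[str, float], band_ms: float = RTT_BAND_MS) -> list[str]:
    """Servers whose RTT estimate is within band_ms of the fastest one (toy model)."""
    fastest = min(rtt_ms.values())
    return [server for server, rtt in rtt_ms.items() if rtt <= fastest + band_ms]


def pick_server(rtt_ms: dict[str, float], rng: random.Random) -> str:
    """Pick one upstream uniformly at random from the eligible set (toy model)."""
    return rng.choice(eligible_servers(rtt_ms))


if __name__ == "__main__":
    # Hypothetical RTTs for the Quad9 addresses; not measured values.
    upstreams = {"9.9.9.9": 15.0, "149.112.112.112": 35.0, "2620:fe::fe": 20.0}
    rng = random.Random(1)
    # Which upstream serves each of ten consecutive queries under this model.
    print([pick_server(upstreams, rng) for _ in range(10)])
```

Under this model, whichever node answers a given query determines what gets cached, which would fit the intermittent behaviour described above.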

Hi
I have experienced just about the exact same symptoms today - also with quad9.
I'll look into it tomorrow and get back to you about my results...

Quote from: holunde on February 19, 2024, 09:33:35 PM
Hi
I have experienced just about the exact same symptoms today - also with quad9.
I'll look into it tomorrow and get back to you about my results...

I've already reached out to Quad9 and they're investigating the issue.  My concern here is trying to understand what Unbound is doing and why it would be picking NXDOMAIN instead of a valid result when everything appears to have a proper DNSSEC response, yet picking the valid IP when DNSSEC is disabled.

I could be totally off target, but maybe this will give you something to at least rule out.
I vaguely remember that Quad9 returns a field in NXDOMAIN responses to distinguish between a non-existent domain and one that is blocked by their filters, or maybe it was for another reason. I think it was AUTHORITY 1 vs. 0.
So, in both cases an NXDOMAIN is returned to Unbound.


I think so, Fright; at least it would make it easier to diagnose.

Quote from: cookiemonster on February 20, 2024, 03:19:58 PM
I could be totally off target, but maybe this will give you something to at least rule out.
I vaguely remember that Quad9 returns a field in NXDOMAIN responses to distinguish between a non-existent domain and one that is blocked by their filters, or maybe it was for another reason. I think it was AUTHORITY 1 vs. 0.
So, in both cases an NXDOMAIN is returned to Unbound.

Interesting.  I'll have to take a look.  Any recommendations for the best way to test for that?

Quote from: Fright on February 20, 2024, 05:56:17 PM
Perhaps an additional checkbox to disable aggressive-nsec (enabled by default) would be useful.
https://unbound.docs.nlnetlabs.nl/en/latest/topics/privacy/aggressive-nsec.html
https://github.com/NLnetLabs/unbound/issues/824

That's interesting, but I don't think that it's due to aggressive-nsec as trendnet.com was correctly resolving.  It's the subdomain that doesn't.

From what I can tell, Unbound still makes the resolver call but caches the NXDOMAIN result instead of the valid IP.

Quote from: CJ on February 21, 2024, 09:13:37 PM

Interesting.  I'll have to take a look.  Any recommendations for the best way to test for that?

Yes, with dig. I found a link: https://docs.quad9.net/FAQs/

Quote from: cookiemonster on February 21, 2024, 11:23:14 PM
Yes, with dig. I found a link: https://docs.quad9.net/FAQs/

I'm assuming you're referring to this section. https://docs.quad9.net/FAQs/#identifying-a-quad9-block

I'll give that a try when I have some time and also see if I still have my notes from the original situation.

I should have mentioned that, but yes, that's it.

I just checked and it's not a blocking problem.


dig @9.9.9.9 A www.trendnet.com +dnssec | grep status
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11539

dig @149.112.112.112 A www.trendnet.com +dnssec | grep status
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24981

dig @9.9.9.9 www.trendnet.com | grep "status\|AUTHORITY:"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28429
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

dig @149.112.112.112 www.trendnet.com | grep "status\|AUTHORITY:"
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 38281
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1


Both resolvers return DNSSEC-validated answers (the ad flag is set), but one gives NXDOMAIN and the other actually resolves.  And the NXDOMAIN isn't a block NXDOMAIN, as a block would have neither the ra flag nor AUTHORITY: 1.
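The block-versus-real-NXDOMAIN reasoning can be written down as a small check. This Python sketch encodes the Quad9 block signature as summarised in this thread (NXDOMAIN, no `ra` flag, AUTHORITY: 0; see https://docs.quad9.net/FAQs/#identifying-a-quad9-block, which remains the authoritative description). `classify_quad9_answer` is a hypothetical name, and it takes the header fields as already-read values rather than querying anything:

```python
def classify_quad9_answer(status: str, flags: set[str], authority: int) -> str:
    """Label a Quad9 response from the header fields shown by dig.

    Heuristic as summarised in this thread: a filter block is NXDOMAIN
    without the 'ra' (recursion available) flag and with AUTHORITY: 0.
    """
    if status != "NXDOMAIN":
        return status.lower()  # e.g. "noerror", "servfail"
    if "ra" not in flags and authority == 0:
        return "quad9 block"
    return "nxdomain (not a block)"


if __name__ == "__main__":
    # Header fields from the two queries without +dnssec above.
    print(classify_quad9_answer("NOERROR", {"qr", "rd", "ra", "ad"}, 0))   # noerror
    print(classify_quad9_answer("NXDOMAIN", {"qr", "rd", "ra", "ad"}, 1))  # nxdomain (not a block)
```

The 149.112.112.112 answer has both `ra` and AUTHORITY: 1, so under this heuristic it is an ordinary upstream NXDOMAIN rather than a filter hit.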

Quad9 asked me to run some commands in their original response, but they haven't replied since.  I'm going to try poking them again to see what the current status is.  But as I mentioned earlier, I'm curious why Unbound picks the NXDOMAIN instead of the valid result when both carry valid DNSSEC.  Simply unchecking DNSSEC causes it to return the IP instead of the NXDOMAIN.

It doesn't seem to be a timing issue as it's repeatable.  Something in Unbound just seems to prefer NXDOMAIN when DNSSEC is enabled and there's a conflict.

@CJ
Quote: That's interesting, but I don't think that it's due to aggressive-nsec as trendnet.com was correctly resolving.
Yep, trendnet.com was correctly resolving, but it also returns an NSEC record saying that the next name after trendnet.com is trendnet.com itself, so it actually closes the chain.
This is an error in the trendnet.com zone config that causes NXDOMAIN for subdomains if aggressive-nsec is on.

Quote from: Fright on February 24, 2024, 06:59:56 PM
@CJ
Quote: That's interesting, but I don't think that it's due to aggressive-nsec as trendnet.com was correctly resolving.
Yep, trendnet.com was correctly resolving, but it also returns an NSEC record saying that the next name after trendnet.com is trendnet.com itself, so it actually closes the chain.
This is an error in the trendnet.com zone config that causes NXDOMAIN for subdomains if aggressive-nsec is on.

Can you elaborate some?  I'm not completely following you.

Have you tried setting aggressive-nsec: no? :)
I think it should work as described in https://www.cloudflare.com/dns/dnssec/dnssec-complexities-and-considerations/
So if you look up the NSEC record for trendnet.com, it should return something other than trendnet.com as the next name (an actual name, or some "white lie" if the zone owner is worried about zone enumeration), like:
https://digwebinterface.com/?hostnames=%0D%0Acloudflare.com&type=NSEC&showcommand=on&ns=resolver&useresolver=9.9.9.9&nameservers=
(returns \000.cloudflare.com. for the next record)

But for trendnet.com it returns:
https://digwebinterface.com/?hostnames=trendnet.com&type=NSEC&showcommand=on&ns=resolver&useresolver=9.9.9.9&nameservers=
(returns trendnet.com. for the next record)
This is effectively saying that there are no names between trendnet.com and trendnet.com, i.e. that the apex is the only name in the zone.
I think this is a zone config error.
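Fright's point can be checked mechanically. An NSEC record with owner O and next name N asserts that no names exist strictly between O and N in DNSSEC canonical order (RFC 4034, section 6.1), and the last record in the chain wraps around to the apex. When N equals O, that gap is the whole zone, so every name other than the apex is denied. The Python sketch below implements the ordering and the coverage test; it is simplified (no escaped dots inside labels, no zone-cut or wildcard-expansion handling) and `canonical_key` / `nsec_covers` are hypothetical names. With aggressive-nsec enabled (RFC 8198), a validating resolver that has cached such a record may synthesize NXDOMAIN for www.trendnet.com from its cache without asking upstream, which would fit the symptoms in this thread.

```python
import re

# Simplified: assumes no escaped dots ("\.") inside labels.
_DDD = re.compile(r"[0-9]{3}")


def _label_octets(label: str) -> bytes:
    """Presentation-format label -> lowercase octets, decoding \\DDD and \\X escapes."""
    out = bytearray()
    i = 0
    while i < len(label):
        if label[i] == "\\" and _DDD.fullmatch(label[i + 1:i + 4]):
            out.append(int(label[i + 1:i + 4]))
            i += 4
        elif label[i] == "\\" and i + 1 < len(label):
            out += label[i + 1].encode()
            i += 2
        else:
            out += label[i].encode()
            i += 1
    return bytes(out).lower()  # canonical order ignores ASCII case


def canonical_key(name: str) -> tuple[bytes, ...]:
    """Sort key for RFC 4034 canonical order: labels compared right to left."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return tuple(_label_octets(label) for label in reversed(labels))


def nsec_covers(owner: str, next_name: str, qname: str) -> bool:
    """True if the NSEC (owner -> next_name) proves that qname does not exist."""
    o, n, q = canonical_key(owner), canonical_key(next_name), canonical_key(qname)
    if q == o:
        return False  # the owner name itself exists
    if o < n:
        return o < q < n  # ordinary gap between two names
    return q > o or q < n  # last NSEC wraps to the apex; n == o covers every other name


if __name__ == "__main__":
    # trendnet.com's NSEC (as in the delv output earlier in the thread) points back at itself...
    print(nsec_covers("trendnet.com.", "trendnet.com.", "www.trendnet.com."))  # True
    print(nsec_covers("trendnet.com.", "trendnet.com.", "*.trendnet.com."))    # True
    # ...while cloudflare.com's next name \000.cloudflare.com leaves www uncovered.
    print(nsec_covers("cloudflare.com.", r"\000.cloudflare.com.", "www.cloudflare.com."))  # False
```

The wildcard check is why that single record is enough for a complete NXDOMAIN proof: it denies both www.trendnet.com and *.trendnet.com.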

Quote from: Fright on February 26, 2024, 05:37:08 PM
Have you tried setting aggressive-nsec: no? :)
I think it should work as described in https://www.cloudflare.com/dns/dnssec/dnssec-complexities-and-considerations/
So if you look up the NSEC record for trendnet.com, it should return something other than trendnet.com as the next name (an actual name, or some "white lie" if the zone owner is worried about zone enumeration), like:
https://digwebinterface.com/?hostnames=%0D%0Acloudflare.com&type=NSEC&showcommand=on&ns=resolver&useresolver=9.9.9.9&nameservers=
(returns \000.cloudflare.com. for the next record)

But for trendnet.com it returns:
https://digwebinterface.com/?hostnames=trendnet.com&type=NSEC&showcommand=on&ns=resolver&useresolver=9.9.9.9&nameservers=
(returns trendnet.com. for the next record)
This is effectively saying that there are no names between trendnet.com and trendnet.com, i.e. that the apex is the only name in the zone.
I think this is a zone config error.

How would I turn aggressive-nsec off?  I don't see an option for it in the UI, and there's no longer a custom options field for Unbound.