unbound and fallback resolvers

Started by planetf1, January 31, 2025, 12:37:09 PM

I'm using opnsense 24.10 (rc2) with unbound.

I use quad9 as my resolver in unbound, via TLS.

All these components are generally reliable. However from time to time quad9 has been known to have an outage -- for example it did today in parts of the UK. Most queries timed out for many in the north.

I do configure unbound to serve results after their TTL expires, mostly to handle a few Chinese sites where upstream resolution can occasionally take >5s.
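For reference, serving expired records is controlled by the serve-expired options in unbound's server: clause. The fragment below is a minimal sketch. The option names come from unbound.conf, but the values are illustrative assumptions, not settings taken from this post.

```conf
server:
  # Answer from cache even after the record's TTL has expired
  serve-expired: yes
  # Assumed value: stop serving a record once it is more than a day past expiry (seconds)
  serve-expired-ttl: 86400
  # Assumed value: try upstream for up to 1.8 s before replying with the expired record (milliseconds)
  serve-expired-client-timeout: 1800
  # Assumed value: TTL put on expired answers so clients re-query soon (seconds)
  serve-expired-reply-ttl: 30
```

This only helps for names that are already in the cache; it does nothing for names that have never been resolved before an outage starts.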

In the past I used a 'ctrld' daemon from ControlD which had a nice feature - as well as defining multiple resolvers, you could configure error handling, for example what to do after a timeout. So you could imagine having an initial 2.5s timeout, then a fallback to another resolver etc.

unbound doesn't seem to have this - I can only specify multiple resolvers. However, rather than some random/round-robin selection I want a more predictable work distribution, i.e. always hit quad9 first, and only hit an alternate after a short timeout. Monitor, and if there are multiple timeouts, flip over to a backup. Then check periodically to see if the situation has returned to normal before switching the default back. Alert the admin of changes.

Why? Because quad9 filters malicious domains & seems to be the most accurate - few false positives, yet up to date with current threats. It also benefits from being open & non-profit.

Any thoughts? I can imagine any of the following:
 - Give up on malicious filtering, and just load up unbound with more TLS resolvers
 - as above but use recursive resolution (more data exposure, slower)
 - move from unbound to ctrld (flexible, but I think it's a bit flaky)
 - multiple TLS resolvers, and implement local filtering (unbound, or pihole)
 - do nothing, as quad9 outages are rare
 - Implement some external monitoring. Flip configuration when an outage is detected, flip back later

Any thoughts on good approaches here?

I see this has been asked a few times.

It seems as if specifying this in /var/unbound/etc/dot.conf would help:

forward-first: yes

My current file is

# Forward zones

# Forward zones over TLS
server:
  tls-cert-bundle: /usr/local/etc/ssl/cert.pem

forward-zone:
  name: "."
  forward-tls-upstream: yes
  forward-addr: 2620:fe::9@853#dns.quad9.net
  forward-addr: 149.112.112.112@853#dns.quad9.net
  forward-addr: 9.9.9.9@853#dns.quad9.net
  forward-addr: 2620:fe::fe@853#dns.quad9.net

This at least would fall back to recursive resolution if required (albeit silently).
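With forward-first added, the forward zone from the file above would look roughly like this. It is a sketch only: the forwarder list is copied unchanged, forward-first is the single addition, and it has not been tested on OPNsense. According to the unbound.conf documentation, a query that fails via the forwarders is then retried without the forward clause, i.e. by normal recursion.

```conf
forward-zone:
  name: "."
  forward-tls-upstream: yes
  # If the forwarders fail to answer, retry the query by recursing from the roots
  forward-first: yes
  forward-addr: 2620:fe::9@853#dns.quad9.net
  forward-addr: 149.112.112.112@853#dns.quad9.net
  forward-addr: 9.9.9.9@853#dns.quad9.net
  forward-addr: 2620:fe::fe@853#dns.quad9.net
```

Answers obtained through the recursive fallback do not pass through quad9, so they are not filtered and are not sent over TLS.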

I wonder also if I have too many forwarders here - would unbound just try one, or all of them? I suspect all might fail at once.

Still, this could be a useful opnsense enhancement. I previously did a PR to add another parameter for unbound (now merged), so I may see if I can suggest a change for this.