Unbound crashing

Started by seed, August 22, 2023, 08:18:48 AM

Then traffic would be easier to intercept or block. Also Cloudflare would get the full list of my DNS requests.
Since a local server would be used, it must be compliant with local regulations, including full reporting, "legal" interception and censorship.
Not nice, not a solution for me.

However, I may test it.

In any case, the broken part is Unbound, not DnsCrypt-proxy.

Here is the promised patch:

https://github.com/opnsense/core/commit/a086f40b

# opnsense-patch a086f40b


Cheers,
Franco

Quote from: franco on September 14, 2023, 02:07:55 PM
Here is the promised patch:

https://github.com/opnsense/core/commit/a086f40b

Applied.
I will test for a while without so-reuseport: no and then will try with real multi-threading.

September 14, 2023, 03:41:54 PM #48 Last Edit: September 14, 2023, 03:54:27 PM by karlson2k
Quote from: karlson2k on September 14, 2023, 02:15:54 PM
Applied.
It's beautiful to see Unbound requests finally served from cache. Previously just 5-20% were served from cache and the rest were resolved recursively.
Now 80-90% of requests are served from cache.

Thanks for the fix, franco!

Load average has dropped by almost half.

I'll test with so-reuseport: no now.

When I tried to restart Unbound, I got:
Error unbound [84125:2] error: Could not set root or stub hints
Error unbound [84125:2] error: reading root hints /root.hints 2:12: Syntax error, could not parse the RR's type
Critical unbound [84125:2] fatal error: Could not initialize thread

I have seen this kind of error before.

The Unbound process was using 100% CPU, and the GUI could not update the Unbound status.

I had to kill the process with kill -9 84125, after which it restarted.

It looks like Unbound freezes at startup because of the /root.hints parsing error.
How is that possible?
Does Unbound log the full path, meaning it tries to parse a file located in the root directory?
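
One way to see what the parser is choking on is to print the exact line named in the error. This is only a sketch under two assumptions: that the "2:12" in the log means line 2, position 12, and that the /root.hints path is relative to Unbound's chroot, so the real file would be /var/unbound/root.hints. The helper name show_line is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print line N of a file with non-printing
# characters made visible, so stray bytes or a truncated record stand out.
show_line() {
    file=$1
    lineno=$2
    # sed prints only the requested line; cat -v renders control
    # characters in caret notation (for example ^A).
    sed -n "${lineno}p" "$file" | cat -v
}
```

For the error above, the call would be: show_line /var/unbound/root.hints 2. If the line looks clean, the file was probably being rewritten at the moment Unbound read it, which would fit a restart race rather than a bad hints file.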

It indicates a general restart issue. Did you ever do a health audit?


Cheers,
Franco

I was using the default Unbound from the standard OPNsense installation.
Then I installed version 1.18.0 from the OPNsense repo, as was suggested here.

Now I've changed it back to the default one (the same 1.18.0 version).
***GOT REQUEST TO AUDIT HEALTH***
Currently running OPNsense 23.7.4 at Thu Sep 14 17:52:17 2023
>>> Check installed kernel version
Version 23.7.4 is correct.
>>> Check for missing or altered kernel files
No problems detected.
>>> Check installed base version
Version 23.7.4 is correct.
>>> Check for missing or altered base files
No problems detected.
>>> Check installed repositories
OPNsense
>>> Check installed plugins
os-dnscrypt-proxy 1.14_1
os-smart 2.2_2
>>> Check locked packages
No locks found.
>>> Check for missing package dependencies
Checking all packages: .......... done
>>> Check for missing or altered package files
Checking all packages: ....
opnsense-23.7.4: checksum mismatch for /usr/local/etc/inc/plugins.inc.d/unbound.inc
Checking all packages......... done
>>> Check for core packages consistency
Core package "opnsense" has 68 dependencies to check.
Checking packages: ..................................................................... done
***DONE***


The file unbound.inc was modified by the suggested patch, which explains the checksum mismatch.

Quote from: newsense on September 14, 2023, 12:51:29 PM
DNScrypt on the other hand - if using the stock one - might not be the best tool here, it's quite old and in need of an update (maybe should be removed from the plugin list?)
The FreeBSD repo has the latest version: https://www.freshports.org/dns/dnscrypt-proxy2

Let me know if help with the plugin update is needed; I'm ready to work on it.

Quote from: franco on September 14, 2023, 02:07:55 PM
Here is the promised patch:

https://github.com/opnsense/core/commit/a086f40b

# opnsense-patch a086f40b

Several days with this patch and no issues even with so-reuseport: no.
The Unbound cache is finally filled with useful data and caching mechanisms are providing benefits.
The upstream DHCP lease is still 10 minutes long, but now it no longer causes an Unbound reload.
Thanks!

Note: the restart issue is still present; it can be triggered by a manual restart.

Quote from: karlson2k on September 14, 2023, 01:59:56 PM
Then traffic would be easier to intercept or block. Also Cloudflare would get the full list of my DNS requests.
Since a local server would be used, it must be compliant with local regulations, including full reporting, "legal" interception and censorship.
Not nice, not a solution for me.

However, I may test it.

In any case, the broken part is Unbound, not DnsCrypt-proxy.

How do you have dnscrypt configured?  Are you using it to do recursive root resolution?  Just trying to understand the benefits over using DoT (not necessarily cloudflare).

The last update to 23.7.5 reverted the "no-restart" patch. Unbound started hanging again.
I had to re-apply the patch. Any chance that the patch will be backported to 23.7?

Quote from: CJ on September 20, 2023, 02:09:58 PM
Quote from: karlson2k on September 14, 2023, 01:59:56 PM
Then traffic would be easier to intercept or block. Also Cloudflare would get the full list of my DNS requests.
Since a local server would be used, it must be compliant with local regulations, including full reporting, "legal" interception and censorship.
Not nice, not a solution for me.

How do you have dnscrypt configured?  Are you using it to do recursive root resolution?  Just trying to understand the benefits over using DoT (not necessarily cloudflare).
It is mostly the default configuration. DNSCrypt-Proxy downloads the list of public servers available via DNSCrypt or DNS-over-HTTPS, then automatically detects the fastest servers and sends each request to a random subset of the short list of fastest servers.

No server gets the complete list of all your DNS queries.

I seem to be having a similar issue, across both 23.7.4 and 23.7.5...

2023-09-28T18:20:01 Critical unbound [14883:3] fatal error: Could not initialize thread
2023-09-28T18:20:01 Error unbound [14883:3] error: Could not set root or stub hints
2023-09-28T18:20:01 Error unbound [14883:3] error: reading root hints /root.hints 2:12: Syntax error, could not parse the RR's type


Having read this whole thread, I am still not sure what I should do to fix it.

I had monit set up to restart Unbound for me, but I have since turned that off while trying to see what is causing the issue.

I checked https://www.internic.net/domain/named.root and it's the same file (except the dates) as

/usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints

that we use to bootstrap the root servers.

# md5 /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints
MD5 (/usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints) = ac281ab5712d761d1a4e7a7224b89666

Should be the same as

# md5 /var/unbound/root.hints
MD5 (/var/unbound/root.hints) = ac281ab5712d761d1a4e7a7224b89666

If not, it would be helpful to diff them:

# diff -u /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints
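
The two checks above can be folded into one step. A minimal sketch, assuming the same two paths as defaults; the helper name hints_check is hypothetical, and it uses cmp rather than md5 to decide whether the files match byte for byte.

```shell
#!/bin/sh
# Hypothetical helper: report whether the live root hints match the
# shipped template, and show a unified diff when they do not.
hints_check() {
    template=${1:-/usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints}
    live=${2:-/var/unbound/root.hints}
    if cmp -s "$template" "$live"; then
        echo "identical"
    else
        echo "different"
        # Print what changed so it can be pasted into the thread.
        diff -u "$template" "$live"
        return 1
    fi
}
```

Running hints_check with no arguments compares the template against /var/unbound/root.hints. Running it right after a failed restart is the interesting case, since that is when a partially written file would show up.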


Cheers,
Franco

Same issue here (on 23.7.5) - amongst others.