Unbound crashing

Started by seed, August 22, 2023, 08:18:48 AM

August 25, 2023, 08:42:46 PM #15 Last Edit: August 27, 2023, 04:09:45 PM by karlson2k
In my case the unbound process hung again after ~27 hours of normal running (with the 7406a5067f8 patch). It was using 100% CPU (as reported by the 'top' command). Only 'kill -9 PID' helped; the GUI could not restart it.
The last record in the unbound log file is
[73750:0] info: service stopped (unbound 1.17.1).
The related records in the system log are
New IP Address (for WAN interface)
and
/usr/local/etc/rc.newwanip ...

After 'kill -9 unboundpid' I got a lot of log entries like:
/usr/local/etc/rc.newwanip: The command '/sbin/mount -r -t nullfs '/usr/local/lib/python3.9' '/var/unbound/usr/local/lib/python3.9'' returned exit code '1', the output was 'mount_nullfs: /var/unbound/usr/local/lib/python3.9: Resource deadlock avoided'

Two suggestions:
* Kill services with 'kill -kill' after a timeout if they cannot be stopped/restarted normally.
* Stop running 'rc.newwanip' if nothing has changed. My ISP renews the IP every 10 minutes, but the same IP is assigned every time (and the same mask; I don't use my ISP's DNS servers). It makes no sense to run a lot of processes related to an IP update if nothing has changed.
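
The first suggestion (a polite stop, then a forced kill after a timeout) can be sketched as below. This is only an illustration of the escalation pattern, not OPNsense code: the function name and the stand-in child process are made up for the example, and a real service manager would work from the daemon's PID file rather than a child process handle. It assumes a POSIX system.

```python
import subprocess
import sys


def stop_with_escalation(proc: subprocess.Popen, timeout: float = 5.0) -> str:
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL."""
    proc.terminate()  # polite request (SIGTERM)
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGTERM was ignored: force the stop (SIGKILL)
        proc.wait()  # reap the process so no zombie is left behind
        return "killed"


# Hypothetical stand-in for a hung daemon: a child that ignores SIGTERM.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

if __name__ == "__main__":
    child = subprocess.Popen(
        [sys.executable, "-c", STUBBORN_CHILD], stdout=subprocess.PIPE
    )
    child.stdout.readline()  # wait until the child is ignoring SIGTERM
    print(stop_with_escalation(child, timeout=1.0))
```

The timeout is the only tunable here. It should be long enough for a healthy daemon to flush its state and exit on its own.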

Note: I had so-reuseport: no in the unbound config to use multiple threads, and I have RSS enabled.

Additional note: my router hardware has 8 vCPU cores.

Unbound hung again. Without so-reuseport: no it ran longer (with a single thread), but it hung anyway.


Quote from: franco on August 24, 2023, 09:06:20 AM
Thanks for the report. Can you try this patch? https://github.com/opnsense/core/commit/7406a5067f8

# opnsense-patch 7406a5067f8


Cheers,
Franco

In my case the problem did not happen again.
I want all services to run at wire speed and therefore run this dedicated hardware configuration:

AMD Ryzen 7 9700x
ASUS Pro B650M-CT-CSM
64GB DDR5 ECC (2x KSM56E46BD8KM-32HA)
Intel XL710-BM1
Intel i350-T4
2x SSD with ZFS mirror
PiKVM for remote maintenance

private user, no business use

Sounds like more upstream bugs. Is this all on DoT? It's still rather buggy after all these years.


Cheers,
Franco

My Unbound forwards all requests to a local DNSCryptProxy. LAN clients use the plain DNS protocol (mostly UDP).
DNSSEC is enabled.

Hmm, ok, that's a basic setup then.

What is dnscrypt-proxy doing that Unbound cannot? Or is this a separate dnscrypt-proxy instance and not the plugin?

Do you need Unbound to forward? Maybe you can use Dnsmasq to do that job instead or use dnscrypt-proxy directly... it works fine nowadays as core DNS server, see docs:

https://docs.opnsense.org/manual/how-tos/dnscrypt-proxy.html#example-standalone-dns


Cheers,
Franco

My setup is based on the standard repo. DnsCrypt-proxy is used as a plugin.

I need the flexibility of Unbound:
* good integration with DHCP (I'm not aware whether DnsCrypt-proxy is integrated)
* DNS leak control by specific zones
* Some DNS name overrides
* Forwarding requests for ISP-specific domains directly to the ISP's DNS servers
* Integration with OpenNIC for some top-level domains (via manual config)
* Some other features

Unbound alone is not enough, as it works without encrypted channels, which allows my ISP (and other parties) to easily intercept the traffic and modify remote responses.

Dnsmasq lacks some of the required features.

I may install GDB to find where Unbound is getting stuck. Is there any way to download debug symbols for Unbound?
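
A minimal sketch of how backtraces from all threads of the hung process could be captured once GDB is installed. It does not answer the debug-symbol question: without symbols the frames may show little more than addresses or exported function names. The helper names are made up for the example. The gdb options used (-p, -batch, -ex) are standard, and attaching normally requires root.

```python
import subprocess


def gdb_backtrace_argv(pid: int) -> list[str]:
    """Build a gdb command line that attaches to `pid`, prints a backtrace
    for every thread, and then exits (detaching from the process)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtraces(pid: int, timeout: float = 60.0) -> str:
    """Run gdb against the live process and return everything it printed."""
    result = subprocess.run(
        gdb_backtrace_argv(pid), capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr
```

Taking two or three captures a few seconds apart helps to tell a busy loop (the frames change) from a deadlock (the frames stay the same).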

August 30, 2023, 02:27:31 PM #25 Last Edit: August 30, 2023, 03:39:58 PM by karlson2k
I have probably found the reason why the bug is triggered on my router.
As I wrote previously, my ISP sets the IP renewal interval to 10 minutes. Although nothing changes, a lot of processes run every 10 minutes, including an Unbound restart.
So to reproduce the issue, use a WAN IP with a short renewal interval, like every minute. A lot of manual restarts may also trigger the same issue.
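
The "many restarts" reproduction idea could be automated with a small loop like the one below. This is a sketch under assumptions: the restart command is deliberately left as a parameter (whatever command restarts Unbound on the system under test), and the function name, cycle count and timeouts are made up for the example.

```python
import subprocess
import time


def stress_restarts(
    restart_cmd: list[str],
    cycles: int,
    pause: float = 1.0,
    cmd_timeout: float = 60.0,
) -> list[int]:
    """Run `restart_cmd` `cycles` times; return the 1-based cycles that failed."""
    failed = []
    for cycle in range(1, cycles + 1):
        try:
            result = subprocess.run(restart_cmd, timeout=cmd_timeout)
            ok = result.returncode == 0
        except subprocess.TimeoutExpired:
            ok = False  # a restart that never returns is the hang being hunted
        if not ok:
            failed.append(cycle)
        time.sleep(pause)  # give the daemon a moment before the next restart
    return failed
```

A non-empty result only says that the restart command failed or timed out in those cycles. The Unbound log still has to be checked to see whether it is the same hang.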

Today I got a new type of Unbound problem:
2023-08-30T15:07:09 Critical unbound [49665:2] fatal error: Could not initialize thread
2023-08-30T15:07:09 Informational unbound [49665:2] info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0
2023-08-30T15:07:09 Informational unbound [49665:2] info: server stats for thread 2: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2023-08-30T15:07:09 Error unbound [49665:2] error: Could not set root or stub hints
2023-08-30T15:07:09 Error unbound [49665:2] error: reading root hints /root.hints 2:12: Syntax error, could not parse the RR's type
2023-08-30T15:07:09 Notice unbound [49665:0] notice: init module 2: iterator
2023-08-30T15:07:09 Notice unbound [49665:0] notice: init module 1: validator
2023-08-30T15:07:09 Notice unbound daemonize unbound dhcpd watcher.
2023-08-30T15:07:09 Notice unbound [49665:0] notice: init module 0: python


I have default root.hints file and it is not changed from run to run. :)

To me it looks like a memory corruption problem or a broken ABI.

This is likely due to an interface selection in the Unbound settings. Using the recommended empty selection will not force a restart on every DHCP renew.


Cheers,
Franco

August 30, 2023, 03:34:58 PM #27 Last Edit: August 30, 2023, 03:39:05 PM by karlson2k
I'll try. However, I don't want Unbound to reply on WAN interfaces. Yes, the access lists are configured, but it would be safer not to bind to (reply on) external interfaces.

Perhaps it is worth avoiding resetting everything if the IP stays the same? The script sees both the old IP and the new IP.
Or at least add an interface option "Do not enforce re-binding of daemons if IP hasn't changed with DHCP lease update" or something like this.
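
The proposed check amounts to comparing the previous lease with the renewed one before doing any work. A minimal sketch follows. It is not the rc.newwanip implementation (that script is PHP), and the type, the field names and the documentation-range addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LeaseState:
    """The lease details that matter for service reconfiguration (hypothetical)."""
    address: str
    prefix_len: int
    gateway: str


def needs_reconfigure(old: Optional[LeaseState], new: LeaseState) -> bool:
    """Return True only when there is no previous lease or something changed."""
    return old is None or old != new


if __name__ == "__main__":
    previous = LeaseState("198.51.100.7", 24, "198.51.100.1")
    renewed = LeaseState("198.51.100.7", 24, "198.51.100.1")
    if needs_reconfigure(previous, renewed):
        print("lease changed: reconfigure services")
    else:
        print("lease renewed with the same values: nothing to do")
```

Comparing the whole lease rather than just the address also covers the case where the mask or gateway changes while the address stays the same.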

August 30, 2023, 06:30:50 PM #29 Last Edit: August 30, 2023, 06:33:45 PM by karlson2k
I changed the interface selection to "All" (empty), but Unbound is still restarting with each DHCP lease update on WAN ports. Probably because OpenVPN reconfiguration is triggered by

/usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : unbound_configure_do(,opt1))
/usr/local/etc/rc.newwanip: Resyncing OpenVPN instances for interface WAN2.
/usr/local/etc/rc.newwanip: ROUTING: entering configure using 'opt1'

each time the IP is renewed.