Unbound keeps crashing

Started by waxhead, February 18, 2024, 11:35:10 AM

I find that quite often (several times a week, sometimes several times a day) DNS stops working because Unbound has stopped.

Under System -> Log Files -> General, the cause appears to be a segfault (signal 11):
<6>pid 57337 (unbound), jid 0, uid 59: exited on signal 11

It has been like this since 23.7 and is still the case with 24.1. For my use on this box it is not critical, but rather an annoyance.
The only thing that differs from the defaults is that I have enabled nearly every blocklist in the DNSBL drop-down menu.
In addition, I have added a URL to my own blocklist, which resides on a remote server, in the form "http://example.com/blocklist.txt". That connection has been unreliable at times, but I would expect a missing file NOT to cause any issues with Unbound. Then again, it is the only thing I can think of.

As a side note: it would be great if it were possible to configure what happens when a service crashes (restart it n times before giving up, send mail, run a script (e.g. write to RS-232)).
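The requested behaviour (restart up to n times, then give up and notify) boils down to a small supervisor loop. Below is a minimal Python sketch, not an existing OPNsense option: is_running, restart and alert are hypothetical callables you would wire to your own service check, restart command and notification (mail, script, serial port), and the defaults of 3 restarts and a 5-second wait are arbitrary assumptions.

```python
import time
from typing import Callable


def supervise_once(
    is_running: Callable[[], bool],   # hypothetical: returns True if the service is up
    restart: Callable[[], None],      # hypothetical: starts/restarts the service
    alert: Callable[[str], None],     # hypothetical: mail, script, serial port, ...
    max_restarts: int = 3,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Check a service once and restart it up to max_restarts times.

    Returns True if the service is running at the end, False if we gave up.
    """
    if is_running():
        return True
    for attempt in range(1, max_restarts + 1):
        restart()
        sleep(wait_seconds)  # give the service time to come up before re-checking
        if is_running():
            alert(f"service restarted after {attempt} attempt(s)")
            return True
    alert(f"service still down after {max_restarts} restarts; giving up")
    return False
```

Calling supervise_once from a periodic job (e.g. once a minute) gives the "restart n times, then alert" policy; the sleep parameter is injectable only so the logic can be tested without waiting.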


Every blocklist? How much memory do all of those take up?

I use the abuse.ch ThreatFox, OISD small, AdGuard, and YouTube blocklists.

I've never had Unbound crash on me. Maybe try removing some of the lists.

While I don't run all of the blocklists, I do run several as well as my own custom lists.

What are you seeing in the logs?  What symptoms are you seeing?  What plugins are you using?

February 20, 2024, 01:18:39 AM #3 Last Edit: February 20, 2024, 01:22:58 AM by ksx4system
Quote from: waxhead on February 18, 2024, 11:35:10 AM
I find that quite often (several times week/day) DNS have stopped working due to Unbound being stopped as well.

Under system->general->logfiles it seems like the reason is a segfault:
<6>pid 57337 (unbound), jid 0, uid 59: exited on signal 11

It has been like this since 23.7 and also now with 24.1.

I have this issue with 24.1.1; on 23.7.12 it worked fine, rock stable. Now Unbound sometimes (twice a day?) just stops resolving (queries time out), and it fixes itself after 5 minutes or so.

I do not have any blocklists enabled, and DNS over TLS isn't used either. There is nothing at all related to Unbound crashing in the logs (log levels at default). There's plenty of RAM available (around 6 GB out of 8 GB) and the CPU load is low (around 0.23).
HP ProDesk 600 G1 SFF (OPNsense latest stable)
i3 4160 / 8GB RAM / 60GB SLC SSD / Intel and Broadcom 1GbE NICs

have a nice day :)

I see this on several OPNsense Business 23.10.2 installations. Unbound stops resolving external domains only; internal names still resolve. It looks like the TLS connection to Quad9 sometimes dies. Restarting Unbound solves the issue immediately; if I do nothing, it fixes itself after a few minutes.

I don't know whether this is an Unbound or a Quad9 issue.

February 20, 2024, 10:53:54 PM #5 Last Edit: February 20, 2024, 11:04:11 PM by ksx4system
Quote from: Cerberus on February 20, 2024, 08:26:42 PM
I don't know whether this is an Unbound or a Quad9 issue.

It's probably not a Quad9 issue. I'm using ControlD as a traditional (plain) DNS provider; I didn't even bother trying DNS over TLS after a very bad experience with it on OpenWrt (it was painfully unstable).

The only Unbound issues I've seen with Quad9 are some weirdness with DNSSEC and Quad9 returning different results per resolver.  But my Unbound isn't crashing, just not always resolving a domain.

February 22, 2024, 12:20:44 AM #7 Last Edit: February 22, 2024, 12:23:40 AM by ksx4system
Quote from: ksx4system on February 20, 2024, 01:18:39 AM
Now Unbound sometimes (twice a day?) just stops resolving (response timeout) and it fixes itself after 5 minutes or so.

This issue persists on 24.1.2 :(

Quote from: CJ on February 21, 2024, 09:07:20 PM
But my Unbound isn't crashing, just not always resolving a domain.

It appears that it doesn't die per se for me either; it just stops resolving anything at all for a few minutes.

Quote from: CJ on February 21, 2024, 09:07:20 PM
The only Unbound issues I've seen with Quad9 are some weirdness with DNSSEC and Quad9 returning different results per resolver.

As far as I know, neither Cloudflare's 1.1.1.1 nor ControlD at 76.76.2.0 seems to have this issue.

Quote from: ksx4system on February 22, 2024, 12:20:44 AM
As far as I know, neither Cloudflare's 1.1.1.1 nor ControlD at 76.76.2.0 seems to have this issue.

I was just posting as a counterpoint, because I've seen people claiming that the reason Unbound isn't working correctly is DHCP, DoT, DNSSEC, the upstream resolver, DNSBL, etc. I've had none of these issues despite using all of them.

I will note that in the other thread, it seems that a lot of the people having issues with Unbound have a PC connected directly to OPNsense instead of through a switch. The only direct connections I have are APs, and they're always on; even so, I haven't had a problem when swapping them out.

Regarding Unbound temporarily not resolving, enabling DNS reporting and raising the log level can help troubleshoot that. But I think a new thread would be in order, as this one is about Unbound crashing, not just temporarily having an issue.
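For the "stops resolving for a few minutes" symptom, a periodic probe that records when the resolver fails to answer can complement higher log levels by timestamping each outage. A minimal Python sketch using only the standard library, assuming Unbound listens on 127.0.0.1 port 53 and that example.com is a suitable test name (both assumptions to adjust): it hand-builds an RFC 1035 A query and treats any reply carrying the same query ID, even SERVFAIL, as "answered".

```python
import random
import socket
import struct
import time


def build_query(name: str, qid: int) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server: str = "127.0.0.1", name: str = "example.com",
          timeout: float = 2.0, port: int = 53) -> bool:
    """Return True if the server sends back any reply with our query ID."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, qid), (server, port))
            data, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and ICMP "connection refused"
            return False
    return len(data) >= 2 and struct.unpack("!H", data[:2])[0] == qid


def monitor(interval: float = 30.0) -> None:
    """Print a timestamp every time a probe gets no answer (runs forever)."""
    while True:
        if not probe():
            print(time.strftime("%Y-%m-%dT%H:%M:%S"), "no DNS reply")
        time.sleep(interval)
```

Running monitor() from a LAN client shows exactly when the resolver stopped and resumed answering, which can then be lined up against the Unbound log.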

Quote from: CJ on February 22, 2024, 05:19:31 PM
I was just posting as a counterpoint because I've seen people commenting that the reason unbound isn't working correctly is due to DHCP, DoT, DNSSEC, the upstream resolver, DNSBL, etc.  And I've had none of these issues despite using all of those.

I did not use DoH/DoT :) just plain old DNS.

Quote from: CJ on February 22, 2024, 05:19:31 PM
I will note that in the other thread, it seems that a lot of the people having issues with Unbound have a PC directly connected to OPNsense instead of through a switch.  The only direct connects that I have are APs and they're always on, but even so, I've not had a problem when swapping them out.

Since I only have two interfaces, everything on the LAN side is behind a switch (or two).

Quote from: CJ on February 22, 2024, 05:19:31 PM
Regarding Unbound temporarily not resolving, enabling the DNS reporting and higher log levels can help with troubleshooting that.  But I would think a new thread would be in order as this one is about Unbound crashing and not just temporarily having an issue.

I agree, a new thread would be needed.

Just following up on this.

I still get this error in the log files:
<6>pid 11135 (unbound), jid 0, uid 59: exited on signal 11

and I found another post about what appears to be the same problem:
https://forum.opnsense.org/index.php?topic=20516.0

I notice the problem because DNS simply stops working, and when I log in to my OPNsense box, the dashboard shows the Unbound service as "red", i.e. I need to click the "play" button to start it again.

This is a very annoying issue, and since every crash is, in theory, also a security issue, I would love to know what I can do to help diagnose this better.
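A FreeBSD kernel line such as "pid 11135 (unbound), jid 0, uid 59: exited on signal 11" means the process was killed by signal 11, SIGSEGV (a segmentation fault). As a minimal sketch for tracking how often this happens, the Python below assumes log lines in exactly the format quoted in this thread; parse_exit_signal and count_crashes are hypothetical helper names, not OPNsense tools.

```python
import re
import signal

# Matches FreeBSD kernel lines such as:
#   <6>pid 57337 (unbound), jid 0, uid 59: exited on signal 11
EXIT_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<name>[^)]+)\), jid (?P<jid>\d+), "
    r"uid (?P<uid>\d+): exited on signal (?P<sig>\d+)"
)


def parse_exit_signal(line):
    """Return (pid, process_name, signal_name), or None if the line doesn't match."""
    m = EXIT_RE.search(line)
    if m is None:
        return None
    sig = int(m.group("sig"))
    try:
        sig_name = signal.Signals(sig).name  # e.g. 11 -> "SIGSEGV"
    except ValueError:
        sig_name = f"signal {sig}"  # number unknown on this platform
    return int(m.group("pid")), m.group("name"), sig_name


def count_crashes(lines, name="unbound"):
    """Count lines reporting that the named process was killed by SIGSEGV."""
    total = 0
    for line in lines:
        parsed = parse_exit_signal(line)
        if parsed is not None and parsed[1] == name and parsed[2] == "SIGSEGV":
            total += 1
    return total
```

Fed with an exported General log, count_crashes gives a crash count per period, which is useful context when reporting the issue.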

May 05, 2024, 11:24:04 PM #11 Last Edit: May 05, 2024, 11:47:20 PM by Apex
I'm running Unbound on 24.1.6, and I can always tell something is wrong with the firewall because it starts dropping connections.

Unbound consumes 4 cores and 4 threads at 100% CPU, choking the firewall until the service crashes.

I haven't increased the log level. I read about other people experiencing issues with Unbound, migrated over to Dnsmasq, and haven't had an issue since.

I prefer Unbound doing recursion; it's faster and much more versatile than Dnsmasq, but I can't have my firewall sporadically dropping traffic because of Unbound either.

During the issue, at the standard log level, this is what I see before the service just stops. It's pages and pages of this:

2024-05-03T09:55:07-04:00   Informational   unbound   [46703:0] info: start of service (unbound 1.19.3).   
2024-05-03T09:55:07-04:00   Notice   unbound   [46703:0] notice: init module 2: iterator   
2024-05-03T09:55:07-04:00   Notice   unbound   [46703:0] notice: init module 1: validator   
2024-05-03T09:55:07-04:00   Notice   unbound   daemonize unbound dhcpd watcher.   
2024-05-03T09:55:07-04:00   Notice   unbound   [46703:0] notice: init module 0: python   
2024-05-03T09:55:06-04:00   Informational   unbound   [87043:0] info: server stats for thread 3: requestlist max 0 avg 0 exceeded 0 jostled 0   
2024-05-03T09:55:06-04:00   Informational   unbound   [87043:0] info: server stats for thread 3: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting   
2024-05-03T09:55:06-04:00   Informational   unbound   [87043:0] info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0   
2024-05-03T09:55:06-04:00   Informational   unbound   [87043:0] info: server stats for thread 2: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting   
2024-05-03T09:55:06-04:00   Informational   unbound   [87043:0] info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0   
2024-05-03T09:55:06-04:00   Informational   unbound   [87043:0] info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting   
2024-05-03T09:55:06-04:00   Informational   unbound   [87043:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0   
2024-05-03T09:55:06-04:00   Informational   unbound   [87043:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting   
2024-05-03T09:55:06-04:00   Informational   unbound   [87043:0] info: service stopped (unbound 1.19.3).

Under System -> Log Files -> General, I see this for Unbound:

2024-05-03T17:44:37-04:00   Error   opnsense   /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'   
2024-05-03T17:44:37-04:00   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())   
2024-05-03T17:44:34-04:00   Error   opnsense   /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'   
2024-05-03T17:44:34-04:00   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())   
2024-05-03T17:44:30-04:00   Error   opnsense   /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'   
2024-05-03T17:44:30-04:00   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())   
2024-05-03T17:44:24-04:00   Error   opnsense   /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'   
2024-05-03T17:44:24-04:00   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())   
2024-05-03T17:44:20-04:00   Error   opnsense   /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'   
2024-05-03T17:44:20-04:00   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())   
2024-05-03T17:44:17-04:00   Error   opnsense   /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'   
2024-05-03T17:44:17-04:00   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())   
2024-05-03T17:43:58-04:00   Error   opnsense   /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'   
2024-05-03T17:43:58-04:00   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())   
2024-05-03T17:43:55-04:00   Error   opnsense   /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'   
2024-05-03T17:43:55-04:00   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())   
2024-05-03T17:43:52-04:00   Error   opnsense   /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'   
2024-05-03T17:43:52-04:00   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())   
2024-05-03T17:43:45-04:00   Error   opnsense   /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'   
2024-05-03T17:43:45-04:00   Notice   opnsense   /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
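The repeated "kill: 49860: No such process" errors mean that the PID recorded in /var/run/unbound.pid no longer belonged to a running process when rc.linkup tried to stop Unbound, which suggests Unbound had already exited by then. A minimal Python sketch for checking a pidfile for that condition; pidfile_is_stale is a hypothetical helper, and it relies on signal 0, which checks that a process exists without actually signalling it.

```python
import os


def pidfile_is_stale(path: str) -> bool:
    """Return True if the pidfile is missing/invalid or names a dead process."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return True  # nothing usable recorded
    if pid <= 0:
        return True  # 0 or negative would address process groups, not a PID
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return True  # "No such process", the condition kill reported above
    except PermissionError:
        return False  # process exists but belongs to another user
    return False
```

For example, pidfile_is_stale("/var/run/unbound.pid") returning True while the dashboard still expects Unbound to be running would point to the daemon having died rather than having been stopped.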

Quote from: waxhead on February 18, 2024, 11:35:10 AM
I find that quite often (several times week/day) DNS have stopped working due to Unbound being stopped as well.

Under system->general->logfiles it seems like the reason is a segfault:
<6>pid 57337 (unbound), jid 0, uid 59: exited on signal 11

I also have this issue with Unbound. I have scheduled a daily restart of OPNsense and it's a little better now, but every now and then Unbound still crashes!

It can't go many hours before it crashes...

OPNsense used to be stable; this is really bad!


2024-08-29T15:56:58   Notice   dhclient   dhclient-script: Creating resolv.conf   
2024-08-29T15:56:58   Notice   dhclient   dhclient-script: Reason RENEW on igc0 executing   
2024-08-29T15:56:08   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T15:55:13   Notice   dhcp6c   dhcp6c_script: RENEW on igc0 executing   
2024-08-29T15:55:00   Notice   send_telemetry.py   telemetry data collected 2 records in 0.03 seconds @2024-08-29 13:54:55.508004   
2024-08-29T15:54:52   Notice   dhcp6c   dhcp6c_script: RENEW on igc0 executing   
2024-08-29T15:52:13   Error   opnsense   /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '89172''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 89172: No such process'   
2024-08-29T15:51:07   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T15:47:30   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T15:46:58   Notice   dhclient   dhclient-script: Creating resolv.conf   
2024-08-29T15:46:58   Notice   dhclient   dhclient-script: Reason RENEW on igc0 executing   
2024-08-29T15:40:27   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T15:38:33   Notice   dhcp6c   dhcp6c_script: RENEW on igc0 executing   
2024-08-29T15:38:12   Notice   dhcp6c   dhcp6c_script: RENEW on igc0 executing   
2024-08-29T15:36:58   Notice   dhclient   dhclient-script: Creating resolv.conf   
2024-08-29T15:36:58   Notice   dhclient   dhclient-script: Reason RENEW on igc0 executing   
2024-08-29T15:33:28   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T15:26:58   Notice   dhclient   dhclient-script: Creating resolv.conf   
2024-08-29T15:26:58   Notice   dhclient   dhclient-script: Reason RENEW on igc0 executing   
2024-08-29T15:26:57   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T15:21:53   Notice   dhcp6c   dhcp6c_script: RENEW on igc0 executing   
2024-08-29T15:21:32   Notice   dhcp6c   dhcp6c_script: RENEW on igc0 executing   
2024-08-29T15:18:18   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T15:16:58   Notice   dhclient   dhclient-script: Creating resolv.conf   
2024-08-29T15:16:58   Notice   dhclient   dhclient-script: Reason RENEW on igc0 executing   
2024-08-29T15:08:45   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T15:06:58   Notice   dhclient   dhclient-script: Creating resolv.conf   
2024-08-29T15:06:58   Notice   dhclient   dhclient-script: Reason RENEW on igc0 executing   
2024-08-29T15:05:13   Notice   dhcp6c   dhcp6c_script: RENEW on igc0 executing   
2024-08-29T15:04:52   Notice   dhcp6c   dhcp6c_script: RENEW on igc0 executing   
2024-08-29T15:01:05   Notice   syslog-ng   Configuration reload finished;   
2024-08-29T15:01:05   Notice   syslog-ng   Configuration reload request received, reloading configuration;   
2024-08-29T14:59:40   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T14:56:58   Notice   dhclient   dhclient-script: Creating resolv.conf   
2024-08-29T14:56:58   Notice   dhclient   dhclient-script: Reason RENEW on igc0 executing   
2024-08-29T14:54:58   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T14:48:33   Notice   dhcp6c   dhcp6c_script: RENEW on igc0 executing   
2024-08-29T14:48:12   Notice   dhcp6c   dhcp6c_script: RENEW on igc0 executing   
2024-08-29T14:46:58   Notice   dhclient   dhclient-script: Creating resolv.conf   
2024-08-29T14:46:58   Notice   dhclient   dhclient-script: Reason RENEW on igc0 executing   
2024-08-29T14:45:24   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T14:38:24   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T14:36:58   Notice   dhclient   dhclient-script: Creating resolv.conf   
2024-08-29T14:36:58   Notice   dhclient   dhclient-script: Reason RENEW on igc0 executing   
2024-08-29T14:33:45   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T14:31:53   Notice   dhcp6c   dhcp6c_script: RENEW on igc0 executing   
2024-08-29T14:31:32   Notice   dhcp6c   dhcp6c_script: RENEW on igc0 executing   
2024-08-29T14:27:39   Warning   radvd   prefix length should be 64 for igc1   
2024-08-29T14:26:58   Notice   dhclient   dhclient-script: Creating resolv.conf   
2024-08-29T14:26:58   Notice   dhclient   dhclient-script: Reason RENEW on igc0 executing