I find that quite often (several times a week, sometimes several times a day) DNS stops working because Unbound has stopped.
Under System -> General -> Log Files, the cause appears to be a segfault:
<6>pid 57337 (unbound), jid 0, uid 59: exited on signal 11
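Signal 11 is SIGSEGV, i.e. a segmentation fault. To count how often this happens across an exported log, here is a minimal Python sketch that picks out the kernel's "exited on signal" lines for a given process. It assumes the lines have exactly the shape shown above, and `find_crashes` is a hypothetical helper name.

```python
import re
import signal

# Matches kernel lines such as:
#   <6>pid 57337 (unbound), jid 0, uid 59: exited on signal 11
CRASH_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<name>[^)]+)\), jid \d+, uid \d+: "
    r"exited on signal (?P<sig>\d+)"
)


def find_crashes(lines, process="unbound"):
    """Return (pid, signal name) for every line where `process` died on a signal."""
    crashes = []
    for line in lines:
        m = CRASH_RE.search(line)
        if m and m.group("name") == process:
            sig = int(m.group("sig"))
            try:
                name = signal.Signals(sig).name  # 11 maps to SIGSEGV
            except ValueError:
                name = f"signal {sig}"  # number unknown on this platform
            crashes.append((int(m.group("pid")), name))
    return crashes


if __name__ == "__main__":
    sample = [
        "<6>pid 57337 (unbound), jid 0, uid 59: exited on signal 11",
        "<6>pid 11135 (unbound), jid 0, uid 59: exited on signal 11",
    ]
    for pid, sig in find_crashes(sample):
        print(pid, sig)
```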
It has been like this since 23.7 and is still happening with 24.1. For my use of this box it is not critical, just an annoyance.
The only thing that differs from the default is that I have enabled nearly everything in the DNSBL drop-down menu.
In addition, I have added a URL for my own blocklist, which is hosted on a remote server in the form "http://example.com/blocklist.txt". That connection has been unreliable at times, but I would expect a missing file NOT to cause any issues with Unbound. Then again, it is the only thing I can think of.
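The general idea of an unreachable remote list being harmless can be illustrated as "treat the download as optional and fall back to the last good copy". This is a minimal Python sketch of that idea, not how OPNsense actually processes DNSBL URLs; `parse_blocklist`, `load_blocklist` and the `fetch` parameter are hypothetical names.

```python
import http.client
import re
import urllib.request

# Rough hostname check: dot-separated labels of letters, digits and hyphens.
DOMAIN_RE = re.compile(r"^(?=.{1,253}$)([a-z0-9-]{1,63}\.)+[a-z0-9-]{1,63}$")


def parse_blocklist(text):
    """Turn a plain-text list into a set of domains, skipping comments and junk."""
    domains = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip().lower()
        if not line:
            continue
        # Also accept hosts-file style lines such as "0.0.0.0 ads.example.com".
        candidate = line.split()[-1]
        if DOMAIN_RE.match(candidate):
            domains.add(candidate)
    return domains


def http_fetch(url, timeout=10):
    """Download the list; raises on HTTP errors (404 etc.) and network failures."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def load_blocklist(url, last_good, fetch=http_fetch):
    """Return fresh domains, or `last_good` if the download fails or yields nothing."""
    try:
        fresh = parse_blocklist(fetch(url))
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError/HTTPError and timeouts; ValueError covers bad URLs.
        return last_good
    return fresh if fresh else last_good
```

The design choice here is that an empty or failed download never replaces a previously good list, so a flaky remote server at worst leaves the old entries in place.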
As a side note, it would be great if it were possible to configure what happens when a service crashes (e.g. restart it up to n times before giving up, send an email, or run a script, for example one that writes to RS-232).
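The "restart n times before giving up" behaviour can be sketched as a small polling loop. This is a hypothetical illustration, not an existing OPNsense feature: `is_running`, `restart` and `on_give_up` are placeholders you would wire to whatever status check, restart command and alert (mail, script, serial port) fits your setup.

```python
import time


def supervise(is_running, restart, on_give_up, max_restarts=3,
              interval=30.0, sleep=time.sleep, max_checks=None):
    """Restart a service up to `max_restarts` times in a row, then give up.

    is_running: callable returning True while the service is healthy.
    restart:    callable that tries to start the service again.
    on_give_up: callable invoked once when the restart budget is exhausted.
    max_checks: stop after this many polls (None = run forever); handy for tests.
    Returns the total number of restarts attempted.
    """
    consecutive_failures = 0
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if is_running():
            consecutive_failures = 0  # healthy again: reset the budget
        else:
            if consecutive_failures >= max_restarts:
                on_give_up()
                return restarts
            consecutive_failures += 1
            restarts += 1
            restart()
        sleep(interval)
    return restarts
```

Resetting the counter whenever the service is seen healthy means only back-to-back failures count toward giving up; an occasional crash days apart keeps getting restarted.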
Every blocklist? How much memory did all of those take up?
I use Abuse ThreatFox, OISD small, AdGuard, and Blocklist YouTube.
I've never had Unbound crash on me... maybe try removing some of the lists?
While I don't run all of the blocklists, I do run several as well as my own custom lists.
What are you seeing in the logs? What symptoms are you seeing? What plugins are you using?
Quote from: waxhead on February 18, 2024, 11:35:10 AM
I find that quite often (several times a week, sometimes several times a day) DNS stops working because Unbound has stopped.
Under System -> General -> Log Files, the cause appears to be a segfault:
<6>pid 57337 (unbound), jid 0, uid 59: exited on signal 11
It has been like this since 23.7 and also now with 24.1.
I have this issue with 24.1.1; on 23.7.12 it worked fine, rock stable. Now Unbound sometimes (twice a day?) just stops resolving (response timeout), and it fixes itself after 5 minutes or so.
I do not have any blocklists enabled, and DNS over TLS isn't used either. There's nothing whatsoever about Unbound crashing in the logs (log levels set to default). There's plenty of RAM available (around 6 GB free out of 8 GB) and CPU load is low (around 0.23).
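To put timestamps on those few-minute outages, a minimal probe that sends one plain UDP DNS query and reports whether a reply with the matching ID came back in time may help. It is a sketch only: it builds the RFC 1035 query by hand, checks just the reply's ID and RCODE (not the answer contents), and `build_query`/`probe` are hypothetical names.

```python
import random
import socket
import struct
import time


def build_query(name, qid, qtype=1):
    """Build a minimal DNS query (RFC 1035): header plus one question, RD bit set."""
    # ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def probe(server, name, timeout=2.0, port=53):
    """Return (ok, rcode or None, elapsed seconds) for one UDP query."""
    qid = random.randint(0, 0xFFFF)
    start = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # applies to each send/receive call
        try:
            sock.sendto(build_query(name, qid), (server, port))
            while True:
                data, _ = sock.recvfrom(512)
                # Ignore anything that is not a reply to our query ID.
                if len(data) >= 12 and struct.unpack("!H", data[:2])[0] == qid:
                    rcode = data[3] & 0x0F  # low 4 bits of the flags field
                    return rcode == 0, rcode, time.monotonic() - start
        except OSError:  # includes socket.timeout (no reply) and unreachable errors
            return False, None, time.monotonic() - start
```

Calling something like `probe("192.168.1.1", "example.com")` every few seconds, where 192.168.1.1 is an assumed LAN address of the OPNsense box, and logging the result with a timestamp would show exactly when resolution stops and resumes. Probing one external and one internal name side by side may also help separate an upstream problem from Unbound itself.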
I see this on several OPNsense Business 23.10.2 installations. Unbound stops resolving external domains only; internal ones still work. It looks like the TLS connection to Quad9 dies sometimes. Restarting Unbound solves the issue immediately; if I do nothing, it fixes itself after a few minutes.
I don't know whether this is an Unbound or a Quad9 issue.
Quote from: Cerberus on February 20, 2024, 08:26:42 PM
I don't know whether this is an Unbound or a Quad9 issue.
It's probably not a Quad9 issue. I'm using ControlD as a traditional DNS provider; I didn't even bother trying DNS over TLS after a very bad experience with it on OpenWrt (it was painfully unstable).
The only Unbound issues I've seen with Quad9 are some weirdness with DNSSEC and Quad9 returning different results per resolver. But my Unbound isn't crashing, just not always resolving a domain.
Quote from: ksx4system on February 20, 2024, 01:18:39 AM
Now Unbound sometimes (twice a day?) just stops resolving (response timeout) and it fixes itself after 5 minutes or so.
This issue persists on 24.1.2 :(
Quote from: CJ on February 21, 2024, 09:07:20 PM
But my Unbound isn't crashing, just not always resolving a domain.
It appears that it doesn't die per se for me either; it just stops resolving anything at all for a few minutes.
Quote from: CJ on February 21, 2024, 09:07:20 PM
The only Unbound issues I've seen with Quad9 are some weirdness with DNSSEC and Quad9 returning different results per resolver.
Neither Cloudflare's 1.1.1.1 nor ControlD at 76.76.2.0 seems to have this issue, AFAIK.
Quote from: ksx4system on February 22, 2024, 12:20:44 AM
I was just posting as a counterpoint, because I've seen people commenting that the reason Unbound isn't working correctly is DHCP, DoT, DNSSEC, the upstream resolver, DNSBL, etc. And I've had none of these issues despite using all of those.
I will note that in the other thread, it seems that a lot of the people having issues with Unbound have a PC directly connected to OPNsense instead of through a switch. The only direct connects that I have are APs and they're always on, but even so, I've not had a problem when swapping them out.
Regarding Unbound temporarily not resolving, enabling the DNS reporting and higher log levels can help with troubleshooting that. But I would think a new thread would be in order as this one is about Unbound crashing and not just temporarily having an issue.
Quote from: CJ on February 22, 2024, 05:19:31 PM
I was just posting as a counterpoint, because I've seen people commenting that the reason Unbound isn't working correctly is DHCP, DoT, DNSSEC, the upstream resolver, DNSBL, etc. And I've had none of these issues despite using all of those.
I did not use DoH/DoT, just plain old DNS. :)
Quote from: CJ on February 22, 2024, 05:19:31 PM
I will note that in the other thread, it seems that a lot of the people having issues with Unbound have a PC directly connected to OPNsense instead of through a switch. The only direct connects that I have are APs and they're always on, but even so, I've not had a problem when swapping them out.
Since I only have two interfaces everything on the LAN side is behind a switch (or two).
Quote from: CJ on February 22, 2024, 05:19:31 PM
Regarding Unbound temporarily not resolving, enabling the DNS reporting and higher log levels can help with troubleshooting that. But I would think a new thread would be in order as this one is about Unbound crashing and not just temporarily having an issue.
I agree, a new thread would be needed.
Just following up on this.
I still get this error in the log files:
<6>pid 11135 (unbound), jid 0, uid 59: exited on signal 11
I also found another post describing what appears to be the same problem:
https://forum.opnsense.org/index.php?topic=20516.0
The way I notice this problem is that DNS simply stops working, and when I log in to my OPNsense box, the dashboard shows the Unbound service as "red", i.e. I need to click the "play" button to start it again.
This is a very annoying issue, and since every crash is in theory a security issue as well, I would love to know how to help diagnose it better.
I'm running Unbound on 24.1.6, and I can always tell something is wrong because the firewall starts dropping connections.
Unbound consumes all 4 cores / 4 threads at 100% CPU, choking the firewall until the service crashes.
I haven't increased the log level. Having read about other people experiencing issues with Unbound, I migrated to Dnsmasq and haven't had an issue since.
I prefer Unbound doing recursion; it's faster and much more versatile than Dnsmasq, but I can't have my firewall sporadically dropping traffic because of Unbound either.
During the issue, at the standard log level, this is what I see before the service just stops, and it's pages and pages of this:
2024-05-03T09:55:07-04:00 Informational unbound [46703:0] info: start of service (unbound 1.19.3).
2024-05-03T09:55:07-04:00 Notice unbound [46703:0] notice: init module 2: iterator
2024-05-03T09:55:07-04:00 Notice unbound [46703:0] notice: init module 1: validator
2024-05-03T09:55:07-04:00 Notice unbound daemonize unbound dhcpd watcher.
2024-05-03T09:55:07-04:00 Notice unbound [46703:0] notice: init module 0: python
2024-05-03T09:55:06-04:00 Informational unbound [87043:0] info: server stats for thread 3: requestlist max 0 avg 0 exceeded 0 jostled 0
2024-05-03T09:55:06-04:00 Informational unbound [87043:0] info: server stats for thread 3: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2024-05-03T09:55:06-04:00 Informational unbound [87043:0] info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0
2024-05-03T09:55:06-04:00 Informational unbound [87043:0] info: server stats for thread 2: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2024-05-03T09:55:06-04:00 Informational unbound [87043:0] info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0
2024-05-03T09:55:06-04:00 Informational unbound [87043:0] info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2024-05-03T09:55:06-04:00 Informational unbound [87043:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
2024-05-03T09:55:06-04:00 Informational unbound [87043:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2024-05-03T09:55:06-04:00 Informational unbound [87043:0] info: service stopped (unbound 1.19.3).
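With pages of this, it can help to pull out only the stop/start events and see how often Unbound is actually restarting and for how long it is down. A minimal sketch, assuming the exported format shown above (ISO 8601 timestamp with offset as the first field, newest line first) and matching only the "service stopped" / "start of service" messages; the function names are hypothetical.

```python
from datetime import datetime


def restart_events(lines):
    """Return (timestamp, 'stop' or 'start') events in chronological order."""
    events = []
    for line in lines:
        if "service stopped" in line:
            kind = "stop"
        elif "start of service" in line:
            kind = "start"
        else:
            continue
        # First whitespace-separated field, e.g. 2024-05-03T09:55:07-04:00
        ts = datetime.fromisoformat(line.split(maxsplit=1)[0])
        events.append((ts, kind))
    # The GUI lists newest entries first: reverse, then a stable sort keeps
    # same-second events in their original chronological order.
    events.reverse()
    events.sort(key=lambda e: e[0])
    return events


def downtimes(events):
    """Seconds between each 'stop' and the next 'start'."""
    gaps = []
    stopped_at = None
    for ts, kind in events:
        if kind == "stop":
            stopped_at = ts
        elif stopped_at is not None:
            gaps.append((ts - stopped_at).total_seconds())
            stopped_at = None
    return gaps
```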
Under System -> Logs -> General, I see this for Unbound:
2024-05-03T17:44:37-04:00 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'
2024-05-03T17:44:37-04:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2024-05-03T17:44:34-04:00 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'
2024-05-03T17:44:34-04:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2024-05-03T17:44:30-04:00 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'
2024-05-03T17:44:30-04:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2024-05-03T17:44:24-04:00 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'
2024-05-03T17:44:24-04:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2024-05-03T17:44:20-04:00 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'
2024-05-03T17:44:20-04:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2024-05-03T17:44:17-04:00 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'
2024-05-03T17:44:17-04:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2024-05-03T17:43:58-04:00 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'
2024-05-03T17:43:58-04:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2024-05-03T17:43:55-04:00 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'
2024-05-03T17:43:55-04:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2024-05-03T17:43:52-04:00 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'
2024-05-03T17:43:52-04:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2024-05-03T17:43:45-04:00 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '49860''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 49860: No such process'
2024-05-03T17:43:45-04:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
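Each of those errors means rc.linkup read PID 49860 from /var/run/unbound.pid and tried to send it SIGTERM, but no process with that PID existed. That suggests Unbound had already exited and left a stale pidfile behind. A minimal POSIX-only sketch of checking this yourself uses signal 0, which performs the existence check without delivering anything; `pidfile_status` is a hypothetical helper name.

```python
import os


def pidfile_status(path="/var/run/unbound.pid"):
    """Return 'missing', 'invalid', 'stale' or 'running' for a pidfile."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return "missing"
    except ValueError:
        return "invalid"  # file exists but does not contain a number
    if pid <= 0:
        return "invalid"
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return "stale"  # same condition as "kill: <pid>: No such process"
    except PermissionError:
        return "running"  # process exists but belongs to another user
    return "running"
```

A "stale" result right after DNS stops working would line up with the service having died rather than merely hanging.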
Quote from: waxhead on February 18, 2024, 11:35:10 AM
I also have this issue with Unbound. I've set up a daily restart of OPNsense and it's a little better now, but Unbound still crashes now and then!
It can't go many hours before it crashes...
OPNsense used to be stable; this is really bad!
2024-08-29T15:56:58 Notice dhclient dhclient-script: Creating resolv.conf
2024-08-29T15:56:58 Notice dhclient dhclient-script: Reason RENEW on igc0 executing
2024-08-29T15:56:08 Warning radvd prefix length should be 64 for igc1
2024-08-29T15:55:13 Notice dhcp6c dhcp6c_script: RENEW on igc0 executing
2024-08-29T15:55:00 Notice send_telemetry.py telemetry data collected 2 records in 0.03 seconds @2024-08-29 13:54:55.508004
2024-08-29T15:54:52 Notice dhcp6c dhcp6c_script: RENEW on igc0 executing
2024-08-29T15:52:13 Error opnsense /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '89172''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 89172: No such process'
2024-08-29T15:51:07 Warning radvd prefix length should be 64 for igc1
2024-08-29T15:47:30 Warning radvd prefix length should be 64 for igc1
2024-08-29T15:46:58 Notice dhclient dhclient-script: Creating resolv.conf
2024-08-29T15:46:58 Notice dhclient dhclient-script: Reason RENEW on igc0 executing
2024-08-29T15:40:27 Warning radvd prefix length should be 64 for igc1
2024-08-29T15:38:33 Notice dhcp6c dhcp6c_script: RENEW on igc0 executing
2024-08-29T15:38:12 Notice dhcp6c dhcp6c_script: RENEW on igc0 executing
2024-08-29T15:36:58 Notice dhclient dhclient-script: Creating resolv.conf
2024-08-29T15:36:58 Notice dhclient dhclient-script: Reason RENEW on igc0 executing
2024-08-29T15:33:28 Warning radvd prefix length should be 64 for igc1
2024-08-29T15:26:58 Notice dhclient dhclient-script: Creating resolv.conf
2024-08-29T15:26:58 Notice dhclient dhclient-script: Reason RENEW on igc0 executing
2024-08-29T15:26:57 Warning radvd prefix length should be 64 for igc1
2024-08-29T15:21:53 Notice dhcp6c dhcp6c_script: RENEW on igc0 executing
2024-08-29T15:21:32 Notice dhcp6c dhcp6c_script: RENEW on igc0 executing
2024-08-29T15:18:18 Warning radvd prefix length should be 64 for igc1
2024-08-29T15:16:58 Notice dhclient dhclient-script: Creating resolv.conf
2024-08-29T15:16:58 Notice dhclient dhclient-script: Reason RENEW on igc0 executing
2024-08-29T15:08:45 Warning radvd prefix length should be 64 for igc1
2024-08-29T15:06:58 Notice dhclient dhclient-script: Creating resolv.conf
2024-08-29T15:06:58 Notice dhclient dhclient-script: Reason RENEW on igc0 executing
2024-08-29T15:05:13 Notice dhcp6c dhcp6c_script: RENEW on igc0 executing
2024-08-29T15:04:52 Notice dhcp6c dhcp6c_script: RENEW on igc0 executing
2024-08-29T15:01:05 Notice syslog-ng Configuration reload finished;
2024-08-29T15:01:05 Notice syslog-ng Configuration reload request received, reloading configuration;
2024-08-29T14:59:40 Warning radvd prefix length should be 64 for igc1
2024-08-29T14:56:58 Notice dhclient dhclient-script: Creating resolv.conf
2024-08-29T14:56:58 Notice dhclient dhclient-script: Reason RENEW on igc0 executing
2024-08-29T14:54:58 Warning radvd prefix length should be 64 for igc1
2024-08-29T14:48:33 Notice dhcp6c dhcp6c_script: RENEW on igc0 executing
2024-08-29T14:48:12 Notice dhcp6c dhcp6c_script: RENEW on igc0 executing
2024-08-29T14:46:58 Notice dhclient dhclient-script: Creating resolv.conf
2024-08-29T14:46:58 Notice dhclient dhclient-script: Reason RENEW on igc0 executing
2024-08-29T14:45:24 Warning radvd prefix length should be 64 for igc1
2024-08-29T14:38:24 Warning radvd prefix length should be 64 for igc1
2024-08-29T14:36:58 Notice dhclient dhclient-script: Creating resolv.conf
2024-08-29T14:36:58 Notice dhclient dhclient-script: Reason RENEW on igc0 executing
2024-08-29T14:33:45 Warning radvd prefix length should be 64 for igc1
2024-08-29T14:31:53 Notice dhcp6c dhcp6c_script: RENEW on igc0 executing
2024-08-29T14:31:32 Notice dhcp6c dhcp6c_script: RENEW on igc0 executing
2024-08-29T14:27:39 Warning radvd prefix length should be 64 for igc1
2024-08-29T14:26:58 Notice dhclient dhclient-script: Creating resolv.conf
2024-08-29T14:26:58 Notice dhclient dhclient-script: Reason RENEW on igc0 executing