OPNsense Forum

English Forums => 23.7 Legacy Series => Topic started by: seed on August 22, 2023, 08:18:48 am

Title: Unbound crashing
Post by: seed on August 22, 2023, 08:18:48 am
Sometimes Unbound crashes and the whole device becomes unresponsive:


2023-08-22T04:14:01   Critical   unbound   [85028:3] fatal error: Could not initialize thread   
2023-08-22T04:14:01   Error   unbound   [85028:3] error: Could not set root or stub hints
2023-08-22T04:14:01   Error   unbound   [85028:3] error: reading root hints /root.hints 2:12: Syntax error, could not parse the RR's type   
            TypeError: an integer is required (got type NoneType)   
            os.write(self._pipe_fd, res.encode())   
            File "dnsbl_module.py", line 226, in log_entry   
            mod_env['logger'].log_entry(   
            File "dnsbl_module.py", line 378, in cache_cb   
            logger.close()   
            File "dnsbl_module.py", line 443, in deinit
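The last frame shows `os.write(self._pipe_fd, ...)` in `log_entry` raising `TypeError` because the descriptor was `None`. One way that happens is a write arriving after the pipe has already been closed (the same trace also shows `deinit` calling `logger.close()`). A minimal sketch of a logger that tolerates late writes, assuming a class shaped roughly like the one in the trace; `PipeLogger` is hypothetical and is not the actual dnsbl_module.py code or the fix that was later applied.

```python
import os


class PipeLogger:
    """Hypothetical stand-in for the DNSBL query logger seen in the traceback."""

    def __init__(self, pipe_fd):
        self._pipe_fd = pipe_fd

    def log_entry(self, line):
        # After close() the descriptor is None; os.write(None, ...) would raise
        # TypeError, so a late entry is dropped instead of crashing the module.
        if self._pipe_fd is None:
            return False
        os.write(self._pipe_fd, (line + "\n").encode())
        return True

    def close(self):
        # Idempotent: a second close() is a no-op.
        if self._pipe_fd is not None:
            os.close(self._pipe_fd)
            self._pipe_fd = None
```

The design choice is simply to make "write after close" a no-op rather than an exception, since a logging failure should not take down the resolver.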
Title: Re: Unbound crashing
Post by: CJ on August 22, 2023, 01:52:20 pm
Can you post your Unbound config?  Are you using any custom blocklists?
Title: Re: Unbound crashing
Post by: seed on August 22, 2023, 02:27:02 pm
I have no blocklist activated, so no DNS blocking.

My config from configctl (host overrides redacted):


Code: [Select]
   <unboundplus version="1.0.6">
      <general>
        <enabled>1</enabled>
        <port>53</port>
        <stats>1</stats>
        <active_interface/>
        <dnssec>1</dnssec>
        <dns64>0</dns64>
        <dns64prefix>64:ff9b::/96</dns64prefix>
        <noarecords>0</noarecords>
        <regdhcp>0</regdhcp>
        <regdhcpdomain/>
        <regdhcpstatic>0</regdhcpstatic>
        <noreglladdr6>1</noreglladdr6>
        <noregrecords>0</noregrecords>
        <txtsupport>0</txtsupport>
        <cacheflush>1</cacheflush>
        <local_zone_type>transparent</local_zone_type>
        <outgoing_interface/>
        <enable_wpad>0</enable_wpad>
      </general>
      <advanced>
        <hideidentity>1</hideidentity>
        <hideversion>1</hideversion>
        <prefetch>0</prefetch>
        <prefetchkey>0</prefetchkey>
        <dnssecstripped>1</dnssecstripped>
        <serveexpired>0</serveexpired>
        <serveexpiredreplyttl/>
        <serveexpiredttl/>
        <serveexpiredttlreset>0</serveexpiredttlreset>
        <serveexpiredclienttimeout/>
        <qnameminstrict>0</qnameminstrict>
        <extendedstatistics>1</extendedstatistics>
        <logqueries>1</logqueries>
        <logreplies>0</logreplies>
        <logtagqueryreply>0</logtagqueryreply>
        <logservfail>0</logservfail>
        <loglocalactions>0</loglocalactions>
        <logverbosity>1</logverbosity>
        <valloglevel>0</valloglevel>
        <privatedomain/>
        <privateaddress>0.0.0.0/8,10.0.0.0/8,100.64.0.0/10,169.254.0.0/16,172.16.0.0/12,192.0.2.0/24,192.168.0.0/16,198.18.0.0/15,198.51.100.0/24,203.0.113.0/24,233.252.0.0/24,::1/128,2001:db8::/32,fc00::/8,fd00::/8,fe80::/10</privateaddress>
        <insecuredomain/>
        <msgcachesize>100m</msgcachesize>
        <rrsetcachesize>200m</rrsetcachesize>
        <outgoingnumtcp>10</outgoingnumtcp>
        <incomingnumtcp>10</incomingnumtcp>
        <numqueriesperthread>4096</numqueriesperthread>
        <outgoingrange>8192</outgoingrange>
        <jostletimeout>200</jostletimeout>
        <cachemaxttl/>
        <cachemaxnegativettl/>
        <cacheminttl/>
        <infrahostttl>900</infrahostttl>
        <infrakeepprobing>0</infrakeepprobing>
        <infracachenumhosts>50000</infracachenumhosts>
        <unwantedreplythreshold>10000000</unwantedreplythreshold>
      </advanced>
      <acls>
        <default_action>allow</default_action>
      </acls>
      <dnsbl>
        <enabled>0</enabled>
        <safesearch>0</safesearch>
        <type>atf,aa,ag,bla0,bla,blf,blg,blp,blr,blr0,bls,blt,blt1,ep</type>
        <lists/>
        <whitelists>*.redacted.tld,*.redacted.tld,*.redacted.tld,*.redacted.internal.tld</whitelists>
        <blocklists/>
        <wildcards/>
        <address/>
        <nxdomain>0</nxdomain>
      </dnsbl>
      <forwarding>
        <enabled>0</enabled>
      </forwarding>
Title: Re: Unbound crashing
Post by: CJ on August 23, 2023, 01:52:00 pm
Odd.  What version are you running?  23.7?

Why do you have dnsbl items configured if you're not using it?
Title: Re: Unbound crashing
Post by: karlson2k on August 23, 2023, 02:43:51 pm
Not sure whether it is related, but I have problems with Unbound as well.
In my case the unbound process starts eating 100% of the CPU (as reported by the top command).
Neither pluginctl nor the web interface can restart or stop the unbound daemon. When I try from the web interface, it freezes for a minute (or so) and ends with nothing.
The kill command can stop unbound only if used as "kill -9" (or "kill -kill").
This is pretty annoying.

I've removed "so-reuseport: no" from the custom config to see whether that fixes anything. Unbound is now running with a single thread only.

Problem started after upgrading from 23.1.x

I reported first here: https://forum.opnsense.org/index.php?topic=35475.0
Another similar report is here: https://forum.opnsense.org/index.php?topic=35523.0
Title: Re: Unbound crashing
Post by: seed on August 23, 2023, 03:26:06 pm
Odd.  What version are you running?  23.7?

Why do you have dnsbl items configured if you're not using it?

Running the latest, 23.7.1_3; I always update to the latest version within 7 days.

DNSBL was configured but caused problems, so it's disabled.


I have no idea why Unbound crashes once every few months.
Title: Re: Unbound crashing
Post by: karlson2k on August 23, 2023, 04:03:59 pm
In my case it's crashing once per day.
Title: Re: Unbound crashing
Post by: karlson2k on August 24, 2023, 08:08:05 am
Probably related as well: https://forum.opnsense.org/index.php?topic=35566.0
Title: Re: Unbound crashing
Post by: franco on August 24, 2023, 09:06:20 am
Thanks for the report. Can you try this patch? https://github.com/opnsense/core/commit/7406a5067f8

# opnsense-patch 7406a5067f8


Cheers,
Franco
Title: Re: Unbound crashing
Post by: seed on August 24, 2023, 09:30:32 am
Thank you.

Applied the patch and restarted the service. DNS still works. Since I don't know how to trigger the bug, I hope it's fixed.
Title: Re: Unbound crashing
Post by: franco on August 24, 2023, 09:57:45 am
If it crashes again, posting the error helps a lot for taking another look.


Cheers,
Franco
Title: Re: Unbound crashing
Post by: karlson2k on August 24, 2023, 11:13:51 am
Applied the patch.
Tried to restart Unbound from the GUI and got this in the log:
Code: [Select]
/usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '4507''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 4507: No such process'
Unbound stopped, and I had to start it manually.
The manual start was successful.
Title: Re: Unbound crashing
Post by: franco on August 24, 2023, 11:15:53 am
The error only tells us Unbound was already stopped when it was attempted to be stopped. Hardly a fatal issue here ;)


Cheers,
Franco
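The "No such process" message means the PID recorded in /var/run/unbound.pid no longer existed. A minimal sketch of how a stale PID file can be detected before signalling, assuming a plain file containing one PID; `pidfile_process_alive` is a hypothetical helper, not part of pluginctl.

```python
import os


def pidfile_process_alive(path):
    """Return True if the PID recorded in `path` belongs to an existing process."""
    try:
        with open(path) as fh:
            pid = int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or it does not contain a number
    if pid <= 0:
        return False  # 0 or negative would address process groups, not one PID
    try:
        os.kill(pid, 0)  # signal 0 only checks existence/permission
    except ProcessLookupError:
        return False  # stale PID file: the "No such process" case
    except PermissionError:
        return True  # process exists but is owned by another user
    return True
```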
Title: Re: Unbound crashing
Post by: bbin on August 24, 2023, 03:27:11 pm
So good news, bad news.  The bad news is that unbound is still failing after roughly 30 minutes.  The good news is that the patch allows me to restart the service without rebooting.
Title: Re: Unbound crashing
Post by: seed on August 24, 2023, 04:52:26 pm
It might help to send the debug log entry from the crash.
Title: Re: Unbound crashing
Post by: karlson2k on August 25, 2023, 08:42:46 pm
In my case the unbound process hung again after ~27 hours of normal running (with the 7406a5067f8 patch). It was using 100% CPU (as reported by the 'top' command). Only 'kill -9 PID' helped; the GUI cannot restart it.
The last record in the unbound log file is
[73750:0] info: service stopped (unbound 1.17.1).
The related records in the system log are
New IP Address (for WAN interface)
and
/usr/local/etc/rc.newwanip ...

After 'kill -9 unboundpid' I got a lot of log entries like:
/usr/local/etc/rc.newwanip: The command '/sbin/mount -r -t nullfs '/usr/local/lib/python3.9' '/var/unbound/usr/local/lib/python3.9'' returned exit code '1', the output was 'mount_nullfs: /var/unbound/usr/local/lib/python3.9: Resource deadlock avoided'

Two suggestions:
* kill services with 'kill -kill' after a timeout if they cannot be stopped/restarted normally.
* stop running 'rc.newwanip' if nothing has changed. My ISP renews the IP every 10 minutes, but the same IP is assigned every time (with the same mask; I don't use my ISP's DNS servers). It makes no sense to run a lot of IP-update processes if nothing has changed.
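The first suggestion (fall back to `kill -KILL` after a timeout) can be sketched as follows. This is a minimal illustration, assuming only the PID is known; `stop_with_escalation` and its parameters are hypothetical and do not describe how pluginctl or configd actually stop services.

```python
import os
import signal
import time


def _is_running(pid):
    """True if `pid` still exists; a reaped child or vanished PID is not running."""
    try:
        # If the process happens to be our own child, reap it so a zombie
        # is not mistaken for a live process.
        reaped, _status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False
    except ChildProcessError:
        pass  # not our child (the usual case for a daemon like unbound)
    try:
        os.kill(pid, 0)  # signal 0 only checks that the PID exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def stop_with_escalation(pid, timeout=10.0, poll=0.1):
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL.

    Returns "not-running", "terminated" or "killed".
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "not-running"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not _is_running(pid):
            return "terminated"
        time.sleep(poll)
    try:
        os.kill(pid, signal.SIGKILL)  # same as `kill -9` / `kill -KILL`
    except ProcessLookupError:
        return "terminated"
    return "killed"
```

The timeout trades off giving the daemon a chance to flush and exit cleanly against leaving DNS down while it spins at 100% CPU.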

Note: I had so-reuseport: no in the unbound config to use multiple threads, and I have RSS enabled.
Title: Re: Unbound crashing
Post by: karlson2k on August 27, 2023, 04:11:52 pm
Additional note: my router hardware has 8 vCPU cores.
Title: Re: Unbound crashing
Post by: karlson2k on August 29, 2023, 09:02:47 pm
Unbound hung again. Without so-reuseport: no it ran longer (single-threaded), but it still hangs eventually.
Title: Re: Unbound crashing
Post by: karlson2k on August 29, 2023, 09:18:38 pm
Probably related:
https://forum.opnsense.org/index.php?topic=35666.0
Title: Re: Unbound crashing
Post by: seed on August 29, 2023, 09:21:27 pm
Thanks for the report. Can you try this patch? https://github.com/opnsense/core/commit/7406a5067f8

# opnsense-patch 7406a5067f8


Cheers,
Franco

In my case the problem has not happened again.
Title: Re: Unbound crashing
Post by: franco on August 29, 2023, 09:21:33 pm
Sounds like more upstream bugs. Is this all on DoT? It's still rather buggy after all these years.


Cheers,
Franco
Title: Re: Unbound crashing
Post by: karlson2k on August 29, 2023, 09:25:20 pm
My unbound is forwarding all requests to local DNSCryptProxy. LAN clients are using simple DNS protocol (mostly UDP).
DNSSEC is enabled.
Title: Re: Unbound crashing
Post by: franco on August 30, 2023, 08:18:45 am
Hmm, ok, that's a basic setup then.

What is dnscrypt-proxy doing that Unbound cannot? Or is this a separate dnscrypt-proxy instance and not the plugin?

Do you need Unbound to forward? Maybe you can use Dnsmasq to do that job instead or use dnscrypt-proxy directly... it works fine nowadays as core DNS server, see docs:

https://docs.opnsense.org/manual/how-tos/dnscrypt-proxy.html#example-standalone-dns


Cheers,
Franco
Title: Re: Unbound crashing
Post by: karlson2k on August 30, 2023, 10:18:58 am
My setup is based on the standard repo. DnsCrypt-proxy is used as a plugin.

I need flexibility of Unbound:
* good integration with DHCP (I'm not aware whether DnsCrypt-proxy is integrated)
* DNS leak control by specific zones
* Some DNS names overrides
* Forwarding requests for ISP-specific domains directly to the ISP's DNS servers
* Integration with OpenNIC for some top-level domains (via manual config)
* Some other features

Unbound alone is not enough, as it works without encrypted channels, allowing my ISP (and others) to easily intercept the traffic and modify remote responses.

Dnsmasq lacks some of the required features.
Title: Re: Unbound crashing
Post by: karlson2k on August 30, 2023, 10:21:46 am
I may install GDB to find where Unbound is getting stuck. Is there any way to download debug symbols for Unbound?
Title: Re: Unbound crashing
Post by: karlson2k on August 30, 2023, 02:27:31 pm
I've probably found the reason why the bug is triggered on my router.
As I wrote previously (https://forum.opnsense.org/index.php?topic=35527.msg173046#msg173046), my ISP sets the IP renewal interval to 10 minutes. Even though nothing changes, a lot of processes run every 10 minutes, including an Unbound restart.
So to reproduce the issue, use a WAN IP with a short renewal interval, like every minute. A lot of manual restarts may also trigger the same issue.

Today I've got a new type of Unbound problem:
Code: [Select]
2023-08-30T15:07:09 Critical unbound [49665:2] fatal error: Could not initialize thread
2023-08-30T15:07:09 Informational unbound [49665:2] info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0
2023-08-30T15:07:09 Informational unbound [49665:2] info: server stats for thread 2: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2023-08-30T15:07:09 Error unbound [49665:2] error: Could not set root or stub hints
2023-08-30T15:07:09 Error unbound [49665:2] error: reading root hints /root.hints 2:12: Syntax error, could not parse the RR's type
2023-08-30T15:07:09 Notice unbound [49665:0] notice: init module 2: iterator
2023-08-30T15:07:09 Notice unbound [49665:0] notice: init module 1: validator
2023-08-30T15:07:09 Notice unbound daemonize unbound dhcpd watcher.
2023-08-30T15:07:09 Notice unbound [49665:0] notice: init module 0: python

I have default root.hints file and it is not changed from run to run. :)

To me it looks like a memory corruption problem or a broken ABI.
Title: Re: Unbound crashing
Post by: franco on August 30, 2023, 02:54:37 pm
This is likely due to an interface selection in the Unbound settings. Using the recommended empty selection will not force a restart on every DHCP renew.


Cheers,
Franco
Title: Re: Unbound crashing
Post by: karlson2k on August 30, 2023, 03:34:58 pm
I'll try. However, I don't want Unbound to reply on WAN interfaces. Yes, the access lists are configured, but it would be safer not to bind to (reply on) external interfaces.
Title: Re: Unbound crashing
Post by: karlson2k on August 30, 2023, 03:42:40 pm
Perhaps it is worth avoiding resetting everything if the IP stays the same? The script sees both the old IP and the new IP.
Or at least add an interface option such as "Do not force re-binding of daemons if the IP hasn't changed on DHCP lease renewal".
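The proposed check amounts to comparing the old and new address before anything is reloaded. A minimal sketch, assuming the hook receives both addresses in "address/prefix" form; `needs_reconfigure` is a hypothetical helper, not code from rc.newwanip or from any OPNsense commit.

```python
import ipaddress
from typing import Optional


def needs_reconfigure(old: Optional[str], new: str, force: bool = False) -> bool:
    """Decide whether a WAN address event should trigger dependent reloads.

    `old` is None when no previous address is known.
    """
    if force or old is None:
        return True
    # ip_interface() normalises notation, so "2001:db8::1/64" and
    # "2001:0db8:0:0::1/64" compare equal; a changed prefix counts as a change.
    return ipaddress.ip_interface(old) != ipaddress.ip_interface(new)
```

Comparing the prefix as well as the address covers the "same IP, same mask" case described above without suppressing a genuine subnet change.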
Title: Re: Unbound crashing
Post by: karlson2k on August 30, 2023, 06:30:50 pm
I changed the interface selection to "All" (empty), but Unbound still restarts with each DHCP lease update on the WAN ports. Probably because OpenVPN reconfiguration is triggered by

/usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : unbound_configure_do(,opt1))
/usr/local/etc/rc.newwanip: Resyncing OpenVPN instances for interface WAN2.
/usr/local/etc/rc.newwanip: ROUTING: entering configure using 'opt1'

each time when IP is renewed.
Title: Re: Unbound crashing
Post by: franco on August 30, 2023, 10:03:55 pm
Looks like the relevant parts to avoid reload had been removed in 2021... https://github.com/opnsense/core/commit/4a1bc9f8b5e65651e8

I don't see a reason not to revive this, but it is still a bit odd that this went unnoticed since then, or that new upstream bugs came to light at some point.


Cheers,
Franco
Title: Re: Unbound crashing
Post by: CJ on September 05, 2023, 02:21:32 pm
Sounds like more upstream bugs. Is this all on DoT? It's still rather buggy after all these years.

Is there something particular about DoT that you think is buggy?  I've been running it for quite a while now and haven't noticed any issues.
Title: Re: Unbound crashing
Post by: franco on September 05, 2023, 02:51:44 pm
This one was fixed in 1.18.0 just then (but we had the fix already on 23.7):

https://github.com/NLnetLabs/unbound/commit/52581f86447

Bottom line, it shows poor coding surfacing due to ASLR "breaking" the execution.

There were a few others over the years.


Cheers,
Franco
Title: Re: Unbound crashing
Post by: newsense on September 05, 2023, 03:20:18 pm
Unbound 1.18.0 runs fine here, nothing unusual in the logs so far.


For anyone interested it can be installed/tested now.

Code: [Select]
fetch https://pkg.opnsense.org/FreeBSD:13:amd64/snapshots/latest/All/unbound-1.18.0.pkg && pkg install unbound-1.18.0.pkg
Title: Re: Unbound crashing
Post by: karlson2k on September 11, 2023, 10:49:16 am
Unbound (1.17.1) does not hang with OPNsense 23.7.3 when the log level is set to "Level 4" or "Level 3".

I've changed it back to "Level 1" to check the situation.

It would be nice to avoid reloading Unbound when the IP address is renewed to the same value. Currently the Unbound cache seems to be wiped every 10 minutes (the IP renewal period).
Title: Re: Unbound crashing
Post by: karlson2k on September 11, 2023, 11:05:17 am
Probably related: https://forum.opnsense.org/index.php?topic=35878.0
Title: Re: Unbound crashing
Post by: karlson2k on September 11, 2023, 03:59:19 pm
With "Level 1" log, Unbound again failed to restart, just like before: https://forum.opnsense.org/index.php?topic=35527.msg173533#msg173533. The minor difference is additional error entry:
error: reading root hints /root.hints 28:37: Syntax error, could not parse the RR's class
after similar error entry:
error: reading root hints /root.hints 2:12: Syntax error, could not parse the RR's type

An obvious workaround is to use more detailed logging. However, I don't want to wear out my SSD too quickly.

I will try with Unbound 1.18.0

Title: Re: Unbound crashing
Post by: karlson2k on September 12, 2023, 03:16:04 pm
Unbound version https://pkg.opnsense.org/FreeBSD:13:amd64/snapshots/latest/All/unbound-1.18.0.pkg hasn't hung so far.
I'll try with so-reuseport: no to see whether the problem appears with multi-threaded Unbound.
Title: Re: Unbound crashing
Post by: franco on September 12, 2023, 09:10:13 pm
1.18.0 will be in 23.7.4 on Thursday.


Cheers,
Franco
Title: Re: Unbound crashing
Post by: karlson2k on September 13, 2023, 11:10:42 am
Unbound hit 100% CPU again with so-reuseport: no.
Switched back to single-threaded; several hours without freezing.
Title: Re: Unbound crashing
Post by: franco on September 13, 2023, 11:13:15 am
Is this only with RSS or always?


Cheers,
Franco
Title: Re: Unbound crashing
Post by: karlson2k on September 14, 2023, 01:00:44 am
I have Intel NICs and RSS is always enabled on my hardware.

I can try with RSS disabled just to check the results.
Title: Re: Unbound crashing
Post by: newsense on September 14, 2023, 07:50:33 am
If it's RSS-related, it may be a driver-specific issue? I'm not seeing anomalies on the igb, igc and em drivers, with or without RSS enabled, on Unbound 1.17.1 or 1.18.0 with DoT and running on port 53.


This is from an APU4

Code: [Select]
root@OPNsense:~ # netstat -Q
Configuration:
Setting                        Current        Limit
Thread count                         4            4
Default queue limit                256        10240
Dispatch policy                 direct          n/a
Threads bound to CPUs          enabled          n/a

Protocols:
Name   Proto QLimit Policy Dispatch Flags
ip         1   1000    cpu   hybrid   C--
igmp       2    256 source  default   ---
rtsock     3    256 source  default   ---
arp        4    256 source  default   ---
ether      5    256    cpu   direct   C--
ip6        6   1000    cpu   hybrid   C--
ip_direct     9    256    cpu   hybrid   C--
ip6_direct    10    256    cpu   hybrid   C--

Workstreams:
WSID CPU   Name     Len WMark   Disp'd  HDisp'd   QDrops   Queued  Handled
   0   0   ip         0   200        0    77796        0   380237   458033
   0   0   igmp       0     0        0        0        0        0        0
   0   0   rtsock     0     0        0        0        0        0        0
   0   0   arp        0     0     3802        0        0        0     3802
   0   0   ether      0     0  1568827        0        0        0  1568827
   0   0   ip6        0    15        0        0        0   255964   255964
   0   0   ip_direct     0     0        0        0        0        0        0
   0   0   ip6_direct     0     0        0        0        0        0        0
   1   1   ip         0   117        0   190911        0   192458   383369
   1   1   igmp       0     0        0        0        0        0        0
   1   1   rtsock     0     6        0        0        0     3396     3396
   1   1   arp        0     0        1        0        0        0        1
   1   1   ether      0     0   441355        0        0        0   441355
   1   1   ip6        0    29        0        0        0   319858   319858
   1   1   ip_direct     0     0        0        0        0        0        0
   1   1   ip6_direct     0     0        0        0        0        0        0
   2   2   ip         0   129        0    68245        0  1618864  1687109
   2   2   igmp       0     0        0        0        0        0        0
   2   2   rtsock     0     0        0        0        0        0        0
   2   2   arp        0     0    41070        0        0        0    41070
   2   2   ether      0     0   802177        0        0        0   802177
   2   2   ip6        0    63        0    95122        0   297665   392787
   2   2   ip_direct     0     0        0        0        0        0        0
   2   2   ip6_direct     0     0        0        0        0        0        0
   3   3   ip         0   183        0   134480        0   361672   496152
   3   3   igmp       0     0        0        0        0        0        0
   3   3   rtsock     0     0        0        0        0        0        0
   3   3   arp        0     0        6        0        0        0        6
   3   3   ether      0     0   720936        0        0        0   720936
   3   3   ip6        0   264        0    81594        0   551511   633105
   3   3   ip_direct     0     0        0        0        0        0        0
   3   3   ip6_direct     0     0        0        0        0        0        0
root@OPNsense:~ # unbound-control -c /var/unbound/unbound.conf status
version: 1.18.0
verbosity: 1
threads: 4
modules: 3 [ python validator iterator ]
uptime: 64425 seconds
options: control(ssl)
unbound (pid 2627) is running...
Title: Re: Unbound crashing
Post by: karlson2k on September 14, 2023, 11:46:31 am
As I wrote earlier (https://forum.opnsense.org/index.php?topic=35527.msg173533#msg173533), the issue is most likely triggered by frequent Unbound restarts (I have short-lived upstream DHCP leases, and every renewal of the WAN IP address unconditionally triggers an Unbound restart).
Probably having 8 threads also increases the probability of the freeze.

I think Unbound is freezing either at stop or at start.

Version 1.17.1 hangs quickly with so-reuseport: no (multi-threaded, according to the statistics). Without so-reuseport: no (all requests are handled by a single thread only, according to the statistics) it hangs later.
Version 1.18.0 hangs with so-reuseport: no. Without it I haven't seen a freeze yet.
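The per-thread picture mentioned above can be read from Unbound's own counters. A minimal sketch, assuming the key=value output of `unbound-control -c /var/unbound/unbound.conf stats_noreset` with `threadN.num.queries` lines; the helper names are hypothetical.

```python
import re


def per_thread_queries(stats_text):
    """Map thread index -> num.queries from unbound-control stats output."""
    counts = {}
    for line in stats_text.splitlines():
        # fullmatch skips related keys such as threadN.num.queries_ip_ratelimited
        m = re.fullmatch(r"thread(\d+)\.num\.queries=(\d+)", line.strip())
        if m:
            counts[int(m.group(1))] = int(m.group(2))
    return counts


def busiest_thread_share(counts):
    """Fraction of all queries handled by the busiest thread (1.0 = one thread does all)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return max(counts.values()) / total
```

A share close to 1.0 with several threads configured matches the "single thread only" observation; a share near 1/N means the load is spread.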

Detailed (level 3 and level 4) logging somehow prevents Unbound from freezing, so I cannot tell precisely what's triggering the issue.

The NIC driver is igb.
8 vCPU cores are available.

Code: [Select]
# unbound-control -c /var/unbound/unbound.conf status
version: 1.18.0
verbosity: 1
threads: 8
modules: 3 [ python validator iterator ]
uptime: 460 seconds
options: reuseport control(ssl)
unbound (pid 76726) is running...

The current so-reuseport: yes comes from the default plugin configuration and prevents requests from being handled by multiple threads.

Schedule pluginctl dns to be called every one or two minutes (or even every 15 seconds) to trigger the issue quickly, preferably with so-reuseport: no in the Unbound configuration.
To fully reproduce my configuration, use request forwarding (I'm using a local DNSCrypt-Proxy, but I don't think it matters to Unbound where the requests are forwarded).
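The reproduction recipe above is just a command run in a loop. A minimal sketch of such a loop, assuming the command is passed as an argument list; `repeat_command` is a hypothetical helper, and `pluginctl dns` is taken from the post as-is.

```python
import subprocess
import time


def repeat_command(cmd, interval_s, iterations, run=subprocess.run):
    """Run `cmd` `iterations` times, sleeping `interval_s` seconds between runs.

    Returns the list of exit codes so that failed reloads stand out.
    """
    codes = []
    for i in range(iterations):
        result = run(cmd, capture_output=True, text=True)
        codes.append(result.returncode)
        if i + 1 < iterations:
            time.sleep(interval_s)
    return codes
```

On the firewall this would be called as `repeat_command(["pluginctl", "dns"], 60, 30)`; a cron entry achieves the same thing without a long-running script.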
Title: Re: Unbound crashing
Post by: newsense on September 14, 2023, 12:51:29 pm
What happens if you ditch DNSCrypt and forward to 1.1.1.1:853 instead? Can you still trigger it?

If performance is at stake, Cloudflare is one of the fastest. DNSCrypt on the other hand, if using the stock one, might not be the best tool here; it's quite old and in need of an update (maybe it should be removed from the plugin list?)
Title: Re: Unbound crashing
Post by: karlson2k on September 14, 2023, 01:59:56 pm
Then traffic would be easier to intercept or block. Cloudflare would also get the full list of my DNS requests.
As a local server is used, it must be compliant with local regulations, including full reports, "legal" interception and censorship.
Not nice, not a solution for me.

However, I may test it.

In any case, the broken part is Unbound, not DnsCrypt-proxy.
Title: Re: Unbound crashing
Post by: franco on September 14, 2023, 02:07:55 pm
Here is the promised patch:

https://github.com/opnsense/core/commit/a086f40b

# opnsense-patch a086f40b


Cheers,
Franco
Title: Re: Unbound crashing
Post by: karlson2k on September 14, 2023, 02:15:54 pm
Here is the promised patch:

https://github.com/opnsense/core/commit/a086f40b

Applied.
I will test for a while without so-reuseport: no and then try with real multi-threading.
Title: Re: Unbound crashing
Post by: karlson2k on September 14, 2023, 03:41:54 pm
Applied.
It's beautiful to see Unbound requests finally served from cache. Previously just 5-20% were served from cache and the rest were resolved recursively.
Now 80-90% of requests are served from cache.

Thanks for the fix, franco!

The load average dropped by almost half.

I'll test with so-reuseport: no now.
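The cache-hit percentages above can be computed from Unbound's totals. A minimal sketch, assuming the key=value output of `unbound-control -c /var/unbound/unbound.conf stats_noreset` (which does not reset the counters) with `total.num.queries` and `total.num.cachehits` lines; `cache_hit_ratio` is a hypothetical helper.

```python
def cache_hit_ratio(stats_text):
    """Return total.num.cachehits / total.num.queries from unbound-control stats output."""
    values = {}
    for line in stats_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            values[key] = value
    queries = float(values.get("total.num.queries", 0))
    hits = float(values.get("total.num.cachehits", 0))
    return hits / queries if queries else 0.0
```

Using stats_noreset rather than stats means sampling the ratio does not itself zero the counters being measured.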
Title: Re: Unbound crashing
Post by: karlson2k on September 14, 2023, 03:53:58 pm
When I tried to restart Unbound, I got:
Code: [Select]
Error unbound [84125:2] error: Could not set root or stub hints
Error unbound [84125:2] error: reading root hints /root.hints 2:12: Syntax error, could not parse the RR's type
Critical unbound [84125:2] fatal error: Could not initialize thread
The kind of error I've seen before.

The Unbound process was using 100% CPU.
The GUI could not update the Unbound status.

I had to kill the process with kill -9 84125; after that it restarted.

It looks like Unbound freezes at startup because of the /root.hints parsing error.
How is that possible?
Does Unbound log the full path, i.e. is it trying to parse a file located in the root directory?
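One way to rule out actual file corruption is to check the hints file line by line. A simplified sketch, assuming the named.root layout (owner, optional TTL, optional class IN, then an NS, A or AAAA record); `check_root_hints` is hypothetical and is far less general than Unbound's own zone-file parser.

```python
import ipaddress


def check_root_hints(text):
    """Return a list of (line_number, message) problems; an empty list means OK."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()[1:]  # fields[0] of the full line is the owner name
        if fields and fields[0].isdigit():
            fields = fields[1:]  # optional TTL
        if fields and fields[0].upper() == "IN":
            fields = fields[1:]  # optional class
        if len(fields) != 2:
            problems.append((lineno, "expected exactly TYPE and RDATA"))
            continue
        rrtype, rdata = fields[0].upper(), fields[1]
        try:
            if rrtype == "A":
                ipaddress.IPv4Address(rdata)
            elif rrtype == "AAAA":
                ipaddress.IPv6Address(rdata)
            elif rrtype != "NS":
                problems.append((lineno, f"unexpected RR type {fields[0]!r}"))
        except ValueError:
            problems.append((lineno, f"bad {rrtype} address {rdata!r}"))
    return problems
```

If this reports nothing for the file Unbound loads, the parse error is more likely caused by how the file is read during a restart than by its content.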
Title: Re: Unbound crashing
Post by: franco on September 14, 2023, 04:06:30 pm
It indicates a general restart issue. Did you ever do a health audit?


Cheers,
Franco
Title: Re: Unbound crashing
Post by: karlson2k on September 14, 2023, 05:02:00 pm
I was using the default Unbound from the standard OPNsense installation.
Then I installed version 1.18.0 from the OPNsense repo, as was suggested here (https://forum.opnsense.org/index.php?topic=35527.msg174125#msg174125).

Now I've changed it back to the default one (the same 1.18.0 version).
Code: [Select]
***GOT REQUEST TO AUDIT HEALTH***
Currently running OPNsense 23.7.4 at Thu Sep 14 17:52:17 2023
>>> Check installed kernel version
Version 23.7.4 is correct.
>>> Check for missing or altered kernel files
No problems detected.
>>> Check installed base version
Version 23.7.4 is correct.
>>> Check for missing or altered base files
No problems detected.
>>> Check installed repositories
OPNsense
>>> Check installed plugins
os-dnscrypt-proxy 1.14_1
os-smart 2.2_2
>>> Check locked packages
No locks found.
>>> Check for missing package dependencies
Checking all packages: .......... done
>>> Check for missing or altered package files
Checking all packages: ....
opnsense-23.7.4: checksum mismatch for /usr/local/etc/inc/plugins.inc.d/unbound.inc
Checking all packages......... done
>>> Check for core packages consistency
Core package "opnsense" has 68 dependencies to check.
Checking packages: ..................................................................... done
***DONE***

The file unbound.inc was modified by the suggested patch (https://forum.opnsense.org/index.php?topic=35527.msg174927#msg174927).
Title: Re: Unbound crashing
Post by: karlson2k on September 14, 2023, 05:05:33 pm
DNScrypt on the other hand, if using the stock one, might not be the best tool here; it's quite old and in need of an update (maybe it should be removed from the plugin list?)
The FreeBSD repo has the latest version: https://www.freshports.org/dns/dnscrypt-proxy2

Let me know if help with the plugin update is needed; I'm ready to work on it.
Title: Re: Unbound crashing
Post by: karlson2k on September 18, 2023, 08:57:22 am
Here is the promised patch:

https://github.com/opnsense/core/commit/a086f40b

# opnsense-patch a086f40b

Several days with this patch and no issues, even with so-reuseport: no.
The Unbound cache is finally filled with useful data and caching is providing real benefits.
The upstream DHCP lease is still 10 minutes long, but it no longer causes an Unbound reload.
Thanks!

Note: the restart issue is still there; it can be triggered by a manual restart.
Title: Re: Unbound crashing
Post by: CJ on September 20, 2023, 02:09:58 pm
Then traffic would be easier to intercept or block. Cloudflare would also get the full list of my DNS requests.
As a local server is used, it must be compliant with local regulations, including full reports, "legal" interception and censorship.
Not nice, not a solution for me.

However, I may test it.

In any case, the broken part is Unbound, not DnsCrypt-proxy.

How do you have dnscrypt configured?  Are you using it to do recursive root resolution?  Just trying to understand the benefits over using DoT (not necessarily cloudflare).
Title: Re: Unbound crashing
Post by: karlson2k on September 28, 2023, 11:27:14 am
The last update to 23.7.5 reverted the "no-restart" patch, and Unbound started hanging again.
I had to re-apply the patch. Any chance that the patch will be backported to 23.7?
Title: Re: Unbound crashing
Post by: karlson2k on September 28, 2023, 11:34:16 am
Then traffic would be easier to intercept or block. Cloudflare would also get the full list of my DNS requests.
As a local server is used, it must be compliant with local regulations, including full reports, "legal" interception and censorship.
Not nice, not a solution for me.

How do you have dnscrypt configured?  Are you using it to do recursive root resolution?  Just trying to understand the benefits over using DoT (not necessarily cloudflare).
It is mostly the default configuration. DNSCrypt-Proxy downloads the list of public servers available via DNSCrypt or DNS-over-HTTPS, then automatically detects the fastest servers and sends requests to a random subset of a short list of the fastest servers.

No server gets the complete list of all your DNS queries.
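The selection described above can be illustrated in a few lines. This is a sketch of the behaviour as described in the post, not dnscrypt-proxy's actual load-balancing code; `pick_servers`, the latency map and the shortlist size are hypothetical.

```python
import random


def pick_servers(latencies_ms, shortlist_size=3, k=1, rng=None):
    """Pick `k` servers at random from the `shortlist_size` lowest-latency ones.

    `latencies_ms` maps server name -> measured round-trip time in milliseconds.
    """
    if rng is None:
        rng = random.Random()
    fastest = sorted(latencies_ms, key=latencies_ms.get)[:shortlist_size]
    return rng.sample(fastest, min(k, len(fastest)))
```

Because each query goes to a randomly chosen member of the shortlist, any single upstream sees only part of the query stream, which is the privacy argument made above.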
Title: Re: Unbound crashing
Post by: joshndroid on September 28, 2023, 12:09:22 pm
I seem to be having a similar issue on both 23.7.4 and 23.7.5:

Code: [Select]
2023-09-28T18:20:01 Critical unbound [14883:3] fatal error: Could not initialize thread
2023-09-28T18:20:01 Error unbound [14883:3] error: Could not set root or stub hints
2023-09-28T18:20:01 Error unbound [14883:3] error: reading root hints /root.hints 2:12: Syntax error, could not parse the RR's type

Having read this whole thread, I am not sure what I should be doing to fix it.

I had Monit set up to restart Unbound for me, but I have since turned that off while trying to see what is causing the issue.
Title: Re: Unbound crashing
Post by: franco on September 28, 2023, 07:36:07 pm
I checked https://www.internic.net/domain/named.root and it's the same file (except the dates) as

/usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints

that we use to bootstrap the root servers.

# md5 /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints
MD5 (/usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints) = ac281ab5712d761d1a4e7a7224b89666

Should be the same as

# md5 /var/unbound/root.hints
MD5 (/var/unbound/root.hints) = ac281ab5712d761d1a4e7a7224b89666

If not it would be helpful to diff:

# diff -u /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints


Cheers,
Franco
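The same comparison can be scripted, for example to check several firewalls at once. A minimal Python equivalent of the md5 commands above, assuming read access to both files; `md5_of` and `same_contents` are hypothetical helpers.

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Hex MD5 digest of a file, read in chunks (the value `md5` prints)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True if both files hash to the same MD5 value."""
    return md5_of(path_a) == md5_of(path_b)
```

For example, `same_contents("/usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints", "/var/unbound/root.hints")` should return True on a healthy install.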
Title: Re: Unbound crashing
Post by: nerf on September 29, 2023, 12:24:03 am
Same issue here  (on 23.7.5) - amongst others.
Title: Re: Unbound crashing
Post by: franco on September 29, 2023, 08:17:04 am
I just love a "me too" without an error log attached and ignoring the last post on how to debug this further. ;)
Title: Re: Unbound crashing
Post by: newsense on September 29, 2023, 08:33:33 am
#Me2 ;-)
Title: Re: Unbound crashing
Post by: newsense on September 29, 2023, 08:37:06 am
Back on topic, did a quick check on 3 FWs, nothing to report

Code: [Select]
root@OPNsense:~ # diff -u /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints
root@OPNsense:~ #
Title: Re: Unbound crashing
Post by: nerf on September 30, 2023, 02:48:21 am
Yes, I know logs would be good.
But it is hard, given that
1. DNSmasq is in use due to Unbound's issues
2. I would need to capture a crash, which is not always predictable or reproducible.

I will give it a try. Would logs at "Error" level suffice?
Title: Re: Unbound crashing
Post by: franco on September 30, 2023, 10:38:38 am
Why is everyone ignoring my post? ;) I don't really care about the logs. They tell us what the error is supposed to be, but not how it's triggered or why it's persistent.

https://forum.opnsense.org/index.php?topic=35527.msg176361#msg176361
Title: Re: Unbound crashing
Post by: nerf on October 01, 2023, 08:41:49 am
Ok did the diff against /var/unbound/root.hints, no output.

I will be switching back over to unbound. In the event of a crash what would be needed (captures, logs, files, command line etc.) to diagnose the issue?
Title: Re: Unbound crashing
Post by: karlson2k on October 02, 2023, 04:37:22 pm

If not it would be helpful to diff:

# diff -u /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints
I'm still experiencing the same issue. Either error: reading root hints /root.hints 2:12: Syntax error, could not parse the RR's type or error: reading root hints /root.hints 28:37: Syntax error, could not parse the RR's class.
Code: [Select]
# diff -q /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints && echo 'The files are identical.'
The files are identical.
Title: Re: Unbound crashing
Post by: newsense on October 03, 2023, 04:14:43 am
I wonder if you have a corrupt file there, even if it doesn't look like it...

Try removing the file and restarting Unbound.

Code: [Select]
service unbound stop && rm -v /var/unbound/root.hints && cp -v /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints && service unbound onestart
Title: Re: Unbound crashing
Post by: karlson2k on October 03, 2023, 09:12:56 pm
No need.
Unbound can be stopped with kill -9. After that, a manual service restart works fine again.
Title: Re: Unbound crashing
Post by: karlson2k on October 03, 2023, 09:13:36 pm
One more report:
https://forum.opnsense.org/index.php?topic=36270.0
Title: Re: Unbound crashing
Post by: Bierfassl82 on October 07, 2023, 05:50:32 pm
One more report:
https://forum.opnsense.org/index.php?topic=36270.0

I have the same problem: unbound crashes sporadically with the same log entry. In addition, the CPU load at that moment is extremely high compared to normal.
Title: Re: Unbound crashing
Post by: karlson2k on October 09, 2023, 11:11:56 am
A few observations:
* On OPNsense 23.7.x both Unbound versions (1.17.1 and 1.18.0) have the same problem
* OPNsense 23.1.x has the Unbound version 1.17.1

Therefore possible reasons:
* Some changes in OPNsense 23.7 broke the Unbound startup (e.g. the daemon is started while files are still being copied)
* Some patches added in OPNsense 23.7 for Unbound broke the things (I'm not sure whether any patches were added)
* Some changes in FreeBSD kernel (like ASLR) broke badly designed Unbound processing

As log levels 3 and 4 somehow work around the problem (while constantly hammering the SSD), the problem is most likely caused by parallel startup processing (either in the OPNsense initialisation scripts or in Unbound itself). I think detailed logging just slows down the startup enough for the parallel processes to complete.
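The first hypothesis (the daemon starting while the hints file is still being written) is the classic partial-write race. A common way to rule it out is to write the new contents to a temporary file in the same directory and then rename it over the target, since a rename within one filesystem is atomic. This sketch only illustrates the technique; it is not how OPNsense currently generates /var/unbound/root.hints, the helper name install_atomic is hypothetical, and the 0644 mode is an assumption so that the unprivileged unbound user can still read the file.

```sh
#!/bin/sh
# Replace $2 with a copy of $1 so that a concurrent reader sees either the
# old or the new contents, never a half-written file.
install_atomic() {
    src=$1
    dst=$2
    # The temporary file lives next to the target, so the final mv is a
    # rename(2) within one filesystem, which is atomic.
    tmp=$(mktemp "${dst}.XXXXXX") || return 1
    if ! cp "$src" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # mktemp creates the file with mode 0600; assume the daemon's
    # unprivileged user needs read access.
    chmod 0644 "$tmp"
    mv -f "$tmp" "$dst"
}

# Hypothetical usage:
# install_atomic /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints
```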
Title: Re: Unbound crashing
Post by: nerf on October 10, 2023, 09:14:25 am
I wonder if you have a corrupt file there, even if it doesn't look like it...

Try removing the file and restarting Unbound.

Code: [Select]
service unbound stop && rm -v /var/unbound/root.hints && cp -v /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints && service unbound onestart

OK - unbound has been crashing - executed the above - will monitor.
Title: Re: Unbound crashing
Post by: karlson2k on October 10, 2023, 11:07:44 am
OK - unbound has been crashing - executed the above - will monitor.

This will not help.

If Unbound is started successfully, it will continue to work fine until restarted.
When OPNsense (re-)starts Unbound, the script re-creates '/var/unbound/root.hints' automatically.
Title: Re: Unbound crashing
Post by: franco on October 10, 2023, 11:10:39 am
Yeah yeah, but is the file modified, or does unbound throw a spurious error while the contents of the file are ok? Because if the file is ok, it's something very nasty inside unbound, and that has been my guess all along.


Cheers,
Franco
Title: Re: Unbound crashing
Post by: karlson2k on October 10, 2023, 12:49:34 pm
Or a third option: the file is being created in parallel with the start of Unbound. Then the file content would be OK, while Unbound may still read the file with an error.
I summarised possible options here: https://forum.opnsense.org/index.php?topic=35527.msg177257#msg177257
Title: Re: Unbound crashing
Post by: franco on October 10, 2023, 12:52:32 pm
We're talking about a very narrow window of opportunity here that seems to be 100% reproducible? I'm still a bit sceptical. But you could just try making a copy of the file and changing the source code for unbound.conf to point to that copy, which nothing else touches...

These theories are very easy to test when you can reproduce. If not it's impossible.


Cheers,
Franco
Title: Re: Unbound crashing
Post by: nerf on October 11, 2023, 08:44:56 am

Attached is a screenshot from the system log that MAY be relevant. Going back to Dnsmasq until I can figure out what is actually happening.


Title: Re: Unbound crashing
Post by: karlson2k on October 17, 2023, 10:35:05 am
We're talking about a very narrow window of opportunity here that seems to be 100% reproducible?
It happens one time out of 150-200 starts. So it is far from 100% reproducible.
I'm still a bit sceptical. But you could just try to make a copy of the file and change the source code for the unbound.conf to point to that one that is not touched...

These theories are very easy to test when you can reproduce. If not it's impossible.

I tried to isolate Unbound-related changes from other changes in OPNsense between 23.1 and 23.7, but I failed.
Too many changes and I'm not sure that all of them could be reverted without making changes in other components or frameworks.

Could you, please, point me to the easiest way to test OPNsense changes for Unbound isolated from other changes?
Title: Re: Unbound crashing
Post by: franco on October 17, 2023, 10:37:06 am
See https://forum.opnsense.org/index.php?topic=36425.msg177797#msg177797
Title: Re: Unbound crashing
Post by: karlson2k on October 18, 2023, 10:06:18 am
Tried with the "diagnose tool" patch.

After ~100 restarts, Unbound silently crashed.
Had to (re-)start it manually.

The log:

Code: [Select]
2023-10-18T11:00:00 Informational unbound 24371 [24371:1] info: generate keytag query _ta-4f66. NULL IN
2023-10-18T11:00:00 Informational unbound 24371 [24371:0] info: start of service (unbound 1.18.0).
2023-10-18T11:00:00 Notice unbound 24371 [24371:0] notice: init module 2: iterator
2023-10-18T11:00:00 Notice unbound 24371 [24371:0] notice: init module 1: validator
2023-10-18T11:00:00 Notice unbound 24378 daemonize unbound dhcpd watcher.
2023-10-18T11:00:00 Notice unbound 24371 [24371:0] notice: init module 0: python
2023-10-18T10:52:02 Notice unbound 23736 [23736:0] notice: init module 2: iterator
2023-10-18T10:52:02 Notice unbound 23736 [23736:0] notice: init module 1: validator
2023-10-18T10:52:02 Notice unbound 23743 daemonize unbound dhcpd watcher.
2023-10-18T10:52:02 Notice unbound 23736 [23736:0] notice: init module 0: python

Relevant record from the system log:
Code: [Select]
2023-10-18T10:52:02 Notice kernel <6>pid 23736 (unbound), jid 0, uid 59: exited on signal 11
Title: Re: Unbound crashing
Post by: franco on October 18, 2023, 10:08:25 am
I'm sorry to say the root.hints theory then appears to be a straw man and Unbound crashes for other reasons.


Cheers,
Franco
Title: Re: Unbound crashing
Post by: karlson2k on October 19, 2023, 07:57:20 am
The behaviour is different.
Unbound just crashed and I was able to simply restart it in the GUI.
Without the test patch, Unbound used 100% CPU and had to be killed with 'kill -9' in the CLI.

So something has changed.
Title: Re: Unbound crashing
Post by: karlson2k on October 25, 2023, 07:08:43 am
Quote
https://github.com/opnsense/core/commit/845fbd384fe

# opnsense-patch 845fbd384fe
This patch significantly changed the situation.
Unbound is not crashing anymore, while without this patch Unbound was crashing daily.
I have been testing it for several days. The settings were chosen to trigger the crash as often as possible (no debug logging, parallel threads).

Probably, without this patch, the file is created in parallel with the normal Unbound startup.
With this patch, the file is always created before Unbound starts.
Title: Re: Unbound crashing
Post by: karlson2k on November 01, 2023, 02:24:00 pm
Only after two weeks of running with opnsense-patch 845fbd384fe did Unbound freeze again.

The last lines in the log:
Code: [Select]
2023-11-01T16:06:07 Notice unbound 17147 [17147:0] notice: init module 0: python
2023-11-01T16:03:33 Informational unbound 15198 [15198:0] info: service stopped (unbound 1.18.0).
2023-11-01T15:53:34 Informational unbound 15198 [15198:1] info: generate keytag query _ta-4f66. NULL IN
Note: at 16:06 I manually killed unbound process with kill -9.

The system log:
Code: [Select]
2023-11-01T16:06:07 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : webgui_configure_do(,opt1))
2023-11-01T16:06:07 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : vxlan_configure_do())
2023-11-01T16:03:32 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : unbound_configure_do(,opt1))
2023-11-01T16:03:32 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : openssh_configure_do(,opt1))
2023-11-01T16:03:32 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : opendns_configure_do())
2023-11-01T16:03:32 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : ntpd_configure_do())
2023-11-01T16:03:32 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : dnsmasq_configure_do())
2023-11-01T16:03:32 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (,opt1)

Observations: Unbound wasn't actually stopped.

Probably the problem is not with the Unbound start, but with the Unbound stop procedure.

OPNsense thinks that Unbound is stopped (as reported in the Unbound log), but Unbound is actually still running. Starting Unbound while another copy is still running results in various problems.

Previously reported high CPU load was produced by the stopping Unbound process, not by the just started process.
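If the old and new Unbound instances really overlap, that should show up as two processes named unbound at the same time. A minimal sketch for spotting it, assuming "ps -A -o pid=,comm=" output on stdin (both FreeBSD and Linux ps accept these options); the function names are hypothetical.

```sh
#!/bin/sh
# Read "PID COMMAND" lines on stdin (e.g. from: ps -A -o pid=,comm=)
# and print the PIDs whose command name is exactly "unbound".
unbound_pids() {
    awk '$2 == "unbound" { print $1 }'
}

# Exit 1 and list the PIDs if more than one unbound process is running.
check_duplicates() {
    pids=$(unbound_pids)
    count=$(printf '%s\n' "$pids" | awk 'NF { n++ } END { print n + 0 }')
    if [ "$count" -gt 1 ]; then
        # $pids is left unquoted on purpose to join the PIDs with spaces.
        echo "duplicate unbound processes:" $pids
        return 1
    fi
    echo "unbound processes: $count"
}

# Usage on the firewall:
# ps -A -o pid=,comm= | check_duplicates
```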
Title: Re: Unbound crashing
Post by: LOTRouter on November 08, 2023, 05:19:26 am
I've been troubleshooting this issue for months, and didn't fully realize it was Unbound until reading this post. Now when my internet goes down, top shows the unbound process at 100%. I block DoH, intercept all DNS, and forward all Unbound requests to DNSCrypt-Proxy, which means I have no fallback when Unbound goes stupid. When this setup works, it works wonderfully. However, when it stops, all DNS queries on my network go unanswered. Even with this patch, my unbound process would end up at 100% utilization nightly and I had to kill -9 <pid> it to get it to recover.

For now I've switched to using DNSCrypt-Proxy native, and I haven't seen a problem in almost two weeks.  However, I'm willing to go back to Unbound in the name of testing and troubleshooting for any developer that is willing to look into fixing this. I just need hints of what to collect for when it goes stupid.

I do have Flush DNS Cache during reload enabled, which I now wonder if that exacerbated this issue.
Title: Re: Unbound crashing
Post by: karlson2k on November 09, 2023, 07:13:38 pm
I do have Flush DNS Cache during reload enabled, which I now wonder if that exacerbated this issue.
BTW I have it enabled as well.
Maybe having it disabled just slows down startup enough to mask the issue?
Title: Re: Unbound crashing
Post by: zentoo on November 10, 2023, 05:35:06 pm
I'm in the same boat: I spent two days trying to understand what was happening with unbound on one of our OPNsense servers, since client servers using it as their DNS resolver regularly got timeouts on DNS requests.

When the problem arose, one CPU was stuck at 100% by the unbound process.
Stopping or restarting unbound from the web UI froze the web UI. I needed to kill -9 the unbound PID in order to start it again.

I understood later that the unbound restart was the culprit, because I didn't have this problem before.

In more detail: we have two OPNsense servers as master/slave firewalls, with the master executing a cron job every minute to synchronize the slave configuration using the command
Code: [Select]
HA update and reconfigure backup.
A few days ago we updated OPNsense on the slave, then used it as a temporary master for validation before updating the master, and for off-topic reasons we left the slave in use as the temporary master.
8 hours later, the unbound process got stuck using 100% of one CPU and unbound started to generate timeouts for clients.
The process stayed stuck for hours while I was investigating the issue on the client side, before I understood the problem was the OPNsense unbound process.

I continued to monitor the process after killing and restarting it, and I was surprised to observe that unbound restarted every minute on the non-master OPNsense. So I understood that unbound is restarted at each master/slave sync (with the unbound service selected for sync), and that this leads, after a while, to unbound getting stuck with one CPU at 100%.

It seems that there is some kind of race condition when the unbound process restarts.
I don't know enough about how services are managed on OPNsense, but it's possibly a problem with flock and/or PID detection/creation/deletion.

I have observed another problem: when the unbound process is stuck and the sync cron tries to restart unbound, a new mount point may appear:
Code: [Select]
devfs on /var/unbound/dev (devfs).
So after a while you can see the same /var/unbound/dev mount point several times, and these need to be cleaned up manually.
Title: Re: Unbound crashing
Post by: karlson2k on November 15, 2023, 09:04:12 am
It looks like this is not a restart problem, but a start problem.

The process stuck at 100% CPU is the new Unbound process. Not the old one.

Two things may significantly reduce the chances of startup problems: log level 3 or higher, and leaving "Flush DNS Cache during reload" disabled.
Both seem to just slow down the startup enough to avoid the race condition.

To reproduce:
* Set Unbound log level to 1
* Enable "Flush DNS Cache during reload"
* Run as root: sh -c 'while :; do  pluginctl unbound_start; sleep 20; done'

After a few iterations the startup problem should be triggered.
Title: Re: Unbound crashing
Post by: karlson2k on November 15, 2023, 09:07:19 am
The problem seems to be OPNsense-only.

I found no reports of similar problems on Linux, pure FreeBSD, pfSense or anything else.
However, it is easy to find several similar reports for OPNsense.
Title: Re: Unbound crashing
Post by: karlson2k on November 16, 2023, 10:43:26 am
To reproduce:
* Set Unbound log level to 1
* Enable "Flush DNS Cache during reload"
* Run as root: sh -c 'while :; do  pluginctl unbound_start; sleep 20; done'

After a few iterations the startup problem should be triggered.
It should be easier to reproduce the issue now.
I hope the developers will take a look at it.
Title: Re: Unbound crashing
Post by: zentoo on November 20, 2023, 02:47:22 pm
On my master/slave opnsense setup with a configuration synchronisation per minute (cron command: HA update and reconfigure backup) I've tried to debug further:

[System: High Availability: Settings] Unbound DNS: selected
=> unbound 100% CPU after a while on slave opnsense with an unbound restart every minute

[System: High Availability: Settings] Unbound DNS: not selected
=> unbound 100% CPU after a while on slave opnsense with an unbound restart every minute

No High Availability synchronisation between master and slave opnsense
=> no unbound restart on slave opnsense so no problem

I tried to find out which High Availability settings cause unbound to restart, and in fact there is no dependency logic at all. If any service is selected for synchronisation, unbound will be restarted at synchronisation time, even if it's not needed.

I tried to trigger the problem the way you do, but didn't succeed, even with an unbound restart every 2s.

So I explored the unbound init script to see how it manages the PID file to avoid a duplicate unbound process.
I didn't find a specific clue, because the PID is managed by the daemon utility.
In my previous debug session I noticed several /var/unbound/dev mount points, and I think this is due to a race condition between several unbound instances starting at the same time.
So I set up a simple way to check this:

Monitoring:
Code: [Select]
while true; do echo "$(date) $(stat -x /var/run/unbound.pid | grep Change:) file: $(cat /var/run/unbound.pid) pid: $(pgrep unbound) mount: $(mount | grep -c /var/unbound/dev)"; sleep 0.1 ; done
Trigger (5 parallel starts of unbound):
Code: [Select]
pluginctl unbound_start & pluginctl unbound_start & pluginctl unbound_start & pluginctl unbound_start & pluginctl unbound_start &
I succeeded in getting several /var/unbound/dev mount point instances and eventually a stuck unbound at 100% CPU.

IMHO the unbound problem can also be triggered by multiple concurrent restarts.
So the unbound start needs a lock mechanism or similar to prevent several unbound starts at once, because launching unbound can take time, and only checking whether the PID exists before launching the process is not enough.
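One portable way to get the lock described above is to use mkdir as the lock primitive, since only one caller can create a given directory. This is a sketch under stated assumptions: with_lock and the lock path are hypothetical, a lock left behind by a killed process has to be removed by hand, and on FreeBSD the base-system lockf(1) utility (e.g. lockf -k with a lock file) would be the more natural choice.

```sh
#!/bin/sh
# Run a command while holding a lock directory; wait up to $2 seconds
# for another holder to finish. mkdir is atomic, so only one caller wins.
with_lock() {
    lockdir=$1
    timeout=$2
    shift 2
    waited=0
    until mkdir "$lockdir" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $lockdir" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Keep the command's exit status, then release the lock.
    if "$@"; then status=0; else status=$?; fi
    rmdir "$lockdir"
    return "$status"
}

# Hypothetical usage:
# with_lock /var/run/unbound_start.lock 60 pluginctl unbound_start
```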
Title: Re: Unbound crashing
Post by: lar.hed on November 27, 2023, 06:44:45 pm
After reading this thread I can only say I think I have this issue too. When unbound hits 100% on one (of eight) cores of my OPNsense installation, there are multiple unbound processes running. I will see if I can find anything more useful on my side....
Title: Re: Unbound crashing
Post by: H3n on December 20, 2023, 11:18:22 am
Running into the same issue.
Our current workaround is a scheduled reboot each night, in the hope that it keeps the issue away.

We notice that the number of processes rises as soon as the error message pops up.


Is it possible to set `so-reuseport: no` via GUI?
Title: Re: Unbound crashing
Post by: rene_ on December 21, 2023, 08:51:04 am
On my master/slave opnsense setup with a configuration synchronisation per minute (cron command: HA update and reconfigure backup) I've tried to debug further:

Do not do this.
Each config sync will restart the services on the slave firewalls, e.g. an ntp service will never finish its synchronisation and so on.
This will cause more trouble than it is worth.
Increase the interval to at least one hour.
Title: Re: Unbound crashing
Post by: doktornotor on December 21, 2023, 09:02:47 am
Could someone explain to me what's the huge advantage of the HA DNS setup when it's causing nothing but trouble, while pretty much the same result can be achieved by simply pointing clients to multiple DNS servers? Certainly must be missing something here.
Title: Re: Unbound crashing
Post by: zentoo on December 27, 2023, 04:53:17 pm
On my master/slave opnsense setup with a configuration synchronisation per minute (cron command: HA update and reconfigure backup) I've tried to debug further:

Do not do this.
Each config sync will restart the services on the slave firewalls, e.g. an ntp service will never finish its synchronisation and so on.
This will cause more trouble than it is worth.
Increase the interval to at least one hour.

I realised that through this unbound issue and have extended the sync interval accordingly.

IMHO the design of the configuration synchronization is really not a good one.
It would be smarter to restart only the services whose configuration was modified by the synchronization, as usual operating systems do. It's really a problem for a system that is designed to provide high availability.

At each configuration sync, the master XML file would need to be split per service and compared with the corresponding part of the slave configuration, so that a service is only restarted if its configuration has been modified.
It shouldn't be so hard to implement.
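The proposal can be sketched as a comparison of per-service fragments that prints only the services whose configuration changed. This assumes a hypothetical layout with one file per service in two directories; OPNsense actually keeps everything in a single config.xml, so the splitting step itself is not shown, and changed_services is an illustrative name.

```sh
#!/bin/sh
# Given two directories with one config fragment per service (old and new),
# print the services whose fragment differs or exists on only one side.
changed_services() {
    old=$1
    new=$2
    for f in "$old"/* "$new"/*; do
        [ -e "$f" ] || continue   # skip unmatched glob patterns
        basename "$f"
    done | sort -u | while read -r svc; do
        # cmp -s also fails when one side is missing, which counts as changed.
        if ! cmp -s "$old/$svc" "$new/$svc"; then
            echo "$svc"
        fi
    done
}

# Hypothetical usage: restart only what changed.
# for svc in $(changed_services /tmp/cfg.old /tmp/cfg.new); do echo "restart $svc"; done
```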
Title: Re: Unbound crashing
Post by: joshndroid on January 12, 2024, 07:57:56 am
Has anyone found a way to get around this, either through a properly scheduled reboot or a patch that works on the latest version?

I have had this issue for months; it's suuuper random, so even a daily scheduled reboot doesn't get around it.

I am considering moving to a UniFi router, as this has made my home network so unreliable. It's really killing the family approval when I get calls at work that the internet is down.
Title: Re: Unbound crashing
Post by: lar.hed on January 12, 2024, 10:03:52 am
Well, I still don't know if I have the same issue. Unbound does not like my setup, or, as I think, something at the OS level is giving Unbound crap in return. So maybe it's not an Unbound issue. And as I wrote in my own long thread, I will most likely have to eat this statement sooner or later. I'm fine with that... I just would like to solve this.

I use Monit to figure out when it hits 100% CPU on one core, then I have a script that does kill -9, and then restart. It is a band aid kind of solution - however it works.
Title: Re: Unbound crashing
Post by: joshndroid on January 12, 2024, 10:07:11 am
Can you share your Monit setup/scripts so I can run something similar?
Title: Re: Unbound crashing
Post by: lar.hed on January 12, 2024, 10:14:15 am
Sure!

I have my Monit scripts in:

/usr/local/opnsense/scripts/OPNsense/Monit

For finding, for me, the 100% CPU bound Unbound process, I use this:

Code: [Select]
#!/bin/csh

set UnboundCPU=`ps auwwx | grep /usr/local/sbin/unbound | grep -v grep | awk '{print $3}' | awk -F. '{print $1}' | grep 100`

exit $UnboundCPU

To kill that process when the first Monit script above reacts, I use this:

Code: [Select]
#!/bin/csh

pgrep "unbound" | grep -v "$$" | xargs kill -9

And the start is normal, so nothing special about that:
Code: [Select]
/usr/local/sbin/pluginctl -c unbound_start
I added a "Service test" under Monit that tests the first script's return value with "status > 90". Then it is just the normal Monit setup for the rest.
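One caveat with the first script: because of "grep 100" it only ever yields 100 (once per process at 100%) or nothing, so a process sitting at 99.x% never trips "status > 90". A minimal sketch of an alternative that prints the highest integer %CPU of processes named exactly unbound, capped at 255 because exit statuses cannot go higher. The function name is hypothetical, and it assumes "ps -A -o pcpu=,comm=" output on stdin.

```sh
#!/bin/sh
# Read "PCPU COMMAND" lines on stdin (e.g. from: ps -A -o pcpu=,comm=) and
# print the highest integer %CPU among processes named exactly "unbound".
# The value is capped at 255 so it can be used as an exit status.
max_unbound_cpu() {
    awk '$2 == "unbound" { cpu = int($1); if (cpu > max) max = cpu }
         END { if (max > 255) max = 255; print max + 0 }'
}

# As a Monit program check (the test would then be "status > 90"):
#   exit "$(ps -A -o pcpu=,comm= | max_unbound_cpu)"
```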
Title: Re: Unbound crashing
Post by: joshndroid on January 12, 2024, 10:31:45 am
Thanks for sharing.
I am currently trying to set this up.

I have created the 2 scripts in the monit folder as necessary with the contents as below.
I added a wait of a couple of seconds between the kill and the start.

I am unsure how to setup the service test.
Am I executing the first script with the condition status > 90?

Title: Re: Unbound crashing
Post by: lar.hed on January 12, 2024, 11:23:10 am
I actually have TWO Monit checks to handle Unbound. It happened by mistake, you could say... The first one checks whether Unbound is running or not, and has both stop and start in its setup. The second one has ONLY stop, since the first one already takes care of restarting. They could easily be combined, I guess, although I think the first one, which does a normal stop, is good to have around. The second one, which does a kill -9, is the band-aid, so maybe not that important. Anyway, when the second one kills the Unbound process, the first one will start Unbound again since it is no longer running... Maybe a strange way to solve this, and it came about by mistake, but it works on my setup.

The point here is Unbound hitting >90 CPU: the first script returns the CPU usage of the Unbound process, and as long as the return value is below the threshold nothing happens. When the value goes above 90, Monit fires the kill script (action=stop in my case).
Title: Re: Unbound crashing
Post by: joshndroid on January 13, 2024, 01:44:05 am
thanks for the replies.

At this time i believe i have the service test setup correctly (i have attached an image).
I'm just unsure how to set up the Monit service settings entry correctly. TBH I have never really understood how to set up Monit properly. Do you have a screenshot of your Monit service settings entry for the unbound killer etc. so I can set mine up correctly?
Title: Re: Unbound crashing
Post by: lar.hed on January 14, 2024, 01:33:17 pm
Sorry for the slightly late reply. I think you have entered the correct information, no worries there.

And here is the attachment for the service. Do note that the "stop" word is there since the service requires (!) an argument - however it is not used. I guess I should re-write the csh script to take an argument so it gets a bit more flexible...

Title: Re: Unbound crashing
Post by: joshndroid on January 15, 2024, 09:05:42 am
Thanks again for the help.
I believe I now have everything setup as required, hopefully see less issues
Title: Re: Unbound crashing
Post by: joshndroid on January 17, 2024, 07:34:24 am
It would appear that our issue may not get any real love based on this reply....

might be the nail in the coffin for opnsense for me....
Its been a good few years

Title: Re: Unbound crashing
Post by: lar.hed on January 17, 2024, 09:59:11 am
Well as I wrote in my own little thread, there is an extra restart of Unbound that does not behave: https://forum.opnsense.org/index.php?topic=37840.msg186884#msg186884

I can manage it through the Monit script.

But I would love to solve this. It used to work before 23.7; now it doesn't behave as well?
Title: Re: Unbound crashing
Post by: CJ on January 17, 2024, 04:55:53 pm
It would appear that our issue may not get any real love based on this reply....

might be the nail in the coffin for opnsense for me....
Its been a good few years

Did you test the patch?
Title: Re: Unbound crashing
Post by: lar.hed on January 17, 2024, 05:00:46 pm
Which one?

Quote
opnsense-patch 7406a5067f8
opnsense-patch a086f40b
opnsense-patch 845fbd384fe

One thing I would love to see, though, is an added delay of, say, 3 seconds for debug purposes inside the command:
Code: [Select]
/usr/local/sbin/pluginctl -c unbound_start
This way, with an added delay, one might see whether it is a collision between different Unbound processes (the stopping process has not finished before the newly started one is already running, so they collide). Could anyone who knows how this start command works provide a patch?
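The delay idea can be made deterministic by waiting until the old PID has actually exited instead of sleeping for a fixed time. A sketch using kill -0, which only checks whether a process exists. wait_for_exit is hypothetical; the usage lines reuse the pid file path and the service/pluginctl commands quoted earlier in this thread. It should run as root, otherwise kill -0 can fail with a permission error and the process would be treated as gone.

```sh
#!/bin/sh
# Poll until process $1 no longer exists, for at most $2 seconds.
# Returns 0 once the process is gone, 1 if it is still running at the timeout.
wait_for_exit() {
    pid=$1
    timeout=$2
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Hypothetical use on the firewall: stop, wait up to 10 s, then start.
#   pid=$(cat /var/run/unbound.pid)
#   service unbound stop
#   wait_for_exit "$pid" 10 && /usr/local/sbin/pluginctl -c unbound_start
```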
Title: Re: Unbound crashing
Post by: CJ on January 17, 2024, 05:22:13 pm
I don't know which patch.  I don't have this issue so I haven't been tracking all of the developments.  I just disagree with Josh's perception of things.  It appears the OPNsense team is attempting to fix the issue but aren't being provided enough information and testing support to be able to get it fixed.  Therefore anyone who has this problem should test the provided patches and provide feedback so the investigation can continue.
Title: Re: Unbound crashing
Post by: lar.hed on January 17, 2024, 05:25:34 pm
I understand. However, there are more than likely 10+ installations that have this issue.
Title: Re: Unbound crashing
Post by: lar.hed on January 18, 2024, 10:37:03 am
Anyone who has this issue with Unbound and 100% CPU on one core: could each and every one of you tell me (and everyone else) which CPU type you have, and whether you run on bare metal or virtualization? Reason: I wonder whether performance plays a part in this....

I'm on Intel i7-8550, 8 threads and 4 cores (yea I know I say 8 cores all the time - but that is another story...). Baremetal, 16GB.

Edit: And also, let me know if any of the interfaces has a direct connection to the OPNsense, for example a PC connected direct to LAN interface (the one used for setup for example) without any switch or anything between?
Title: Re: Unbound crashing
Post by: jefeman on January 21, 2024, 10:06:33 pm
I apologize in advance, this is a really long post. But I've tried to give as much config detail and test results as I can for @franco to work with :), and I'm willing to do some detailed troubleshooting if that's what it takes.

I am experiencing Unbound crashes as well. Unbound crashes every 1-2 days, causing me to lose all DNS on my home network. I am able to restart Unbound from the Dashboard after logging into the GUI using the router's IP address. I do not recall seeing 100% CPU (but I will confirm next time it happens). The only error I receive is

Code: [Select]
2024-01-21T01:57:34 Notice kernel: <6>pid 10957 (unbound), jid 0, uid 59: exited on signal 11
because I don't yet have more logging enabled.

Hardware: Deciso DEC740 (AMD Ryzen Embedded V1500B, 4 cores / 8 threads; 4 GB RAM; 128 GB internal storage)
Firmware: Opnsense Business 23.10.1_2, Commit 23fed1bcf
Unbound version: 1.19.0

Connectivity Audit: Pass, looks normal
Health Audit: Pass, looks normal
Security Audit: 1 Problem found: openssl111-1.1.1w, OpenSSL -- DoS in DH generation, CVE-2023-5678

Plugins installed:
Code: [Select]
os-OPNBEcore 1.2_1
os-git-backup 1.0_3
os-mdns-repeater 1.1_1
os-wireguard 2.5_2
os-wol 2.4_2

I have used Opnsense Business for approximately two years, and I did not experience this issue until I upgraded from Opnsense Business 23.10 to Opnsense Business 23.10.1 approximately two weeks ago.

My use case isn't very complicated: I am using Unbound without DnsCrypt-proxy or Dnsmasq. The only mildly complicated part of my configuration is my several VLANs and the Mullvad VPN client on my client machine. This is a home network and there is exactly one user: me :)

In answer to @lar.hed's network architecture question, I have one Deciso DEC740 RJ-45 port connected to a Netgear GS105E switch; and a Ubiquiti Unifi AC-Lite AP, which is how I use the network day-to-day (that is, wirelessly). The only mildly complicated part is the one LAGG, 5 VLANs, and perhaps the DHCP Options 121 and 249. I'm not a network engineer so there might be a few bugs in this part of the Opnsense/switch/AP configuration. I can provide more details if needed, up to and including the OS image (privately).

I am a 20+ year Linux user, and my day job is engineering hardware, firmware, and software for embedded devices. I do not know my way around FreeBSD very well though, unfortunately, but I am willing to continue troubleshooting if @franco has any more requests or ideas.

I have read through this thread in its entirety, but I have not yet attempted the patches 7406a50, a086f40, and 845fbd3. I will try each of them, but in the meantime, here is what I have examined:

I have looked through the logs, and I do not think my Unbound is restarting upon receiving a new DHCP lease. The logs in /var/log/system/ suggest DHCP renews from my ISP every 12 hours. I am not totally positive though.

I have hashed the root hints files, as @franco suggested previously:

Code: [Select]
root@OPNsense:~ # shasum -a 256 /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints /root/named.root
a003be56acb66b2c9f77fb4685919bba36094f631b8b2f9bb6599220ebe31219  /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints
a003be56acb66b2c9f77fb4685919bba36094f631b8b2f9bb6599220ebe31219  /var/unbound/root.hints
f91549a77840b2d306fd49ad01facda1f4d4de0795f9f60844d6aea87a156429  /root/named.root
Code: [Select]
root@OPNsense:~ # md5sum /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints /root/named.root
d090610a892c2e476d93042dc70dc393  /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints
d090610a892c2e476d93042dc70dc393  /var/unbound/root.hints
d22f17ab89749f32679cb1810d4b6109  /root/named.root
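The same check can be done without computing hashes: `cmp -s` exits 0 only when two files are byte-for-byte identical. A small POSIX sh sketch (the function name `compare_to_reference` is made up for illustration):

```sh
#!/bin/sh
# Compare each file against a reference file.
# cmp -s prints nothing and exits 0 only if the files are byte-for-byte
# identical, so this gives the same answer as comparing SHA-256/MD5 digests.
# A missing or unreadable file is also reported as DIFFERENT.
compare_to_reference() {
    ref=$1
    shift
    status=0
    for f in "$@"; do
        if cmp -s "$ref" "$f"; then
            echo "same: $f"
        else
            echo "DIFFERENT: $f"
            status=1
        fi
    done
    return "$status"
}
```

For example, `compare_to_reference /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints /root/named.root` should report the deployed root.hints as "same" and /root/named.root as "DIFFERENT", in line with the hashes above.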

The root.min.hints file and the root.hints file match. However, the /root/named.root I downloaded from https://www.internic.net/domain/named.root does not. Not only are the dates different, but the IPv4 and IPv6 addresses of the B server have also changed:

Code: [Select]
root@OPNsense:~ # diff -u /var/unbound/root.hints /root/named.root
--- /var/unbound/root.hints 2024-01-21 17:16:17.563320000 +0000
+++ /root/named.root 2024-01-21 17:36:08.329604000 +0000
@@ -8,10 +8,10 @@
 ;           file                /domain/named.cache
 ;           on server           FTP.INTERNIC.NET
 ;       -OR-                    RS.INTERNIC.NET
+;
+;       last update:     December 20, 2023
+;       related version of root zone:     2023122001
 ;
-;       last update:     July 09, 2018
-;       related version of root zone:     2018070901
-;
 ; FORMERLY NS.INTERNIC.NET
 ;
 .                        3600000      NS    A.ROOT-SERVERS.NET.
@@ -21,8 +21,8 @@
 ; FORMERLY NS1.ISI.EDU
 ;
 .                        3600000      NS    B.ROOT-SERVERS.NET.
-B.ROOT-SERVERS.NET.      3600000      A     199.9.14.201
-B.ROOT-SERVERS.NET.      3600000      AAAA  2001:500:200::b
+B.ROOT-SERVERS.NET.      3600000      A     170.247.170.2
+B.ROOT-SERVERS.NET.      3600000      AAAA  2801:1b8:10::b
 ;
 ; FORMERLY C.PSI.NET
 ;

I am not knowledgeable enough about DNS to know: if the B server IP addresses are different or wrong, could that cause any sort of problem related to the possible parsing issue? Or will the protocol just use the other root servers? I would also expect every OPNsense installation to be seeing repeatable Unbound failures if this were the cause.
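As far as I understand root priming (RFC 8109), the hints file is only used to reach some root server, after which the resolver works from the NS and address records the root servers return, so one stale B address by itself should not break resolution. A file that fails to parse is a different matter. The sketch below is a rough sanity check for the kind of line that would produce "could not parse the RR's type"; it is not Unbound's parser, and `check_root_hints` is a hypothetical name.

```sh
#!/bin/sh
# Rough sanity check for a root hints file. It understands only the small
# subset of zone-file syntax that root hints use:
#   <name> <ttl> [IN] <NS|A|AAAA> <rdata>
# It is NOT Unbound's parser; it just flags lines that clearly do not fit.
# Prints one message per suspicious line and returns non-zero if any were found.
check_root_hints() {
    awk '
    {
        line = $0
        sub(/;.*/, "", line)       # drop comments
        $0 = line                  # re-split the remaining text into fields
        if (NF == 0) next          # blank or comment-only line
        if (NF == 5) {
            if (toupper($3) != "IN") {
                printf "line %d: expected class IN, got %s\n", NR, $3
                bad = 1
                next
            }
            type = toupper($4)
        } else if (NF == 4) {
            type = toupper($3)
        } else {
            printf "line %d: expected 4 or 5 fields, got %d\n", NR, NF
            bad = 1
            next
        }
        if ($2 !~ /^[0-9]+$/) {
            printf "line %d: TTL is not a number: %s\n", NR, $2
            bad = 1
        } else if (type != "NS" && type != "A" && type != "AAAA") {
            printf "line %d: unexpected RR type: %s\n", NR, type
            bad = 1
        }
    }
    END { exit (bad ? 1 : 0) }' "$1"
}
```

Usage: `check_root_hints /var/unbound/root.hints && echo "looks OK"`. Note that a file which is fine on disk can still be read half-written during a race, which a check like this run afterwards cannot catch.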

ICMP (ping) is responsive for all four IPv{4,6} addresses for the B server (the Unbound version and the authoritative version from InterNIC). I attempted DNS lookups direct to each, but apparently 'dig' and 'nslookup' are not installed on Opnsense, and I don't see packages for them :(
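FreeBSD's base system ships `drill(1)` from the ldns library, so it should be usable on OPNsense even without dig or nslookup (an assumption worth confirming with `which drill`). The sketch below sends a root NS query to each address in a hints file; the function name, the `DRY_RUN` switch, and the reliance on drill printing an "rcode" header line are all assumptions made for illustration.

```sh
#!/bin/sh
# Send a root NS query to every A/AAAA address listed in a root hints file,
# optionally only for one server name (e.g. B.ROOT-SERVERS.NET.).
# Assumes drill(1) is installed; FreeBSD ships it in the base system.
# With DRY_RUN=1 the commands are printed instead of executed.
query_root_servers() {
    hints=$1
    only=${2:-}
    awk -v only="$only" '
    { sub(/;.*/, "") }                       # drop comments (also re-splits fields)
    NF >= 4 && ($(NF-1) == "A" || $(NF-1) == "AAAA") && (only == "" || $1 == only) {
        print $1, $NF                        # server name and address
    }' "$hints" |
    while read -r name addr; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "drill . @$addr NS"
        else
            printf '%s %s: ' "$name" "$addr"
            # Assumption: drill prints a ";; ->>HEADER<<- ... rcode: ..." line
            # when it gets an answer; show it, or "no answer" if there is none.
            drill . @"$addr" NS 2>&1 | grep 'rcode' || echo "no answer"
        fi
    done
}
```

For example, `query_root_servers /root/named.root B.ROOT-SERVERS.NET.` and `query_root_servers /var/unbound/root.hints B.ROOT-SERVERS.NET.` would compare the new and old B addresses directly.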

Finally, some general OPNsense and FreeBSD questions:

- What is "DoT"?
- What is "so-reuseport"?
- How do I restart the Unbound service from the command line?
- Which CLI text editors are installed by default? I installed vim using 'pkg install vim', because I could not find any of my usual text editors.
- Which command-line tools are available to query DNS?

I'll try the three patches so far and report back. Please let me know if there is anything else I should try or any other information I should try to collect.

Thanks!
Title: Re: Unbound crashing
Post by: lar.hed on January 22, 2024, 10:33:06 am
Let's start with the questions in the end:

DoT => DNS over TLS - https://en.wikipedia.org/wiki/DNS_over_TLS

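"so-reuseport" => I don't know exactly where this option is set in the GUI (or whether it is written directly into the Unbound config file), but I think it has to do with parallel handling: with it enabled, each Unbound thread opens its own listening socket with the SO_REUSEPORT socket option, which can spread incoming queries more evenly across threads and may give better UDP performance. You can read a bit about it at: https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound.conf.html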
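To see what the generated config actually contains, here is a small sketch that prints an option's value. It assumes the config lives at /var/unbound/unbound.conf (where OPNsense writes it, as far as I know) and that option names contain no regex metacharacters; `unbound_option` is a hypothetical helper. Per the unbound.conf man page, so-reuseport defaults to yes when it is not set.

```sh
#!/bin/sh
# Print the value(s) of an option from an unbound.conf-style file.
# Assumes option names contain only letters, digits and "-" (true for
# Unbound's options), since the name is used inside a regular expression.
unbound_option() {
    conf=$1
    opt=$2
    awk -v opt="$opt" '
    {
        line = $0
        sub(/#.*/, "", line)                        # strip comments
        if (match(line, "^[ \t]*" opt ":")) {
            val = substr(line, RLENGTH + 1)         # text after "name:"
            gsub(/^[ \t]+|[ \t]+$/, "", val)        # trim surrounding blanks
            print val
        }
    }' "$conf"
}
```

Usage: `unbound_option /var/unbound/unbound.conf so-reuseport`. Empty output means the option is not set explicitly, so Unbound's default applies.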

- How do I restart the Unbound service from the command line?

Code: [Select]
/usr/local/sbin/pluginctl -c unbound_stop
/usr/local/sbin/pluginctl -c unbound_start

- Which CLI text editors are installed by default?

The only one I know, and the one I use, is vi. It is very basic and not for everyone, but it always seems to be on every Linux/Unix installation by default :-)


As for your problem: since you can restart Unbound from the GUI, you are not hit by the issue some of the rest of us have with high CPU load. When Unbound is stuck at 100% CPU on one core, you have to kill it with kill -9, and you don't need to do that....

Do you have Monit set up to auto-restart Unbound? If not, set that up and you have a band-aid solution... See my attached screendump.
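For the "100% CPU on one core" variant, a restart-on-crash rule alone may not trigger, because the process is still running. Below is a sketch of a CPU check written as a filter over `ps -axo pcpu=,comm=` output, so it can be tested anywhere. The 90% threshold, the function name, and wiring it into Monit as a custom program check that acts on a non-zero exit status are assumptions, not something established in this thread.

```sh
#!/bin/sh
# Read "PCPU COMMAND" lines on stdin (e.g. from: ps -axo pcpu=,comm=) and
# exit 1 if any process named "unbound" is at or above the given CPU
# percentage (default 90). ps reports a smoothed %CPU value, so a brief
# spike may not trip this, while a thread spinning continuously should.
check_unbound_cpu() {
    limit=${1:-90}
    awk -v limit="$limit" '
    $2 == "unbound" && $1 + 0 >= limit + 0 {
        printf "unbound at %s%% CPU (limit %s%%)\n", $1, limit
        hot = 1
    }
    END { exit (hot ? 1 : 0) }'
}
```

Usage on the firewall: `ps -axo pcpu=,comm= | check_unbound_cpu 90`.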

Regarding patches, I would only install:
Code: [Select]
opnsense-patch a086f40b
opnsense-patch 845fbd384fe

One last thing: until recently I had both mDNS and UDP Broadcast Relay installed (and even one more plugin in the same "group"), and when I removed mDNS (and the third one, which I can't even recall), Unbound became a lot more stable. I can't see why that would happen, but it did. So maybe consider removing mDNS (completely, as in uninstalling the plugin, not just disabling it) and running only UDP Broadcast Relay?
Title: Re: Unbound crashing
Post by: joshndroid on January 23, 2024, 06:39:33 am
Quote
I don't know which patch.  I don't have this issue so I haven't been tracking all of the developments.  I just disagree with Josh's perception of things.  It appears the OPNsense team is attempting to fix the issue but isn't being provided enough information and testing support to get it fixed.  Therefore anyone who has this problem should test the provided patches and provide feedback so the investigation can continue.

There appear to be others here who understand what's under the hood a lot better than I do. I'm unsure what other logs are required apart from Unbound's own; I'm happy to provide them.


Quote
Anyone who has this issue with Unbound and 100% CPU on one core: could each of you tell me (and everyone else) which CPU type and bare metal/virtualization you are running on? Reason: I wonder whether performance is part of this....

I'm on an Intel i7-8550, 8 threads and 4 cores (yeah, I know I say 8 cores all the time, but that is another story...). Bare metal, 16GB.

Edit: Also, let me know if any of the interfaces has a direct connection to the OPNsense box, for example a PC connected directly to the LAN interface (the one used for setup, say) without a switch or anything else in between?

I was running on an AMD 2700 with 16GB RAM and an SSD... plenty of horsepower.

I wanted to reduce power consumption at home, so today I moved to an Intel 8500T Dell micro PC with an M.2-to-Intel Ethernet adapter and 16GB RAM. I may as well also try switching from AMD to Intel to see if that makes any difference at all.

My LAN has always gone from the router into a switch.
Title: Re: Unbound crashing
Post by: jefeman on January 23, 2024, 06:48:24 am
@lar.hed: Thanks for the answers.

I checked, and unbound.conf has so-reuseport: yes, which is a sane default. I'll consider adjusting it after I've tried both patches.

However, after re-reading: earlier in the thread, @karlson2k had Unbound crashes with both "yes" and "no". My takeaway was that the setting may be weakly correlated with Unbound's failure rate, but not enough (especially given the small sample size) to say with any confidence that it is related to the cause.

I definitely did not check for vi (whoops!) but now I have vim so I am set :)

Since I don't see 100% CPU, perhaps that suggests multiple bugs with similar symptoms?

I do not have Monit set up to auto-restart DNS. I was unaware of the feature, and I might try it if the problem gets worse, but for now I will just restart it manually because that makes it easier for me to monitor. For my use case and symptoms it's annoying, but not a huge impact... yet. And now that I'm trying to fix it, I actually WANT the problem to recur frequently so I can catch it in the act ;D

I downloaded the patch files to /root. Next time the problem occurs, I will apply a086f40 before re-starting Unbound.

Unbound is on UDP/53 and mDNS is UDP/5353 so I don't think there would be a conflict. I didn't know about the udp-broadcast-relay plugin though, I'll look at it and consider switching.

Hopefully my DNS will crash again soon...
Title: Re: Unbound crashing
Post by: jefeman on January 23, 2024, 09:12:08 am
Quote
There appear to be others here who understand what's under the hood a lot better than I do. I'm unsure what other logs are required apart from Unbound's own; I'm happy to provide them.

Based on everything upthread, I think this is one of those frustrating bugs where system logs unfortunately don't help much. Raising the log level changes the code path (and its timing) so much that the bug stops happening, or happens a lot less frequently :(

Have you tested the patches offered upthread? One of the best ways you can help troubleshoot is to apply them one at a time, or in various combinations. I'm not an OPNsense developer so I don't really know for sure, but from upthread I think the first patch was unsuccessful, so I recommend trying the second and third patches:

The second patch, reply # 46:
Quote
Here is the promised patch:

https://github.com/opnsense/core/commit/a086f40b

# opnsense-patch a086f40b


Cheers,
Franco

The third patch, reply # 83:
Quote
https://github.com/opnsense/core/commit/845fbd384fe

# opnsense-patch 845fbd384fe
This patch significantly changed the situation.
Unbound is not crashing anymore, while without this patch Unbound was crashing daily.
I have been testing it for several days. The settings were chosen to trigger the crash as much as possible (no debug logging, parallel threads).

Probably, without this patch, the file is created in parallel with normal Unbound startup.
With this patch, the file is always created before Unbound starts.

Even though the author reported a crash two weeks later (reply # 84), the patch still appears to have made a difference.
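Reply #83 describes a race: if Unbound reads root.hints while the file is still being generated, it could see a truncated or half-written file, which would be consistent with (though not proof of) the "could not parse the RR's type" error at the start of this thread. A common, generic way to guard against that class of race is to write to a temporary file in the same directory and rename it into place, since readers then see either the old file or the new one, never a partial one. The sketch below shows that pattern; `install_atomically` is a hypothetical helper, not OPNsense's code or what the patch does.

```sh
#!/bin/sh
# Hypothetical helper: install SRC at DEST so that readers never see a
# partially written DEST. The temp file is created next to DEST so that
# mv is a rename(2) within one filesystem, which replaces DEST atomically.
install_atomically() {
    src=$1
    dest=$2
    tmp=$(mktemp "${dest}.tmp.XXXXXX") || return 1
    if cp "$src" "$tmp"; then
        chmod 644 "$tmp"       # mktemp creates the file 0600
        mv -f "$tmp" "$dest"   # old or new content, never a mix
    else
        rm -f "$tmp"           # leave DEST untouched on failure
        return 1
    fi
}
```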

You can download a patch directly onto your OPNsense device by using its GitHub URL and adding ".patch" to the end:

Code: [Select]
root@OPNsense:~ # cd /root
root@OPNsense:~ # /usr/local/bin/curl https://github.com/opnsense/core/commit/7406a5067f8.patch -o 7406a5067f8.patch
root@OPNsense:~ # /usr/local/bin/curl https://github.com/opnsense/core/commit/a086f40b.patch -o a086f40b.patch
root@OPNsense:~ # /usr/local/bin/curl https://github.com/opnsense/core/commit/845fbd384fe.patch -o 845fbd384fe.patch

To apply, use the patch command:

Code: [Select]
root@OPNsense:~ # /usr/bin/patch --dry-run --backup --directory /usr/local --strip 2 --unified --version-control numbered < 7406a5067f8.patch
root@OPNsense:~ # /usr/bin/patch --dry-run --backup --directory /usr/local --strip 2 --unified --version-control numbered < a086f40b.patch
root@OPNsense:~ # /usr/bin/patch --dry-run --backup --directory /usr/local --strip 2 --unified --version-control numbered < 845fbd384fe.patch

I intentionally added the --dry-run argument to prevent breakage from blindly copying and pasting things from the Internet ;) Only if the dry run succeeds should you remove the --dry-run argument and run the command again; that second run writes the changes to the file you are patching. A dry run will look something like:

Code: [Select]
root@OPNsense:~ # patch --dry-run --backup --directory /usr/local --strip 2 --unified --version-control numbered < 845fbd384fe.patch
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|From 845fbd384fe564a8b436a5a6475952f90183c188 Mon Sep 17 00:00:00 2001
|From: Franco Fichtner <franco@opnsense.org>
|Date: Fri, 13 Oct 2023 12:54:09 +0200
|Subject: [PATCH] unbound: diagnose tool for strange unbound issue
|
|PR: https://forum.opnsense.org/index.php?topic=36425.0
|---
| src/etc/inc/plugins.inc.d/unbound.inc | 6 +++++-
| 1 file changed, 5 insertions(+), 1 deletion(-)
|
|diff --git a/src/etc/inc/plugins.inc.d/unbound.inc b/src/etc/inc/plugins.inc.d/unbound.inc
|index f74ba58e78b..0b77f131c13 100644
|--- a/src/etc/inc/plugins.inc.d/unbound.inc
|+++ b/src/etc/inc/plugins.inc.d/unbound.inc
--------------------------
Patching file etc/inc/plugins.inc.d/unbound.inc using Plan A...
Hunk #1 succeeded at 143.
Hunk #2 succeeded at 287.
done
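The dry-run-then-apply routine can be wrapped in a function so the real run only happens after a clean dry run. A sketch using the same options as the commands above; `apply_if_clean` is a made-up name, and on OPNsense `patch` resolves to /usr/bin/patch. For routine use, `opnsense-patch <commit>` as posted upthread is still the simpler route.

```sh
#!/bin/sh
# Apply a commit patch only if a dry run with identical options succeeds.
# DIR defaults to /usr/local; --strip 2 drops the "a/src/" prefix that
# opnsense/core patches use (e.g. a/src/etc/inc/... -> etc/inc/...).
apply_if_clean() {
    patchfile=$1
    dir=${2:-/usr/local}
    set -- --backup --directory "$dir" --strip 2 --unified --version-control numbered
    if patch --dry-run "$@" < "$patchfile"; then
        patch "$@" < "$patchfile"
    else
        echo "dry run failed for $patchfile; nothing was changed" >&2
        return 1
    fi
}
```

Usage: `apply_if_clean 845fbd384fe.patch`. Because of --backup with numbered version control, the original file is kept alongside as a numbered backup (e.g. unbound.inc.~1~).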

Quote
I wanted to reduce power consumption at home, so today I moved to an Intel 8500T Dell micro PC with an M.2-to-Intel Ethernet adapter and 16GB RAM. I may as well also try switching from AMD to Intel to see if that makes any difference at all.

I would be really, really surprised if this were a CPU-related problem. In my experience, this kind of issue usually lies in upstream Unbound, OPNsense's patches to upstream, or the FreeBSD kernel... or even in some interaction between more than one of these!
Title: Re: Unbound crashing
Post by: lar.hed on January 23, 2024, 10:44:08 am
I think I need to be more precise...

So, my current setup is OPNsense 23.7.11-amd64.

On it I have applied the two patches referenced earlier:
Code: [Select]
opnsense-patch a086f40b
opnsense-patch 845fbd384fe

Then I removed two plugins, mDNS and IGMP Proxy, and am running only UDP Broadcast Relay: https://forum.opnsense.org/index.php?topic=38114.0

Also, since in my case there seems to be some kind of connection to IP address changes, I decided to uncheck "Register DHCP Leases" and "Register DHCP Static Mappings".

So, six changes in all. I can't say that each change has anything to do with my Unbound problem; however, together they have stopped Unbound from getting stuck at 100% CPU. Which one would I vote for? The patches, all day long....

I have had one Unbound stop for which I have no explanation. Monit restarted Unbound immediately, and since I'm away from home, where the OPNsense box is installed, I have not been able to check anything....
Title: Re: Unbound crashing
Post by: jefeman on January 28, 2024, 07:58:52 pm
Unbound isn't crashing as often as I had seen before (a week passed without a crash)... but I did see a crash yesterday.

I installed FreeBSD ports and compiled and installed Unbound without stripping debug symbols. I also configured the kernel to write a core file at the next crash, so hopefully the crashes continue with the un-stripped Unbound :)

Two other things of note:

Title: Re: Unbound crashing
Post by: joshndroid on January 28, 2024, 11:18:07 pm
snip

Thanks for the thorough guide for installing the patches.
I had left it for a few days after migrating the hardware to see what would happen.
I had one definite show-stopper Unbound crash as usual, and it appears the service didn't restart that time. Looking at the logs, I have had a couple of others, but it's possible the Monit workaround helped restart it, since resolution didn't appear to stop on those occasions.

I went ahead and applied all 3 patches. I'll clear the logs and see what pops up over the next few days.
Title: Re: Unbound crashing
Post by: lar.hed on January 30, 2024, 05:24:48 pm
It only took 18 days or so, and now Unbound is at 100% CPU on one core again. So my changes have made it somewhat more stable, but they still do NOT solve the issue.
Title: Re: Unbound crashing
Post by: joshndroid on February 08, 2024, 11:05:22 am
I can also confirm that after a while I got the "Could not set root or stub hints" error again and Unbound stopped resolving.
It seemed to have reduced the frequency of the issue, but not completely.
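To put numbers on "reduced the frequency", it can help to count the error per day before and after applying the patches. A sketch, assuming each log line contains an ISO-8601 date as in the excerpt at the start of this thread; the log location (/var/log/resolver/ on recent OPNsense releases, as far as I know) is an assumption, and `count_per_day` is a hypothetical name.

```sh
#!/bin/sh
# Count log lines containing a fixed string, grouped by calendar day.
# Reads the files given after the pattern, or stdin if there are none.
# Assumes each line carries an ISO-8601 date (YYYY-MM-DD) somewhere in it.
count_per_day() {
    pattern=$1
    shift
    grep -hF -- "$pattern" "$@" | awk '
    match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
        n[substr($0, RSTART, 10)]++       # key: YYYY-MM-DD
    }
    END { for (d in n) print d, n[d] }' | sort
}
```

Usage: `count_per_day 'Could not set root or stub hints' /var/log/resolver/*.log`.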
Title: Re: Unbound crashing
Post by: jefeman on February 09, 2024, 05:30:52 am
I installed Unbound from FreeBSD ports with debug symbols. But unfortunately (for me) Unbound hasn't crashed since January 27.

The only change I made (on January 28) is installing from source. So assuming I compiled the same way OPNsense does, eventually I'll get a crash and a core dump.

I think I saw upthread someone asking for the debug symbols that go with the OPNsense Unbound binaries. Does anyone more familiar with FreeBSD know if they are available for download somewhere?
Title: Re: Unbound crashing
Post by: joshndroid on February 12, 2024, 06:21:30 am
Since I was still seeing this on 23.7 after applying the patches (it was better, just not fixed), I decided to give 24.1 a try and see if anything else had changed. I have been running it for the last few days.

I can't even get through a single day without multiple reboots due to instances of this.

So I'm on 24.1 and have just re-applied the patches; I'll see how it goes from there.
Title: Re: Unbound crashing
Post by: joshndroid on February 13, 2024, 06:45:35 am
Since re-applying the patches (and a minor update), I have seen two error entries in the log, but I have not actually seen any issues from a usability standpoint.

I haven't seen any for almost 20 hours now.