Unbound crashing

Started by seed, August 22, 2023, 08:18:48 AM

Sometimes Unbound crashes and the whole device becomes unresponsive.


2023-08-22T04:14:01   Critical   unbound   [85028:3] fatal error: Could not initialize thread   
2023-08-22T04:14:01   Error   unbound   [85028:3] error: Could not set root or stub hints
2023-08-22T04:14:01   Error   unbound   [85028:3] error: reading root hints /root.hints 2:12: Syntax error, could not parse the RR's type   
           TypeError: an integer is required (got type NoneType)   
           os.write(self._pipe_fd, res.encode())   
           File "dnsbl_module.py", line 226, in log_entry   
           mod_env['logger'].log_entry(   
           File "dnsbl_module.py", line 378, in cache_cb   
           logger.close()   
           File "dnsbl_module.py", line 443, in deinit

I want all services to run at wire speed and therefore run this dedicated hardware configuration:

AMD Ryzen 7 9700x
ASUS Pro B650M-CT-CSM
64GB DDR5 ECC (2x KSM56E46BD8KM-32HA)
Intel XL710-BM1
Intel i350-T4
2x SSD with ZFS mirror
PiKVM for remote maintenance

private user, no business use
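
The traceback in the post above ends in `os.write(self._pipe_fd, res.encode())` failing with a `TypeError` because the descriptor was `None`. As a minimal sketch of that class of failure, the hypothetical `PipeLogger` class below (an illustration only, not the actual `dnsbl_module.py` code) shows how a write after `close()` triggers the error and one possible way to guard against it:

```python
import os


class PipeLogger:
    """Hypothetical stand-in for a logger that writes to a pipe descriptor."""

    def __init__(self, pipe_fd):
        self._pipe_fd = pipe_fd

    def close(self):
        # After close() the descriptor is gone; later writes must not use it.
        if self._pipe_fd is not None:
            os.close(self._pipe_fd)
            self._pipe_fd = None

    def log_entry_unguarded(self, res):
        # Raises TypeError when self._pipe_fd is None.
        os.write(self._pipe_fd, res.encode())

    def log_entry(self, res):
        # Guarded variant: drop the entry once the pipe has been closed.
        if self._pipe_fd is None:
            return False
        os.write(self._pipe_fd, res.encode())
        return True
```

Whether dropping late log entries is acceptable is a design choice; the point is only that a callback firing after the logger is closed needs some kind of check.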

Can you post your Unbound config?  Are you using any custom blocklists?

I have no blocklist activated, so there is no DNS blocking.

My config from configctl, with host overrides redacted:


   <unboundplus version="1.0.6">
      <general>
        <enabled>1</enabled>
        <port>53</port>
        <stats>1</stats>
        <active_interface/>
        <dnssec>1</dnssec>
        <dns64>0</dns64>
        <dns64prefix>64:ff9b::/96</dns64prefix>
        <noarecords>0</noarecords>
        <regdhcp>0</regdhcp>
        <regdhcpdomain/>
        <regdhcpstatic>0</regdhcpstatic>
        <noreglladdr6>1</noreglladdr6>
        <noregrecords>0</noregrecords>
        <txtsupport>0</txtsupport>
        <cacheflush>1</cacheflush>
        <local_zone_type>transparent</local_zone_type>
        <outgoing_interface/>
        <enable_wpad>0</enable_wpad>
      </general>
      <advanced>
        <hideidentity>1</hideidentity>
        <hideversion>1</hideversion>
        <prefetch>0</prefetch>
        <prefetchkey>0</prefetchkey>
        <dnssecstripped>1</dnssecstripped>
        <serveexpired>0</serveexpired>
        <serveexpiredreplyttl/>
        <serveexpiredttl/>
        <serveexpiredttlreset>0</serveexpiredttlreset>
        <serveexpiredclienttimeout/>
        <qnameminstrict>0</qnameminstrict>
        <extendedstatistics>1</extendedstatistics>
        <logqueries>1</logqueries>
        <logreplies>0</logreplies>
        <logtagqueryreply>0</logtagqueryreply>
        <logservfail>0</logservfail>
        <loglocalactions>0</loglocalactions>
        <logverbosity>1</logverbosity>
        <valloglevel>0</valloglevel>
        <privatedomain/>
        <privateaddress>0.0.0.0/8,10.0.0.0/8,100.64.0.0/10,169.254.0.0/16,172.16.0.0/12,192.0.2.0/24,192.168.0.0/16,198.18.0.0/15,198.51.100.0/24,203.0.113.0/24,233.252.0.0/24,::1/128,2001:db8::/32,fc00::/8,fd00::/8,fe80::/10</privateaddress>
        <insecuredomain/>
        <msgcachesize>100m</msgcachesize>
        <rrsetcachesize>200m</rrsetcachesize>
        <outgoingnumtcp>10</outgoingnumtcp>
        <incomingnumtcp>10</incomingnumtcp>
        <numqueriesperthread>4096</numqueriesperthread>
        <outgoingrange>8192</outgoingrange>
        <jostletimeout>200</jostletimeout>
        <cachemaxttl/>
        <cachemaxnegativettl/>
        <cacheminttl/>
        <infrahostttl>900</infrahostttl>
        <infrakeepprobing>0</infrakeepprobing>
        <infracachenumhosts>50000</infracachenumhosts>
        <unwantedreplythreshold>10000000</unwantedreplythreshold>
      </advanced>
      <acls>
        <default_action>allow</default_action>
      </acls>
      <dnsbl>
        <enabled>0</enabled>
        <safesearch>0</safesearch>
        <type>atf,aa,ag,bla0,bla,blf,blg,blp,blr,blr0,bls,blt,blt1,ep</type>
        <lists/>
        <whitelists>*.redacted.tld,*.redacted.tld,*.redacted.tld,*.redacted.internal.tld</whitelists>
        <blocklists/>
        <wildcards/>
        <address/>
        <nxdomain>0</nxdomain>
      </dnsbl>
      <forwarding>
        <enabled>0</enabled>
      </forwarding>

Odd.  What version are you running?  23.7?

Why do you have dnsbl items configured if you're not using it?
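
To check quickly whether DNSBL is merely configured or actually enabled, something like the following could be run against the pasted fragment. It is a rough sketch: the element names (`dnsbl`, `enabled`, `type`) are taken from the config in this thread, while `SAMPLE` and `dnsbl_summary` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical fragment modelled on the config posted above.
SAMPLE = """
<unboundplus version="1.0.6">
  <dnsbl>
    <enabled>0</enabled>
    <type>atf,aa,ag</type>
    <lists/>
  </dnsbl>
</unboundplus>
"""


def dnsbl_summary(xml_text):
    """Return (enabled, configured_types) from an unboundplus fragment."""
    root = ET.fromstring(xml_text)
    dnsbl = root.find("dnsbl")
    if dnsbl is None:
        return False, []
    enabled = (dnsbl.findtext("enabled") or "0").strip() == "1"
    raw_types = dnsbl.findtext("type") or ""
    types = [t.strip() for t in raw_types.split(",") if t.strip()]
    return enabled, types


if __name__ == "__main__":
    enabled, types = dnsbl_summary(SAMPLE)
    print(f"dnsbl enabled: {enabled}, configured list types: {len(types)}")
```

A result with `enabled` false but a non-empty type list matches the situation described here: leftover selections from an earlier setup with the feature switched off.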

Not sure that it is related, but I have problems with Unbound as well.
In my case the unbound process starts eating 100% of CPU (as reported by the top command).
Neither pluginctl nor the web interface can restart or stop the unbound daemon. When I try from the web interface, it freezes for a minute or so and then nothing happens.
The kill command can stop unbound only when used as "kill -9" (or "kill -kill").
This is pretty annoying.
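
The manual procedure described above (a normal stop is ignored, only "kill -9" works) can be sketched as a small helper that sends SIGTERM first and escalates to SIGKILL after a grace period. The function names and the 5 second default are assumptions for illustration; this is not how pluginctl or the OPNsense service scripts are implemented.

```python
import os
import signal
import time


def pid_alive(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but is owned by another user
    return True


def stop_with_escalation(pid, grace=5.0, is_alive=pid_alive):
    """Send SIGTERM, wait up to `grace` seconds, then fall back to SIGKILL.

    Returns "not-running", "terminated" or "killed".
    """
    if not is_alive(pid):
        return "not-running"
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "not-running"  # exited between the check and the signal
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "terminated"
        time.sleep(0.1)
    try:
        os.kill(pid, signal.SIGKILL)  # cannot be caught or ignored
    except ProcessLookupError:
        return "terminated"
    return "killed"
```

SIGKILL gives the daemon no chance to clean up, so it is only a workaround for a hung process, not a fix for whatever makes it hang.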

I've removed "so-reuseport: no" from custom config to see whether it could fix anything. Unbound is running again with single thread only.

Problem started after upgrading from 23.1.x

I reported first here: https://forum.opnsense.org/index.php?topic=35475.0
Another similar report is here: https://forum.opnsense.org/index.php?topic=35523.0

Quote from: CJ on August 23, 2023, 01:52:00 PM
Odd.  What version are you running?  23.7?

Why do you have dnsbl items configured if you're not using it?

I'm running the latest 23.7.1_3 and always update to the latest version within 7 days.

DNSBL was configured but caused problems, so it's disabled.


No idea why Unbound crashes once every few months.

In my case it's crashing once per day.


Thanks for the report. Can you try this patch? https://github.com/opnsense/core/commit/7406a5067f8

# opnsense-patch 7406a5067f8


Cheers,
Franco

Thank you.

Applied the patch and restarted the service. DNS still works. Since I don't know how to trigger the bug, I can only hope it's fixed.

If it crashes again, posting the error would help a lot in taking another look.


Cheers,
Franco

Applied the patch.
Tried to restart Unbound from the GUI and got this in the log:
/usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '4507''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 4507: No such process'
Unbound stopped. Had to start it manually.
Manual start was successful.

The error only tells us that Unbound had already stopped when the stop was attempted. Hardly a fatal issue here ;)


Cheers,
Franco
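
The "No such process" message above is what a stale pid file looks like: the file still names a PID, but no process with that PID exists any more. A minimal sketch of how one could tell the cases apart (the function name and the four status strings are hypothetical, chosen for illustration):

```python
import os


def pidfile_status(path):
    """Classify a pid file as "missing", "invalid", "stale" or "running"."""
    try:
        with open(path) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return "missing"
    except ValueError:
        return "invalid"
    if pid <= 0:
        return "invalid"  # 0 and negative values address process groups
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return "stale"
    except PermissionError:
        pass  # process exists but belongs to another user
    return "running"
```

For example, `pidfile_status("/var/run/unbound.pid")` returning "stale" would correspond to the log line quoted above: the daemon was already gone and only needed to be started again.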

So good news, bad news.  The bad news is that unbound is still failing after roughly 30 minutes.  The good news is that the patch allows me to restart the service without rebooting.

It might help to send the debug log entry from the crash.