Unbound crashing

Started by seed, August 22, 2023, 08:18:48 AM

I just love a "me too" without an error log attached and ignoring the last post on how to debug this further. ;)


Back on topic, did a quick check on 3 FWs, nothing to report

root@OPNsense:~ # diff -u /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints
root@OPNsense:~ #

Yes, I know logs would be good.
But it is hard, given that:
1. Dnsmasq is in use due to Unbound's issues
2. I would need to capture a crash, which is not always predictable or reproducible.

I will give it a try. Would logs at "Error" level suffice?

Why is everyone ignoring my post? ;) I don't really care about the logs. They tell us what the error is supposed to be, but not how it's triggered and why it's persistent.

https://forum.opnsense.org/index.php?topic=35527.msg176361#msg176361

Ok did the diff against /var/unbound/root.hints, no output.

I will be switching back over to unbound. In the event of a crash what would be needed (captures, logs, files, command line etc.) to diagnose the issue?

Quote from: franco on September 28, 2023, 07:36:07 PM

If not it would be helpful to diff:

# diff -u /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints
I'm still experiencing the same issue. Either "error: reading root hints /root.hints 2:12: Syntax error, could not parse the RR's type" or "error: reading root hints /root.hints 28:37: Syntax error, could not parse the RR's class".
# diff -q /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints && echo 'The files are identical.'
The files are identical.

I wonder if you have a corrupt file there even if it doesn't look like that...

Try removing the file and restarting Unbound.

service unbound stop && rm -v /var/unbound/root.hints && cp -v /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints && service unbound onestart

No need.
Unbound can be stopped with kill -9. After that, a manual service restart works fine again.


Quote from: karlson2k on October 03, 2023, 09:13:36 PM
One more report:
https://forum.opnsense.org/index.php?topic=36270.0

I have the same problem: Unbound crashes sporadically with the same log entry. In addition, the CPU load at that moment is far higher than normal.

A few observations:
* On OPNsense 23.7.x both Unbound versions (1.17.1 and 1.18.0) have the same problem
* OPNsense 23.1.x has the Unbound version 1.17.1

Therefore possible reasons:
* Some changes in OPNsense 23.7 broke the Unbound startup (e.g. the daemon is started while files are still being copied)
* Some patches added in OPNsense 23.7 for Unbound broke things (I'm not sure whether any patches were added)
* Some changes in the FreeBSD kernel (like ASLR) broke badly designed Unbound processing

As log levels 3 and 4 somehow work around the problem (while hammering the SSD), the problem is most likely caused by parallel startup processing (either the OPNsense initialisation scripts or Unbound itself). I think detailed logging just slows down the startup, so parallel processes have enough time to complete.
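
One way to probe the "started while files are still being copied" idea is to hold off the start until the hints file has stopped changing. The following is only a minimal sketch of that idea: wait_until_stable is a hypothetical helper, not something OPNsense or Unbound ships, and the retry count and interval are arbitrary assumptions.

```sh
#!/bin/sh
# Sketch only: wait_until_stable is a hypothetical helper, not part of OPNsense.
# Returns 0 once the file's checksum is unchanged across two consecutive
# checks, or 1 if it keeps changing (or is missing) for the whole wait.
wait_until_stable() {
    file="$1"
    tries="${2:-5}"      # number of checks before giving up (assumed default)
    interval="${3:-1}"   # seconds between checks (assumed default)

    prev=""
    i=0
    while [ "$i" -lt "$tries" ]; do
        if [ -f "$file" ]; then
            cur=$(cksum < "$file")
            if [ -n "$prev" ] && [ "$cur" = "$prev" ]; then
                return 0
            fi
            prev="$cur"
        else
            # File not there (yet): forget any earlier checksum.
            prev=""
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}
```

It could be used as "wait_until_stable /var/unbound/root.hints && service unbound onestart". If the crashes disappear with such a delay in place, that would support the race theory; if not, it points back at Unbound itself.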

Quote from: newsense on October 03, 2023, 04:14:43 AM
I wonder if you have a corrupt file there even if it doesn't look like that...

Try removing the file and restarting Unbound.

service unbound stop && rm -v /var/unbound/root.hints && cp -v /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints /var/unbound/root.hints && service unbound onestart

OK - unbound has been crashing - executed the above - will monitor.

Quote from: nerf on October 10, 2023, 09:14:25 AM
OK - unbound has been crashing - executed the above - will monitor.

This will not help.

If Unbound is started successfully, it will continue to work fine until restarted.
When OPNsense (re-)starts Unbound, the script re-creates '/var/unbound/root.hints' automatically.

Yeah yeah, but is the file modified, or does Unbound throw a spurious error while the contents of the file are ok? Because if the file is ok, it's something very nasty inside Unbound, and that would have been my guess all along.


Cheers,
Franco
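
To tell those two cases apart, the file has to be captured at the moment the error shows up, before a restart re-creates it. A rough sketch of how that could be done follows; snapshot_root_hints and the directory for saved copies are made up for illustration, and only the two hints paths come from this thread.

```sh
#!/bin/sh
# Sketch only: the function name and the snapshot directory are hypothetical.
# Keeps a timestamped copy of the live root hints and reports whether it
# still matches the shipped template.
# Returns 0 if identical, 1 if different, 2 if the copy could not be saved.
snapshot_root_hints() {
    template="$1"   # e.g. /usr/local/opnsense/service/templates/OPNsense/Unbound/core/root.min.hints
    live="$2"       # e.g. /var/unbound/root.hints
    outdir="$3"     # hypothetical directory for saved copies

    mkdir -p "$outdir" || return 2
    stamp=$(date +%Y%m%d-%H%M%S)
    # Save the live file first so it survives a later restart.
    cp -p "$live" "$outdir/root.hints.$stamp" || return 2

    if cmp -s "$template" "$live"; then
        echo "identical: $live matches the template"
        return 0
    fi
    echo "different: $live does not match the template"
    diff -u "$template" "$live"
    return 1
}
```

Run right after the syntax error is logged, an "identical" result would mean the file on disk is fine and the error comes from somewhere inside Unbound; a "different" result leaves a saved copy and a diff to attach to the report.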