Unbound errors in log file

Started by lar.hed, December 31, 2023, 12:28:43 PM

Previous topic - Next topic
I still think the best path forward is to try with a clean install and no config, or at the very least only a partial config restore.  All of the different moving parts makes things like this harder to troubleshoot.

2nd day in a row: 19:07 - Unbound 100%, and it is the same as yeasterday:

2024-01-17T19:07:32 Informational unbound [58736:0] info: service stopped (unbound 1.19.0).

However that PID is not the expected one... Because the PID that got kill -9 was 59852, and the PID before that was 22451 - and the PID that is now running after kill -9 is 85003. 4 Different Unbound processes.

The thing that bothers me is just that 22451 might very well have been running at around 19.00 and then for some reason Unbound decides to start another process, 59852 - and we still have the 58736 PID....

Why that is I do not know. What is Unbound downloading everyday?

Add an delay on startup as suggested before might very well solve this.

And another thing, this is from the Unbound log yesterday:
2024-01-16T19:32:02 Informational unbound [63234:7] info: generate keytag query _ta-4f66. NULL IN
2024-01-16T19:32:02 Notice unbound daemonize unbound dhcpd watcher.
2024-01-16T19:32:01 Informational unbound [63234:0] info: start of service (unbound 1.19.0).
2024-01-16T19:32:01 Notice unbound [63234:0] notice: init module 2: iterator
2024-01-16T19:32:01 Notice unbound [63234:0] notice: init module 1: validator
2024-01-16T19:32:01 Notice unbound [63234:0] notice: init module 0: python
2024-01-16T19:26:48 Informational unbound [79537:3] info: generate keytag query _ta-4f66. NULL IN
2024-01-16T19:26:47 Notice unbound daemonize unbound dhcpd watcher.
2024-01-16T19:26:47 Critical unbound [79537:1] fatal error: Could not initialize thread
2024-01-16T19:26:47 Informational unbound [79537:0] info: start of service (unbound 1.19.0).


And this is from the "General" Log under system:

2024-01-16T19:32:01 Error opnsense /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '79537''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 79537: No such process'

It is not strange that the kill does not work since the correct PID has changed - this got to be an error, the last startup has not updated the PID in time before getting killed, or trying to get killed...

Something is going on - let's see if I can monitor this closer tomorrow - it seems to do something at 19:00 (local time) so there is some sort of trigger starting all this.....

Regarding yesterdays 100% CPU, I did dig around in the System -> General log to see if I could find anything usefull. After all, what is triggering the restart of Unbound, right?

2024-01-17T19:10:53 Notice dhclient Creating resolv.conf
2024-01-17T19:09:43 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2024-01-17T19:09:43 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : dnsmasq_configure_do())
2024-01-17T19:09:43 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns ()
2024-01-17T19:09:42 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : dhcpd_dhcp_configure())
2024-01-17T19:09:42 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp ()
2024-01-17T19:09:42 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure ipsec (execute task : ipsec_configure_do(,opt2))
2024-01-17T19:09:42 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure ipsec (,opt2)
2024-01-17T19:09:42 Notice opnsense /usr/local/etc/rc.linkup: ROUTING: entering configure using 'opt2'
2024-01-17T19:09:42 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for opt2(igb2)
2024-01-17T19:09:42 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for opt2(igb2)
2024-01-17T19:09:42 Error configctl error in configd communication Traceback (most recent call last): File "/usr/local/sbin/configctl", line 66, in exec_config_cmd line = sock.recv(65536).decode() socket.timeout: timed out
2024-01-17T19:09:28 Notice kernel <6>igb2: link state changed to UP
2024-01-17T19:07:48 Notice kernel <6>igb2: link state changed to DOWN
2024-01-17T19:07:42 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2024-01-17T19:07:42 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : dnsmasq_configure_do())
2024-01-17T19:07:42 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns ()
2024-01-17T19:07:42 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : dhcpd_dhcp_configure())
2024-01-17T19:07:42 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp ()
2024-01-17T19:07:42 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure ipsec (execute task : ipsec_configure_do(,opt2))
2024-01-17T19:07:42 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure ipsec (,opt2)
2024-01-17T19:07:42 Notice opnsense /usr/local/etc/rc.linkup: ROUTING: entering configure using 'opt2'
2024-01-17T19:07:42 Notice kernel <6>igb2: link state changed to UP
2024-01-17T19:07:42 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for opt2(igb2)


It seems to be rc.linkup that triggers the restart(s)...

Normal. What you see there are the routines called to setup services needed on an interface when it comes up.
You have this pc that you dedicate an interface to on your firewall , so when it wakes up or boot up, it "wakes" the interface (or so was my understaning, apologies if that's incorrect). Not saying it should bring anything down i.e. "break things" when you bring an interface up but the messages and config of unbound is what I'd expect.

Yes exactly - it is an "IP renew event", triggered by the interface.

Just as karlsson2k in the Unbound crashing thread: https://forum.opnsense.org/index.php?topic=35527.0

Although karlsson2k had WAN IP refresh, in my case I have PC dedicated to that interface (always the same IP address).

So IP is refreshed/renewed, and the:
2024-01-17T19:09:43 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
Does the restart of Unbound, and if one is lucky, this works - well most of the time anyway.

I do not think that OPNsense on smaller / slower boxes has this issue, but what do I know....?

What is the "Network Interfaces" option for Unbound set to?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: Patrick M. Hausen on January 18, 2024, 02:58:52 PM
What is the "Network Interfaces" option for Unbound set to?

Default = "All (recommended)"

Quote from: lar.hed on January 18, 2024, 03:13:06 PM
Quote from: Patrick M. Hausen on January 18, 2024, 02:58:52 PM
What is the "Network Interfaces" option for Unbound set to?

Default = "All (recommended)"

Dang!  ;)
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Excellent - tell me more!

Or: It is the default, what should I replace that with - what and why?

Sorry, the other way round. "Dang, that's correct and not a possible source of problems."

When you explicitly set it to certain interfaces only and these interfaces go down, problems arise.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Ah, no worries then  ;D

I'll keep looking around, I still have a few odd ideas what to look for.... I'll will give up when I am out of ideas... Maybe...

January 18, 2024, 04:20:27 PM #72 Last Edit: January 18, 2024, 04:30:01 PM by cookiemonster
Quote from: lar.hed on January 18, 2024, 02:43:13 PM
Yes exactly - it is an "IP renew event", triggered by the interface.

Just as karlsson2k in the Unbound crashing thread: https://forum.opnsense.org/index.php?topic=35527.0

Although karlsson2k had WAN IP refresh, in my case I have PC dedicated to that interface (always the same IP address).

So IP is refreshed/renewed, and the:
2024-01-17T19:09:43 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
Does the restart of Unbound, and if one is lucky, this works - well most of the time anyway.

I do not think that OPNsense on smaller / slower boxes has this issue, but what do I know....?
Not quite the same.
Your particular case has one client on that interface, most others will have a switch on it with many clients instead, each will need a new IP lease.
The interface state is "down" when your PC is off because it is the only one and hence, will make the interface be down or up, triggering this 'interface up', which contains "unbound_configure". As I said it shouldn't break things but trying to point out that is not an ip address renewal triggering it.

Still thinking a basic setup first is the way to diagnose....

I appreciate your help!!

Sorry that I am stubborn then, but the interface goes up, down and then again up within seconds. I do not turn my PC on/off/on that fast - I would love to be able to do just that, but Windows does not seem to like that.... ???

And let's not forget that this exact config worked pre 23.7.
I have not changed anything inside Unbound or interfaces, not even a tiny bit - it is the same as before.

I have added two lines in my kill script. Copy root.hints and then fstat -u unbound. Anything else I could add that will be executed by automagic (Monit) if Unbound seems to hung again, anything that may or may not help out?