Unbound won't start from GUI after 21.7.1 upgrade (from 21.7)

Started by Wendo, August 06, 2021, 11:05:53 PM

Previous topic - Next topic
I just upgraded to 21.7.1 and Unbound now won't start from the GUI. However, going to the shell and running

unbound -c /var/unbound/unbound.conf

make it start no problem. There is nothing in the logs indicating any sort of issue on both the failed starts and the starts from the command line.

Can anyone point me in the right direction here? Even putting a -d on the command line doesn't show me any errors on start (probably because it starts successfully).

What command line does opnsense actually use to start Unbound?

Thanks

Does this help?


ps aux | grep unbound
root     10938   0.0  0.2    24220   14584  -  Ss   22:09     0:04.29 /usr/local/bin/python3 /usr/local/opnsense/scripts/dns/unbound_dhcpd.py --domain home.arpa (python3.8)
unbound  16367   0.0  0.5    82980   41912  -  Is   22:09     0:03.05 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
Running OPNsense through Proxmox
4 x Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (1 Socket)
24 GB RAM


Ok, so no idea what was going on. I poked around with a couple of things (one of which was I was using the wrong config file when running it from command line. Then started getting can't bind, port in use even though it wasn't running (and neither was dnsmasq) and just said screw it and restarted (I'd done that before).

Anyway, after the restart it came up just fine *shrug*

Quote from: Wendo on August 06, 2021, 11:05:53 PM
I just upgraded to 21.7.1 and Unbound now won't start from the GUI. However, going to the shell and running

unbound -c /var/unbound/unbound.conf

make it start no problem. There is nothing in the logs indicating any sort of issue on both the failed starts and the starts from the command line.

I observed exactly the same problem after updating form 21.7 to 21.7.1. The config (checked with unbound-checkconf) and log files look ok. Even after a second reboot since the update I have to start the unbound process manually via the concole  :'(.

Any hints?

BTW the whole startup process of my opnsense takes very long time (up to 15 minutes). But this is an issue which I have observed long time ago and unfortunately have not investigated yet.
OPNsense 24.7.11_2-amd64

I have noticed too, that a startup do take a very long time. Atleast, that is what I thought. However, I am able to SSH in to the firewall at a very early stage.

But to not wait for the web management to show up, I have often found myself executing

/usr/local/etc/rc.restart_webgui


And then everything looks fine.
Running OPNsense through Proxmox
4 x Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (1 Socket)
24 GB RAM

Have you checked if the system time is correct, and ntp sync is working properly? In another thread someone reported time deviations leading to instability.
In theory there is no difference between theory and practice. In practice there is.

Quote from: dinguz on August 07, 2021, 03:53:45 PM
Have you checked if the system time is correct, and ntp sync is working properly? In another thread someone reported time deviations leading to instability.

Yes, both is fine. Maybe, some startup scripts weren't finished. I'll try to investigaste more, why the whole boot process takes long time.
OPNsense 24.7.11_2-amd64

Same issue here - unbound would not start automatically and the GUI button did not start it either.  Working configuration imported from 21.1 and unbound-checkconf does not complain.

The GUI "start" button only generated the following unbound log entry:
2021-08-08T13:16:31 unbound[57542] daemonize unbound dhcpd watcher.

whereas running unbound -c /var/unbound/unbound.conf
caused Unbound to start up fine.

Reboot did  not solve the issue:

Initially, Unbound started up - GUI showed it as running and log entries seem to confirm startup, but then it died again:


2021-08-08T13:28:07 unbound[2474] daemonize unbound dhcpd watcher.
2021-08-08T13:28:05 unbound[2663] daemonize unbound dhcpd watcher.
2021-08-08T13:28:00 unbound[38473] daemonize unbound dhcpd watcher.
2021-08-08T13:27:52 unbound[60119] daemonize unbound dhcpd watcher.
2021-08-08T13:27:52 unbound[25826] [25826:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
2021-08-08T13:27:52 unbound[25826] [25826:0] info: server stats for thread 3: requestlist max 0 avg 0 exceeded 0 jostled 0
2021-08-08T13:27:52 unbound[25826] [25826:0] info: server stats for thread 3: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2021-08-08T13:27:52 unbound[25826] [25826:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
2021-08-08T13:27:52 unbound[25826] [25826:0] info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0
2021-08-08T13:27:52 unbound[25826] [25826:0] info: server stats for thread 2: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2021-08-08T13:27:52 unbound[25826] [25826:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
2021-08-08T13:27:52 unbound[25826] [25826:0] info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0
2021-08-08T13:27:52 unbound[25826] [25826:0] info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2021-08-08T13:27:52 unbound[25826] [25826:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
2021-08-08T13:27:52 unbound[25826] [25826:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
2021-08-08T13:27:52 unbound[25826] [25826:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2021-08-08T13:27:51 unbound[25826] [25826:0] info: service stopped (unbound 1.13.1).
2021-08-08T13:27:51 unbound[25826] [25826:0] info: control cmd: dump_cache
2021-08-08T13:27:45 unbound[25826] [25826:0] info: start of service (unbound 1.13.1).
2021-08-08T13:27:45 unbound[25826] [25826:0] notice: init module 2: iterator
2021-08-08T13:27:45 unbound[25826] [25826:0] notice: init module 1: validator
2021-08-08T13:27:45 unbound[25826] [25826:0] notice: init module 0: dns64
2021-08-08T13:27:39 unbound[25826] [25826:0] notice: Restart of unbound 1.13.1.
2021-08-08T13:27:39 unbound[25826] [25826:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
2021-08-08T13:27:39 unbound[25826] [25826:0] info: server stats for thread 3: requestlist max 0 avg 0 exceeded 0 jostled 0
2021-08-08T13:27:39 unbound[25826] [25826:0] info: server stats for thread 3: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2021-08-08T13:27:39 unbound[25826] [25826:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
2021-08-08T13:27:39 unbound[25826] [25826:0] info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0
2021-08-08T13:27:39 unbound[25826] [25826:0] info: server stats for thread 2: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2021-08-08T13:27:39 unbound[25826] [25826:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
2021-08-08T13:27:39 unbound[25826] [25826:0] info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0
2021-08-08T13:27:39 unbound[25826] [25826:0] info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2021-08-08T13:27:39 unbound[25826] [25826:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
2021-08-08T13:27:39 unbound[25826] [25826:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
2021-08-08T13:27:39 unbound[25826] [25826:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2021-08-08T13:27:38 unbound[25826] [25826:0] info: service stopped (unbound 1.13.1).
2021-08-08T13:27:38 unbound[25826] [25826:0] info: start of service (unbound 1.13.1).
2021-08-08T13:27:38 unbound[25826] [25826:0] notice: init module 2: iterator
2021-08-08T13:27:38 unbound[25826] [25826:0] notice: init module 1: validator
2021-08-08T13:27:38 unbound[25826] [25826:0] notice: init module 0: dns64
2021-08-08T13:27:31 unbound[66648] daemonize unbound dhcpd watcher.
2021-08-08T13:27:31 unbound[2967] [2967:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
2021-08-08T13:27:31 unbound[2967] [2967:0] info: server stats for thread 3: requestlist max 0 avg 0 exceeded 0 jostled 0
2021-08-08T13:27:31 unbound[2967] [2967:0] info: server stats for thread 3: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2021-08-08T13:27:31 unbound[2967] [2967:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
2021-08-08T13:27:31 unbound[2967] [2967:0] info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0
2021-08-08T13:27:31 unbound[2967] [2967:0] info: server stats for thread 2: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2021-08-08T13:27:31 unbound[2967] [2967:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
2021-08-08T13:27:31 unbound[2967] [2967:0] info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0
2021-08-08T13:27:31 unbound[2967] [2967:0] info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2021-08-08T13:27:31 unbound[2967] [2967:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
2021-08-08T13:27:31 unbound[2967] [2967:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
2021-08-08T13:27:31 unbound[2967] [2967:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2021-08-08T13:27:31 unbound[2967] [2967:0] info: service stopped (unbound 1.13.1).
2021-08-08T13:27:31 unbound[2967] [2967:0] info: control cmd: dump_cache
2021-08-08T13:27:19 unbound[2967] [2967:0] info: control cmd: list_local_data
2021-08-08T13:27:19 unbound[2967] [2967:0] info: start of service (unbound 1.13.1).
2021-08-08T13:27:19 unbound[2967] [2967:0] notice: init module 2: iterator
2021-08-08T13:27:19 unbound[2967] [2967:0] notice: init module 1: validator
2021-08-08T13:27:19 unbound[2967] [2967:0] notice: init module 0: dns64
2021-08-08T13:27:12 unbound[87717] daemonize unbound dhcpd watcher.
[*** REBOOT ***]


This recurs several times in the log, thereafter it is only the unbound[57542] daemonize unbound dhcpd watcher.

Manually starting unbound works.

Could this be an issue with Unbound having issues with running on a different port than 53 (I have adguard home running as general DNS server), even though Unbound does not complain about ports?


For me the issue was ' Enable DNS64 Support' was enabled.

Hey,

this issue with Unbound still persists in OPNsense 21.7.3_3.

Unound runs briefly and then stops/crashes after a few secs.

As somova pointed out on Github it looks like the problem is that the cache-load command of unbound hangs.

Any solution?

On Github somova has suspected that the problem could be related to the state reset for dial-up connections (which kills all network connections incl. loopback connections).

Because of this, I ran a few tests.

  • Restart of OPNsense with WAN being connected:
    unbound hangs on cache-load command
  • Restart of OPNsense without WAN being connected:
    unbound properly starts up
  • Reconnecting WAN:
    state reset kicks in, unbound restarts and hangs on cache-load command

Now disabling "Dynamic state reset" under Firewall -> Settings -> Advanced...

  • Restart of OPNsense with WAN being connected:
    unbound properly starts up

So, it could well be that somova is right in his assumption.

@glasi
sounds logical imho
can you please test one possible solution?
1.install pacth:
opnsense-patch -a kulikov-a fc86769
(it just makes the system rule for the loopback interface stateless)
2.restart pf
3.try to reproduce the problem

thanks!

fc86769 looks reasonable if it solves the issue, nice


Cheers,
Franco