Title: BIND (named) hanging in unresponsive state
Post by: Patrick M. Hausen on March 25, 2024, 11:30:53 AM

Hi all,

I am using BIND instead of Unbound in most of my deployments. Recently the process seems to become unresponsive for no obvious reason every other day or so.

When I check the state on the firewall it looks like this:

root@opnsense:~ # ps awwux|grep named
root     4974   0.0  0.0   13488   3236  -  I    11:09       0:00.02 /bin/sh /usr/local/etc/rc.d/named restart
root    15735   0.0  0.0   13488   3244  -  I    11:09       0:00.00 /bin/sh /usr/local/etc/rc.d/named restart
root    35956   0.0  0.0   13488   3236  -  I    11:11       0:00.02 /bin/sh /usr/local/etc/rc.d/named restart
root    48098   0.0  0.0   13488   3244  -  I    11:11       0:00.00 /bin/sh /usr/local/etc/rc.d/named restart
root    51230   0.0  0.0   13488   3228  -  I    11:13       0:00.02 /bin/sh /usr/local/etc/rc.d/named restart
root    51746   0.0  0.0   13488   3232  -  I    11:15       0:00.02 /bin/sh /usr/local/etc/rc.d/named restart
bind    53253   0.0  0.4  106704  33780  -  Ss   20:26       2:06.97 /usr/local/sbin/named -u bind -c /usr/local/etc/namedb/named.conf
root    61439   0.0  0.0   13488   3236  -  I    11:13       0:00.00 /bin/sh /usr/local/etc/rc.d/named restart
root    61879   0.0  0.0   13488   3236  -  I    11:17       0:00.02 /bin/sh /usr/local/etc/rc.d/named restart
root    62413   0.0  0.0   13488   3240  -  I    11:15       0:00.00 /bin/sh /usr/local/etc/rc.d/named restart
root    74547   0.0  0.0   13488   3244  -  I    11:17       0:00.00 /bin/sh /usr/local/etc/rc.d/named restart
root    17500   0.0  0.0   12720   2388  0  S+   11:20       0:00.00 grep named

So there are a handful of restart jobs piled up, but the restart is not really happening. The listening ports are gone already (I have BIND listen on 0.0.0.0/0 port 53):
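
To count the pile-up quickly, here is a minimal Python sketch that parses ps awwux output and separates the queued rc.d restart jobs from the running named. It assumes the FreeBSD column layout shown above. The helper names (parse_ps, summarise) are made up for illustration and are not part of OPNsense or FreeBSD.

```python
# Hypothetical helper, not part of OPNsense: summarise FreeBSD "ps awwux" output.
# Assumed column layout: USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
from dataclasses import dataclass


@dataclass
class Proc:
    user: str
    pid: int
    stat: str
    started: str
    command: str


def parse_ps(text: str) -> list[Proc]:
    procs = []
    for line in text.splitlines():
        # COMMAND is the 11th column and may itself contain spaces
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip shell prompts, headers and blank lines
        procs.append(Proc(fields[0], int(fields[1]), fields[7], fields[8], fields[10].rstrip()))
    return procs


def summarise(procs: list[Proc]) -> dict:
    restarts = [p for p in procs if p.command.endswith("rc.d/named restart")]
    daemons = [p for p in procs if p.command.startswith("/usr/local/sbin/named ")]
    return {
        "pending_restarts": len(restarts),
        "restart_start_times": sorted({p.started for p in restarts}),
        "named": [(p.pid, p.started) for p in daemons],
    }


# A few lines copied from the ps listing above
SAMPLE = """\
root@opnsense:~ # ps awwux|grep named
root     4974   0.0  0.0   13488   3236  -  I    11:09       0:00.02 /bin/sh /usr/local/etc/rc.d/named restart
root    15735   0.0  0.0   13488   3244  -  I    11:09       0:00.00 /bin/sh /usr/local/etc/rc.d/named restart
root    35956   0.0  0.0   13488   3236  -  I    11:11       0:00.02 /bin/sh /usr/local/etc/rc.d/named restart
bind    53253   0.0  0.4  106704  33780  -  Ss   20:26       2:06.97 /usr/local/sbin/named -u bind -c /usr/local/etc/namedb/named.conf
root    17500   0.0  0.0   12720   2388  0  S+   11:20       0:00.00 grep named
"""

if __name__ == "__main__":
    print(summarise(parse_ps(SAMPLE)))
```

In the sample, named still carries its old start time (20:26) while the restart jobs date from 11:09 onwards, so none of them has replaced the daemon.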

netstat -na|fgrep .53

shows no result. When I truss the process, it spends all of its time in nanosleep() calls:

nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
[...]
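
Two small Python sketches related to these checks (my own helpers, not OPNsense or FreeBSD tools). The first is a stricter variant of netstat -na | fgrep .53: fgrep matches the literal string .53 anywhere, so it would also hit an address ending in .53 or port 5353, while the sketch compares only the port part of the local address. Since the fgrep here returned nothing, the observation stands; the stricter filter matters when it does return lines. The second tallies system call names in truss output, which makes the "all nanosleep()" observation easy to confirm over a longer capture. Both assume the FreeBSD output formats shown in this thread (no truss -f PID prefixes).

```python
# Hypothetical helpers: stricter port filter for FreeBSD "netstat -na" and a syscall
# histogram for truss output. Both assume the text formats shown in this thread.
import re
from collections import Counter

# A syscall line starts with the call name immediately followed by "("
SYSCALL_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)\(")


def listening_sockets(netstat_text: str, port: int = 53) -> list[tuple[str, str]]:
    """Return (proto, local address) for TCP listeners and UDP sockets bound to port."""
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue  # headers and unix-domain sockets have a different layout
        if fields[0].startswith("tcp") and fields[-1] != "LISTEN":
            continue  # only count TCP sockets in LISTEN state
        # FreeBSD prints host.port, e.g. "*.53", "127.0.0.1.53" or "::1.53"
        _, _, local_port = fields[3].rpartition(".")
        if local_port == str(port):
            hits.append((fields[0], fields[3]))
    return hits


def syscall_histogram(truss_text: str) -> Counter:
    """Count how often each system call name starts a line of truss output."""
    counts = Counter()
    for line in truss_text.splitlines():
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical lines that "fgrep .53" would match although nothing uses port 53
    print(listening_sockets("tcp4 0 0 10.0.0.53.22 10.0.0.7.51515 ESTABLISHED\nudp4 0 0 *.5353 *.*"))
    print(syscall_histogram("nanosleep({ 0.010000000 })\t = 0 (0x0)\n" * 3 + "[...]\n"))
```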

Does anybody have an idea what might be going on? Which actions on the firewall lead to a BIND restart, anyway?

Title: Re: BIND (named) hanging in unresponsive state
Post by: mimugmail on March 25, 2024, 04:37:12 PM

I'd guess you have a daily cron job for updating blocklists and the script fails for whatever reason?

Title: Re: BIND (named) hanging in unresponsive state
Post by: Patrick M. Hausen on March 25, 2024, 04:37:54 PM

No blocklists in my BINDs - I chain AdGuard Home for that.

Title: Re: BIND (named) hanging in unresponsive state
Post by: Patrick M. Hausen on April 23, 2024, 07:44:08 PM

This seems to happen whenever I reboot my switch. When I do this, the lagg interface that connects it to OPNsense and carries all my VLANs toggles. When the switch is back up and layer 2 connectivity is restored, this is the situation on OPNsense:

root@opnsense:~ # ps awwux|grep named
root    28282   0.0  0.0   13488   3236  -  S    19:40       0:00.01 /bin/sh /usr/local/etc/rc.d/named restart
root    34584   0.0  0.0   13488   3244  -  S    19:40       0:00.00 /bin/sh /usr/local/etc/rc.d/named restart
root    52578   0.0  0.0   13488   3228  -  I    19:38       0:00.01 /bin/sh /usr/local/etc/rc.d/named restart
root    61143   0.0  0.0   13488   3236  -  I    19:38       0:00.00 /bin/sh /usr/local/etc/rc.d/named restart
bind    96171   0.0  0.6  161148  46068  -  Ss   Thu11      10:36.60 /usr/local/sbin/named -u bind -c /usr/local/etc/namedb/named.conf

named is unresponsive and the restart processes are "piling up".
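
Since the queued restarts don't bring named back, an independent health check could at least detect the hang. A minimal Python sketch, assuming a single UDP query is an adequate liveness test: it sends one DNS query and reports whether a reply with the matching ID arrives before the timeout. The query name example.com and the 2-second timeout are arbitrary; a name from one of the locally served zones avoids depending on upstream resolution. Any answer, including SERVFAIL or REFUSED, counts as "responding", so this checks responsiveness, not correctness.

```python
# Hypothetical liveness probe: send one DNS query over UDP and wait for a reply.
import secrets
import socket
import struct


def build_query(name: str, qtype: int = 1) -> tuple[int, bytes]:
    """Return (query id, wire-format query) for name with QTYPE qtype (1 = A), class IN."""
    qid = secrets.randbelow(0x10000)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"invalid DNS label in {name!r}")
        qname += bytes([len(raw)]) + raw
    qname += b"\x00"  # root label terminates the name
    return qid, header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def is_response_to(qid: int, data: bytes) -> bool:
    """True if data is a DNS message with our ID and the QR (response) bit set."""
    if len(data) < 12:
        return False
    rid, flags = struct.unpack("!HH", data[:4])
    return rid == qid and bool(flags & 0x8000)


def dns_responds(server: str = "127.0.0.1", name: str = "example.com",
                 port: int = 53, timeout: float = 2.0) -> bool:
    qid, packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, port))
            data, _ = sock.recvfrom(4096)
        except OSError:  # timeout, connection refused, network unreachable, ...
            return False
    return is_response_to(qid, data)


if __name__ == "__main__":
    print("named responding:", dns_responds())
```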

Possibly I shall go back to bind to 127.0.0.1 only and use NAT port forwarding ...

Title: Re: BIND (named) hanging in unresponsive state
Post by: netnut on April 23, 2024, 08:02:05 PM

Quote from: Patrick M. Hausen on April 23, 2024, 07:44:08 PM
Possibly I shall go back to bind to 127.0.0.1 only and use NAT port forwarding ...

Is there a specific reason you don't use Unbound? I don't know if you're using a lot of domains in BIND, but I can recommend Unbound (running stable here for years) with specific "Query Forwarding" domains pointing to BIND running on localhost port 53053.

I'm using only about 30 domains; there is of course some administrative overhead in defining those forwards.
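
For reference, forwards like these correspond roughly to Unbound forward-zone stanzas such as the ones generated below. This is a minimal Python sketch, not how OPNsense builds its configuration (the Query Forwarding page writes its own), and the zone names are hypothetical. Because Unbound refuses by default to send queries to localhost (do-not-query-localhost defaults to yes), a hand-written config forwarding to 127.0.0.1 also needs that option switched off.

```python
# Hypothetical generator for hand-written unbound.conf forward stanzas.
def unbound_forward_config(domains: list[str], server: str = "127.0.0.1",
                           port: int = 53053, allow_localhost: bool = True) -> str:
    parts = []
    if allow_localhost:
        # Unbound's default "do-not-query-localhost: yes" would block 127.0.0.1 upstreams
        parts.append("server:\n    do-not-query-localhost: no\n")
    for domain in domains:
        parts.append(
            "forward-zone:\n"
            f'    name: "{domain.rstrip(".")}"\n'
            f"    forward-addr: {server}@{port}\n"
        )
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical zone names served by BIND on 127.0.0.1 port 53053
    print(unbound_forward_config(["home.example", "lab.example"]))
```

Generating the stanzas from one list keeps the 30-odd forwards in sync with the zones BIND actually serves.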

Title: Re: BIND (named) hanging in unresponsive state
Post by: Patrick M. Hausen on April 23, 2024, 08:21:00 PM

I have locally maintained zones, so I need BIND, and since I already run AdGuard Home I did not want to bring a third service into the mix.

EDIT: I just reworked all local zones into domain overrides at home. If that proves to be stable, I'll probably pick up your suggestion for the secondary zones I have at work.

Title: Re: BIND (named) hanging in unresponsive state
Post by: netnut on April 24, 2024, 02:52:00 AM

Quote from: Patrick M. Hausen on April 23, 2024, 08:21:00 PM
I have locally maintained zones, so I need BIND, and since I already run AdGuard Home I did not want to bring a third service into the mix.

Makes sense :-). How many resolvers does a man need...

Prompted by your observation, I noticed quite a few BIND restarts in my logs too, at random intervals (I had never looked, to be honest), but all of them clean, without any restart zombies. Unfortunately I have no clue what the trigger is or was: saving the interface config? Carrier up/down of a directly connected host?

OPNsense Forum

Archive => 24.1, 24.4 Legacy Series => Topic started by: Patrick M. Hausen on March 25, 2024, 11:30:53 AM