24.1 Production Series / BIND (named) hanging in unresponsive state
« on: March 25, 2024, 11:30:53 am »
Hi all,
I am using BIND instead of Unbound in most of my deployments. Recently the process seems to become unresponsive for no obvious reason every other day or so.
When I check the state on the firewall it looks like this:
Code:
root@opnsense:~ # ps awwux|grep named
root 4974 0.0 0.0 13488 3236 - I 11:09 0:00.02 /bin/sh /usr/local/etc/rc.d/named restart
root 15735 0.0 0.0 13488 3244 - I 11:09 0:00.00 /bin/sh /usr/local/etc/rc.d/named restart
root 35956 0.0 0.0 13488 3236 - I 11:11 0:00.02 /bin/sh /usr/local/etc/rc.d/named restart
root 48098 0.0 0.0 13488 3244 - I 11:11 0:00.00 /bin/sh /usr/local/etc/rc.d/named restart
root 51230 0.0 0.0 13488 3228 - I 11:13 0:00.02 /bin/sh /usr/local/etc/rc.d/named restart
root 51746 0.0 0.0 13488 3232 - I 11:15 0:00.02 /bin/sh /usr/local/etc/rc.d/named restart
bind 53253 0.0 0.4 106704 33780 - Ss 20:26 2:06.97 /usr/local/sbin/named -u bind -c /usr/local/etc/namedb/named.conf
root 61439 0.0 0.0 13488 3236 - I 11:13 0:00.00 /bin/sh /usr/local/etc/rc.d/named restart
root 61879 0.0 0.0 13488 3236 - I 11:17 0:00.02 /bin/sh /usr/local/etc/rc.d/named restart
root 62413 0.0 0.0 13488 3240 - I 11:15 0:00.00 /bin/sh /usr/local/etc/rc.d/named restart
root 74547 0.0 0.0 13488 3244 - I 11:17 0:00.00 /bin/sh /usr/local/etc/rc.d/named restart
root 17500 0.0 0.0 12720 2388 0 S+ 11:20 0:00.00 grep named
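To make the pile-up easier to see, the listing can be grouped by start time. This is only a rough sketch: the function name `summarize_restarts` is made up for illustration, and it assumes the `ps awwux` column layout shown above, where the start time is the ninth field and the command occupies fields 11 to 13.

```shell
#!/bin/sh
# Hypothetical helper: read `ps awwux` output on stdin and print how many
# "rc.d/named restart" shells were started at each start time.
# Assumes field 9 is STARTED and fields 12-13 are the script path and its
# argument, as in the listing above.
summarize_restarts() {
    awk '$12 == "/usr/local/etc/rc.d/named" && $13 == "restart" { n[$9]++ }
         END { for (t in n) print t, n[t] }' | sort
}
```

It would be used as `ps awwux | summarize_restarts`. The named daemon itself and the `grep` process are skipped because their command fields do not match.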
So ten restart jobs have piled up, a pair every two minutes from 11:09 to 11:17, but the restart never actually happens. The listening sockets are already gone (I have BIND listening on 0.0.0.0, port 53):
Code:
netstat -na|fgrep .53
shows no result. When I run truss on the process, it spends all of its time in nanosleep() calls:
Code:
nanosleep({ 0.010000000 }) = 0 (0x0)
nanosleep({ 0.010000000 }) = 0 (0x0)
nanosleep({ 0.010000000 }) = 0 (0x0)
nanosleep({ 0.010000000 }) = 0 (0x0)
nanosleep({ 0.010000000 }) = 0 (0x0)
nanosleep({ 0.010000000 }) = 0 (0x0)
[...]
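A side note on the socket check above: `fgrep .53` also matches other ports such as 5353, as well as addresses with a `.53` octet, so an empty result is conclusive but a non-empty one is not. A stricter filter could look like the sketch below. The function name `port53_sockets` is hypothetical, and it assumes the usual `netstat -na` layout in which the local address (as `address.port`) is the fourth field.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -na` output on stdin and print only the
# TCP/UDP lines whose local address ends in port 53.
# Assumes field 1 is the protocol and field 4 is the local "address.port".
port53_sockets() {
    awk '$1 ~ /^(tcp|udp)/ && $4 ~ /\.53$/'
}
```

It would be used as `netstat -na | port53_sockets`. No output means nothing is bound to port 53 on any address.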
Does anybody have an idea what might be going on? And which actions on the firewall trigger a BIND restart in the first place?