Show Posts

23.1 Legacy Series / 23.1.8: dhcp6c script race condition causes hundreds of dpinger processes?

« on: May 29, 2023, 04:50:22 am »

Edit: Repro/summary below

I'm seeing what appear to be new instances of dpinger/updaterrd.sh appear gradually over time, lining up with DHCPv6 renewal times. I have two wan interfaces, and the dpinger instances are specific to the primary WAN DHCPv6 gateway.

dpinger going wild on only one of 4 monitored gateways (there were well over 150 total):

Code: [Select]

root    12033   0.0  0.1  17732   2180  -  Is   11:16      0:11.11 /usr/local/bin/dpinger -f -S -r 0 -i WAN_DHCP6 -B 2600:2600:WAN:IP:: -p /var/run/dpinger_WAN_DHCP6.pid -u /var/run/dpinger_WAN_DHCP6.sock -C /usr/local/etc/rc.syshook monitor -s 1s -l 2s -t 60s
root    12140   0.0  0.1  17732   2180  -  Is   Fri02      0:13.83 /usr/local/bin/dpinger -f -S -r 0 -i WAN_DHCP6 -B 2600:2600:WAN:IP:: -p /var/run/dpinger_WAN_DHCP6.pid -u /var/run/dpinger_WAN_DHCP6.sock -C /usr/local/etc/rc.syshook monitor -s 1s -l 2s -t 60s
root    12674   0.0  0.1  13504   2108  -  Is   01:17      0:03.04 /usr/local/bin/dpinger -f -S -r 0 -i WAN_DHCP6 -B 2600:2600:WAN:IP:: -p /var/run/dpinger_WAN_DHCP6.pid -u /var/run/dpinger_WAN_DHCP6.sock -C /usr/local/etc/rc.syshook monitor -s 1s -l 2s -t 60s
root    12776   0.0  0.1  17732   2180  -  Is   20:47      0:06.05 /usr/local/bin/dpinger -f -S -r 0 -i WAN_DHCP6 -B 2600:2600:WAN:IP:: -p /var/run/dpinger_WAN_DHCP6.pid -u /var/run/dpinger_WAN_DHCP6.sock -C /usr/local/etc/rc.syshook monitor -s 1s -l 2s -t 60s
root    13736   0.0  0.1  17732   2180  -  Is   18:16      0:07.63 /usr/local/bin/dpinger -f -S -r 0 -i WAN_DHCP6 -B 2600:2600:WAN:IP:: -p /var/run/dpinger_WAN_DHCP6.pid -u /var/run/dpinger_WAN_DHCP6.sock -C /usr/local/etc/rc.syshook monitor -s 1s -l 2s -t 60s
root    14579   0.0  0.1  17732   2180  -  Is   10:46      0:10.94 /usr/local/bin/dpinger -f -S -r 0 -i WAN_DHCP6 -B 2600:2600:WAN:IP:: -p /var/run/dpinger_WAN_DHCP6.pid -u /var/run/dpinger_WAN_DHCP6.sock -C /usr/local/etc/rc.syshook monitor -s 1s -l 2s -t 60s
root    15498   0.0  0.1  17732   2180  -  Is   19:16      0:07.24 /usr/local/bin/dpinger -f -S -r 0 -i WAN_DHCP6 -B 2600:2600:WAN:IP:: -p /var/run/dpinger_WAN_DHCP6.pid -u /var/run/dpinger_WAN_DHCP6.sock -C /usr/local/etc/rc.syshook monitor -s 1s -l 2s -t 60s
root    15832   0.0  0.1  17732   2180  -  Is   14:16      0:09.50 /usr/local/bin/dpinger -f -S -r 0 -i WAN_DHCP6 -B 2600:2600:WAN:IP:: -p /var/run/dpinger_WAN_DHCP6.pid -u /var/run/dpinger_WAN_DHCP6.sock -C /usr/local/etc/rc.syshook monitor -s 1s -l 2s -t 60s

All of the dpinger processes operate in parallel resulting in large amounts of ICMP traffic.

updaterrd.sh also showing up at lease renewals (wc -l reports 250+ here)

Code: [Select]

root    94437   0.0  0.0  13508   776  -  Is   Sat07       0:13.04 /bin/sh /var/db/rrd/updaterrd.sh
root    94634   0.0  0.0  13508   720  -  Is   Fri03       0:24.33 /bin/sh /var/db/rrd/updaterrd.sh
root    94713   0.0  0.0  13508   720  -  Is   Fri21       0:17.22 /bin/sh /var/db/rrd/updaterrd.sh
root    94840   0.0  0.0  13508   720  -  Is   Thu23       0:26.19 /bin/sh /var/db/rrd/updaterrd.sh
root    95958   0.0  0.0  13508   780  -  Is   Sat17       0:09.44 /bin/sh /var/db/rrd/updaterrd.sh
root    96010   0.0  0.0  13508   772  -  Ss   12:17       0:02.78 /bin/sh /var/db/rrd/updaterrd.sh
root    96619   0.0  0.0  13508   776  -  Is   Fri15       0:19.62 /bin/sh /var/db/rrd/updaterrd.sh
root    96973   0.0  0.0  13508   776  -  Ss   07:46       0:04.10 /bin/sh /var/db/rrd/updaterrd.sh
root    97095   0.0  0.0  13508   720  -  Is   Fri19       0:17.42 /bin/sh /var/db/rrd/updaterrd.sh
root    97793   0.0  0.0  13508   776  -  Ss   Sat07       0:13.07 /bin/sh /var/db/rrd/updaterrd.sh
root    98511   0.0  0.0  13508   776  -  Ss   22:46       0:07.43 /bin/sh /var/db/rrd/updaterrd.sh
root    99268   0.0  0.0  13508   720  -  Is   Thu22       0:26.39 /bin/sh /var/db/rrd/updaterrd.sh
root    99893   0.0  0.0  13508   720  -  Is   Fri20       0:17.44 /bin/sh /var/db/rrd/updaterrd.sh

I have temporarily terminated monitoring of the WAN IPv6 gateway, but the updaterrd.sh processes continue to creep up over time.

The best clue I've found so far is in the logs as this newwanipv6 script running constantly, but only for this particular interface.

Code: [Select]

2023-05-28T22:19:26-04:00	Notice	opnsense	/usr/local/etc/rc.newwanipv6: No IP change detected (current: 2600:WAN:IP::, interface: WAN[wan])	
2023-05-28T22:17:22-04:00	Error	opnsense	/usr/local/etc/rc.newwanipv6: The command '/bin/kill -'TERM' '70636'' returned exit code '1', the output was 'kill: 70636: No such process'	
2023-05-28T22:17:22-04:00	Error	opnsense	/usr/local/etc/rc.newwanipv6: The command '/bin/kill -'TERM' '70636'' returned exit code '1', the output was 'kill: 70636: No such process'

I suspect that this is to me having to provide a DHCPv6 configuration override in order to pull multiple prefixes simultaneously (AT&T CPE quirk, must pull multiple /64s instead of one larger /60). Clarification: Each /64 prefix must have its own IA_PD, otherwise the server will not respond.

My tentative next steps are to try to specify a configuration override identical to the non-overridden version that pulls a single /64, just to figure out if this is an error in how overridden DHCPv6 configs function or if it's an error in my actual configuration or an unavoidable side effect of having multiple delegated prefixes confusing the system.

The machine is still in its weird state, is there any additional information I should collect before rebooting and continuing to try to track this down?

Even if this is an error in my dhcpv6 config override, are the dpinger instances not realizing that the pid file already corresponds to an already-running dpinger for that interface its own separate bug?

Topics - rssor

23.1 Legacy Series / 23.1.8: dhcp6c script race condition causes hundreds of dpinger processes?