22.7 Legacy Series / WAN via DHCP (i.e. dhclient) dies intermittently and does not recover on its own
Posted on: July 28, 2022, 10:38:29 pm
I had this issue with a competing firewall, then tried OPNsense 22.1 and ran into the same issue. I then upgraded to OPNsense 22.7, and it just happened again.
Logs state:
My address (REDACTED) was deleted, dhclient exiting
/usr/local/etc/rc.configure_interface: The command '/bin/pkill -'TERM' -F '/var/run/dhclient.igb0.pid'' returned exit code '3', the output was 'pkill: Cannot get process list (kvm_getprocs: No such process)'
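Note that exit code 3 is not pkill's "nothing matched" status. Assuming OPNsense's /bin/pkill follows the exit statuses documented in FreeBSD's pkill(1)/pgrep(1) man page, a small lookup makes the distinction explicit:

```python
# Exit statuses documented in FreeBSD's pkill(1)/pgrep(1) man page
# (assumption: OPNsense's /bin/pkill behaves like stock FreeBSD).
PKILL_EXIT_STATUS = {
    0: "one or more processes matched",
    1: "no processes matched",
    2: "invalid command-line options",
    3: "internal error",
}


def describe_pkill_status(code: int) -> str:
    """Translate a pkill exit code into its documented meaning."""
    return PKILL_EXIT_STATUS.get(code, f"unknown exit status {code}")


if __name__ == "__main__":
    # rc.configure_interface reported exit code 3 for the pkill call.
    print(describe_pkill_status(3))  # prints "internal error"
```

So pkill did not simply report "no match" (1); it failed while looking up the process, which matches the "Cannot get process list" text in the log.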
Setup: OPNsense 22.7 firewall, igb0 (WAN, DHCP) <---> ISP-provided router <---> ISP
Symptoms: Everything works fine, then suddenly all pings and DNS lookups fail. The upstream gateway is shown as "down".
Workarounds (all of these work; listed from fastest to slowest recovery):
* Renew the WAN DHCP lease (https://OPNSENSE_IP/status_interfaces.php)
* Bring igb0 down and back up (e.g. ifconfig igb0 down && ifconfig igb0 up)
* Unplug the Ethernet cable to the fiber router, wait 30 seconds, and plug it back in
* Reboot
Unsuccessful attempts to fix:
* Raising the DHCP client timeout to 180 seconds (from the default of 60)
* Disabling the IPv6 settings on WAN
* Disabling gateway monitoring
Here are some logs that might help diagnose this, from the latest occurrence at about 13:25 today:
===========
/var/log/resolver/latest.log
<30>1 2022-07-28T12:54:22-05:00 OPNsense.localdomain unbound 50615 - [meta sequenceId="74"] [50615:0] info: start of service (unbound 1.16.1).
<30>1 2022-07-28T13:25:28-05:00 OPNsense.localdomain unbound 50615 - [meta sequenceId="1"] [50615:0] info: service stopped (unbound 1.16.1).
===========
/var/log/routing/latest.log
<30>1 2022-07-28T12:54:21-05:00 OPNsense.localdomain radvd 34818 - [meta sequenceId="5"] version 2.19 started
<28>1 2022-07-28T13:25:28-05:00 OPNsense.localdomain radvd 35412 - [meta sequenceId="1"] exiting, 1 sigterm(s) received
<30>1 2022-07-28T13:25:28-05:00 OPNsense.localdomain radvd 35412 - [meta sequenceId="2"] sending stop adverts
<30>1 2022-07-28T13:25:28-05:00 OPNsense.localdomain radvd 35412 - [meta sequenceId="3"] removing /var/run/radvd.pid
<30>1 2022-07-28T13:25:28-05:00 OPNsense.localdomain radvd 35412 - [meta sequenceId="4"] returning from radvd main
<30>1 2022-07-28T13:25:28-05:00 OPNsense.localdomain radvd 83514 - [meta sequenceId="5"] version 2.19 started
===========
/var/log/syslog/latest.log
<27>1 2022-07-28T13:25:27-05:00 OPNsense.localdomain dhclient 76475 - [meta sequenceId="1"] My address (REDACTED) was deleted, dhclient exiting
<27>1 2022-07-28T13:25:27-05:00 OPNsense.localdomain dhclient 76475 - [meta sequenceId="2"] connection closed
<26>1 2022-07-28T13:25:27-05:00 OPNsense.localdomain dhclient 76475 - [meta sequenceId="3"] exiting.
<11>1 2022-07-28T13:25:27-05:00 OPNsense.localdomain opnsense 17558 - [meta sequenceId="4"] /usr/local/etc/rc.configure_interface: The command '/bin/pkill -'TERM' -F '/var/run/dhclient.igb0.pid'' returned exit code '3', the output was 'pkill: Cannot get process list (kvm_getprocs: No such process)'
===========
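Because the relevant lines are spread across three log files, it can help to merge them by timestamp. Below is a minimal sketch, assuming the RFC 5424 layout these lines use (<PRI>VERSION TIMESTAMP HOST APP PROCID MSGID [STRUCTURED-DATA] MSG); the helper names and the 5-second window are hypothetical choices, not anything OPNsense provides. On the lines above it puts dhclient exiting at 13:25:27 first, followed one second later by unbound stopping and radvd restarting.

```python
import re
import sys
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

# RFC 5424 layout seen in the logs above:
# <PRI>VERSION TIMESTAMP HOST APP PROCID MSGID [STRUCTURED-DATA] MSG
# (assumes at most one structured-data element, as in these lines)
LINE_RE = re.compile(
    r"^<(?P<pri>\d+)>(?P<version>\d+) (?P<ts>\S+) (?P<host>\S+) "
    r"(?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<sd>\[[^\]]*\]|-) ?(?P<msg>.*)$"
)

SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_line(line: str) -> Optional[dict]:
    """Split one log line into fields, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.fromisoformat(m["ts"]),
        "app": m["app"],
        "pid": m["procid"],
        "severity": SEVERITIES[int(m["pri"]) % 8],  # severity = PRI mod 8
        "msg": m["msg"],
    }


def events_after_dhclient_exit(lines: Iterable[str], window_s: float = 5.0) -> List[dict]:
    """Return all events from the earliest 'dhclient exiting' message up to
    window_s seconds later, sorted by time."""
    events = [e for e in map(parse_line, lines) if e is not None]
    exits = [e for e in events
             if e["app"] == "dhclient" and "dhclient exiting" in e["msg"]]
    if not exits:
        return []
    start = min(e["time"] for e in exits)
    end = start + timedelta(seconds=window_s)
    return sorted((e for e in events if start <= e["time"] <= end),
                  key=lambda e: e["time"])


if __name__ == "__main__":
    # e.g. python3 correlate.py /var/log/syslog/latest.log /var/log/routing/latest.log
    all_lines: List[str] = []
    for path in sys.argv[1:]:
        with open(path) as fh:
            all_lines.extend(fh)
    for e in events_after_dhclient_exit(all_lines):
        print(e["time"].isoformat(), e["app"], e["severity"], e["msg"])
```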
I found an old GitHub issue: https://github.com/opnsense/core/issues/2517
I have also seen suggested workarounds that monitor connectivity with a ping run from cron and use a script to disable and re-enable the DHCP connection (https://forum.opnsense.org/index.php?topic=11928.0 and https://forum.opnsense.org/index.php?topic=26455.msg127757#msg127757), but I would rather fix the root cause.
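For comparison, this is roughly what the cron-plus-ping workaround amounts to. It is a hypothetical Python sketch, not the scripts from the linked threads: the ping flags assume FreeBSD, the target host is a placeholder, and the recovery step reuses the ifconfig down/up workaround from the list above instead of disabling and re-enabling DHCP. The check and recovery steps are passed in as functions so the decision logic can be exercised without touching a real interface.

```python
import subprocess
import time
from typing import Callable


def gateway_reachable(host: str) -> bool:
    """Send one ICMP echo; True if a reply arrives.

    Assumes FreeBSD's ping, where -t is an overall timeout in seconds
    (on Linux, -t sets the TTL instead).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-t", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def bounce_interface(iface: str) -> None:
    """Same as the manual workaround `ifconfig igb0 down && ifconfig igb0 up`."""
    # check=True raises on failure, so "up" only runs if "down" succeeded.
    subprocess.run(["ifconfig", iface, "down"], check=True)
    subprocess.run(["ifconfig", iface, "up"], check=True)


def check_and_recover(
    check: Callable[[], bool],
    recover: Callable[[], None],
    attempts: int = 3,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `check` up to `attempts` times; call `recover` only if every try fails.

    Meant to run once per cron invocation, so it keeps no state between runs.
    Returns True if recovery was triggered.
    """
    for i in range(attempts):
        if check():
            return False
        if i < attempts - 1:
            sleep(delay_s)
    recover()
    return True
```

A per-minute cron job would then call something like check_and_recover(lambda: gateway_reachable("192.0.2.1"), lambda: bounce_interface("igb0")), where 192.0.2.1 is a documentation placeholder for the real upstream gateway. Requiring several consecutive failures avoids bouncing the WAN on a single dropped ping, but this only hides the symptom.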
My guess is that when dhclient dies, it deletes /var/run/dhclient.igb0.pid, which breaks /usr/local/etc/rc.configure_interface because that script assumes the PID file exists.
Ideally I would find out why dhclient dies; failing that, I would modify /usr/local/etc/rc.configure_interface so that it does not assume the process in the PID file is still running.
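The real /usr/local/etc/rc.configure_interface is, as far as I can tell, part of OPNsense's PHP code, and I have not traced where it issues the pkill call. So the following is only a Python sketch of the guard logic, assuming the fix is "signal only if the PID in the file belongs to a live process". The function name and return values are hypothetical. The return value also tells a deleted PID file apart from a stale one (file present, process gone), which would be worth checking against /var/run/dhclient.igb0.pid right after the next outage.

```python
import contextlib
import os
import signal


def stop_from_pidfile(path: str, sig: int = signal.SIGTERM) -> str:
    """Signal the process named in a PID file, tolerating a missing or stale file.

    Returns "missing", "invalid", "stale" or "signalled" instead of raising,
    so a caller can carry on when dhclient has already exited.
    """
    try:
        with open(path) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return "missing"  # the PID file is gone
    except ValueError:
        return "invalid"  # empty or non-numeric contents
    if pid <= 0:
        return "invalid"  # 0 or negative values address process groups
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        # The file exists but the process does not: remove the stale file.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
        return "stale"
    return "signalled"
```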
I looked at /usr/local/etc/rc.configure_interface, but it pulls in a lot of include files, so I thought I'd ask for help since I'm new to this software.
Ideas?