OPNsense Forum

Archive => 22.7 Legacy Series => Topic started by: rar on July 28, 2022, 10:38:29 PM

Title: WAN via DHCP (e.g. dhclient) dies aperiodically w/ no return.
Post by: rar on July 28, 2022, 10:38:29 PM
I had this issue with a competing firewall, then tried OPNsense 22.1 and experienced the same issue. I then upgraded to OPNsense 22.7 and it just happened again.

Logs state:

Quote
My address (REDACTED) was deleted, dhclient exiting
/usr/local/etc/rc.configure_interface: The command '/bin/pkill -'TERM' -F '/var/run/dhclient.igb0.pid'' returned exit code '3', the output was 'pkill: Cannot get process list (kvm_getprocs: No such process)'


Setup: OPNsense 22.7 firewall igb0 (WAN, DHCP) <---> ISP-provided router <---> ISP

Symptoms: All is fine, and then suddenly all pings and DNS lookups fail. The upstream gateway is reported as "down".

Workarounds (all work, listed in order of speed of recovery):

* Renew WAN DHCP lease ( https://OPNSENSE_IP/status_interfaces.php )

* Bring igb0 down and back up (e.g. ifconfig igb0 down && ifconfig igb0 up)

* Unplug ethernet cable to fiber router, wait 30 seconds, plug cable back in

* Reboot

Unsuccessful Attempts to fix:

* Modify DHCP Client timeout to 180 seconds (from the default of 60)

* Disable IPv6 settings on WAN

* Disable Gateway monitoring.

Some logs which might be helpful in diagnosing this.

Here's the latest which happened at about 13:25 today

Quote
===========
/var/log/resolver/latest.log
<30>1 2022-07-28T12:54:22-05:00 OPNsense.localdomain unbound 50615 - [meta sequenceId="74"] [50615:0] info: start of service (unbound 1.16.1).
<30>1 2022-07-28T13:25:28-05:00 OPNsense.localdomain unbound 50615 - [meta sequenceId="1"] [50615:0] info: service stopped (unbound 1.16.1).
===========
/var/log/routing/latest.log
<30>1 2022-07-28T12:54:21-05:00 OPNsense.localdomain radvd 34818 - [meta sequenceId="5"] version 2.19 started
<28>1 2022-07-28T13:25:28-05:00 OPNsense.localdomain radvd 35412 - [meta sequenceId="1"] exiting, 1 sigterm(s) received
<30>1 2022-07-28T13:25:28-05:00 OPNsense.localdomain radvd 35412 - [meta sequenceId="2"] sending stop adverts
<30>1 2022-07-28T13:25:28-05:00 OPNsense.localdomain radvd 35412 - [meta sequenceId="3"] removing /var/run/radvd.pid
<30>1 2022-07-28T13:25:28-05:00 OPNsense.localdomain radvd 35412 - [meta sequenceId="4"] returning from radvd main
<30>1 2022-07-28T13:25:28-05:00 OPNsense.localdomain radvd 83514 - [meta sequenceId="5"] version 2.19 started
==================
/var/log/syslog/latest.log
<27>1 2022-07-28T13:25:27-05:00 OPNsense.localdomain dhclient 76475 - [meta sequenceId="1"] My address (REDACTED) was deleted, dhclient exiting
<27>1 2022-07-28T13:25:27-05:00 OPNsense.localdomain dhclient 76475 - [meta sequenceId="2"] connection closed
<26>1 2022-07-28T13:25:27-05:00 OPNsense.localdomain dhclient 76475 - [meta sequenceId="3"] exiting.
<11>1 2022-07-28T13:25:27-05:00 OPNsense.localdomain opnsense 17558 - [meta sequenceId="4"] /usr/local/etc/rc.configure_interface: The command '/bin/pkill -'TERM' -F '/var/run/dhclient.igb0.pid'' returned exit code '3', the output was 'pkill: Cannot get process list (kvm_getprocs: No such process)'
===========

I see an old GitHub issue: https://github.com/opnsense/core/issues/2517
I've also seen solutions that recommend monitoring the connection with ping from cron and running a script that disables and re-enables the DHCP connection ( https://forum.opnsense.org/index.php?topic=11928.0 and https://forum.opnsense.org/index.php?topic=26455.msg127757#msg127757 ), but I would rather solve the root issue.
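
For reference, the cron-driven workaround from those threads can be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not the script from either thread: the probe address 192.0.2.1 is a placeholder, the threshold of three failures and all function names are made up for illustration, and the recovery step is simply the ifconfig down/up sequence listed under Workarounds. It only treats the symptom.

```python
#!/usr/bin/env python3
"""Hypothetical WAN watchdog: bounce the interface after repeated ping failures."""
import subprocess

WAN_IF = "igb0"            # WAN interface named in this thread
PROBE_HOST = "192.0.2.1"   # placeholder (assumption): use a host that should always answer
MAX_FAILURES = 3           # assumed threshold, not taken from the thread


def ping_ok(host, runner=subprocess.run):
    """Return True if a single ICMP echo request to host succeeds."""
    result = runner(["ping", "-c", "1", host],
                    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def should_bounce(results, max_failures=MAX_FAILURES):
    """Bounce only when the most recent max_failures probes all failed."""
    recent = results[-max_failures:]
    return len(recent) == max_failures and not any(recent)


def bounce(interface, runner=subprocess.run):
    """Take the interface down and bring it back up."""
    runner(["ifconfig", interface, "down"], check=True)
    runner(["ifconfig", interface, "up"], check=True)


def main():
    """One watchdog pass: probe a few times, then bounce if every probe failed."""
    results = [ping_ok(PROBE_HOST) for _ in range(MAX_FAILURES)]
    if should_bounce(results):
        bounce(WAN_IF)

# main() is intentionally not called here, so importing this sketch has no
# side effects; a cron-invoked wrapper would call it once per run.
```

Requiring several consecutive failures avoids bouncing the WAN on a single dropped ping.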


I am guessing that when dhclient dies it deletes /var/run/dhclient.igb0.pid and that breaks /usr/local/etc/rc.configure_interface which assumes that PID file exists. 

Ultimately it would be good to figure out why dhclient dies and, failing that, to modify /usr/local/etc/rc.configure_interface so that it does not assume the process behind the PID file is still running.

I looked at /usr/local/etc/rc.configure_interface but it's a lot of include files so I thought I'd ask for help as I'm new to this software.

Ideas?
Title: Re: WAN via DHCP (e.g. dhclient) dies aperiodically w/ no return.
Post by: franco on July 29, 2022, 09:22:45 AM
If you say "competing" you mean pfSense? Just to get the data point on whether this is a shared issue, either in core code or in FreeBSD dhclient code. I'm leaning slightly toward the FreeBSD side, since the core code of the two projects has diverged considerably over the years.

This seems to be the most relevant:

<27>1 2022-07-28T13:25:27-05:00 OPNsense.localdomain dhclient 76475 - [meta sequenceId="1"] My address (REDACTED) was deleted, dhclient exiting

To me the timestamps are suspicious in that exiting happens in lockstep with the core code (timestamps match exactly) so the system is just reacting to an external event and dhclient itself isn't driving this situation.
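
The lockstep observation can be checked mechanically. The following is a small Python sketch, assuming the RFC 5424-style header layout seen in the excerpts above; the function names are illustrative and the second sample message is abbreviated. Note that these logs have one-second resolution, so "exactly" here means "within the same second".

```python
import re
from datetime import datetime

# Header layout as it appears in the log excerpts in this thread:
# <PRI>1 TIMESTAMP HOST APP PID - [structured data] message
LINE_RE = re.compile(
    r"^<(?P<pri>\d+)>1 (?P<ts>\S+) (?P<host>\S+) (?P<app>\S+) (?P<pid>\S+) "
)


def parse_header(line):
    """Return the decoded header fields of a log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    pri = int(m.group("pri"))
    return {
        "facility": pri // 8,   # syslog priority = facility * 8 + severity
        "severity": pri % 8,
        "time": datetime.fromisoformat(m.group("ts")),
        "app": m.group("app"),
        "pid": m.group("pid"),
    }


def seconds_between(line_a, line_b):
    """Return the time from line_a to line_b in seconds."""
    a, b = parse_header(line_a), parse_header(line_b)
    if a is None or b is None:
        raise ValueError("line does not look like an RFC 5424 record")
    return (b["time"] - a["time"]).total_seconds()


# Sample lines copied from the excerpt above (second message abbreviated).
DHCLIENT_EXIT = ('<27>1 2022-07-28T13:25:27-05:00 OPNsense.localdomain dhclient 76475 - '
                 '[meta sequenceId="1"] My address (REDACTED) was deleted, dhclient exiting')
CORE_ERROR = ('<11>1 2022-07-28T13:25:27-05:00 OPNsense.localdomain opnsense 17558 - '
              '[meta sequenceId="4"] /usr/local/etc/rc.configure_interface: The command ...')
```

For the two sample lines, seconds_between(DHCLIENT_EXIT, CORE_ERROR) returns 0.0.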

The dhclient kill is just opportunistic. The error seems to happen here because dhclient has already exited, either by itself or triggered by a link disruption (check dmesg to confirm or deny), but even though the error happens the code goes on to reconfigure the interface.

Does this mean afterwards dhclient is not up? Or is a stale dhclient instance left in the process table?
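
One way to answer that question from a shell on the firewall is to compare the PID file against the process table. Below is a minimal Python sketch; the PID file path is the one from the log above, while the function name and the four status labels are assumptions made for illustration.

```python
import os

PID_FILE = "/var/run/dhclient.igb0.pid"  # path taken from the log message above


def pidfile_status(path):
    """Classify a PID file as 'missing', 'invalid', 'stale' or 'running'."""
    try:
        with open(path) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return "missing"
    except ValueError:
        return "invalid"
    if pid <= 0:
        return "invalid"
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks the PID
    except ProcessLookupError:
        return "stale"    # file exists but no such process
    except PermissionError:
        return "running"  # process exists but belongs to another user
    # Caveat: this only shows that some process has this PID, not that it
    # is dhclient; PIDs can be reused.
    return "running"
```

Calling pidfile_status(PID_FILE) right after an outage would distinguish "dhclient gone and PID file removed" from "PID file left behind".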


Cheers,
Franco
Title: Re: WAN via DHCP (e.g. dhclient) dies aperiodically w/ no return.
Post by: rar on July 30, 2022, 03:19:15 PM
Yes, pfSense.

I just found this report

Quote
arp: 0c:c4:7a:REDACTED is using my ip address REDACTED on igb0

in /var/log/dmesg.today (not a log I would have guessed to look into)

which is interesting because the 0c:c4:7a hardware prefix belongs to Supermicro. I too am using a Supermicro board for OPNsense, but with a different address.

So it appears the ISP is feeding someone else the same IP or someone else has that IP hardcoded.
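
Since this message is easy to miss, here is a rough Python sketch of how one might scan for it. It assumes the message wording quoted above (matched case-insensitively) and the /var/log/dmesg.today location; the function names are hypothetical and nothing here is part of OPNsense itself.

```python
import re

# Wording taken from the kernel message quoted above.
CONFLICT_RE = re.compile(
    r"arp: (?P<mac>\S+) is using my ip address (?P<ip>\S+) on (?P<ifname>\w+)",
    re.IGNORECASE,
)


def find_conflicts(lines):
    """Yield (mac, ip, interface) for each ARP conflict message in lines."""
    for line in lines:
        m = CONFLICT_RE.search(line)
        if m:
            yield m.group("mac"), m.group("ip"), m.group("ifname")


def scan_file(path="/var/log/dmesg.today"):
    """Return all conflicts found in the given log file."""
    with open(path, errors="replace") as fh:
        return list(find_conflicts(fh))
```

Run from cron, a non-empty result from scan_file() could be turned into an email or other notification.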

Dhclient is killed, but the firewall just reports "Gateway Down" instead of "IP conflict" or something else that would make sense, prompting users unaware of the root cause to continually refresh.

I'll call the ISP.

Is there a warning that I'm missing in the GUI that would alert for such an event? Would be VERY helpful if there was.

Edit: I called the ISP. I tried releasing the IP address, but when the lease came back the ISP gave the firewall the same conflicting address. The ISP's tech support said they can't assign a different static IP to get around the "two customers, same DHCP address" issue and can't clear the DHCP conflict. Instead they recommended I turn off my firewall for 12 hours, after which I will hopefully get another IP address that's not used by another customer.
Title: Re: WAN via DHCP (e.g. dhclient) dies aperiodically w/ no return.
Post by: WN1X on July 31, 2022, 12:42:01 PM
Or take the easy way out and change your WAN MAC address... simply adding or subtracting one should resolve your problem.
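
As an illustration of "adding/subtracting one", here is a small Python sketch that computes the neighbouring MAC address. The function name is made up and the address in the example is a placeholder, since the real ones in this thread are redacted; the result would still have to be entered as the WAN interface's MAC override by hand.

```python
def shift_mac(mac, delta=1):
    """Return mac with its 48-bit value shifted by delta, wrapping at 2**48."""
    octets = mac.split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 colon-separated octets, got {mac!r}")
    value = 0
    for octet in octets:
        number = int(octet, 16)
        if not 0 <= number <= 0xFF:
            raise ValueError(f"octet out of range in {mac!r}")
        value = (value << 8) | number
    value = (value + delta) % (1 << 48)
    # Emit the six octets from most to least significant.
    return ":".join(f"{(value >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
```

For example, shift_mac("0c:c4:7a:00:00:ff") returns "0c:c4:7a:00:01:00", because the carry moves into the next octet.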
Title: Re: WAN via DHCP (e.g. dhclient) dies aperiodically w/ no return. SOLVED
Post by: rar on August 17, 2022, 04:51:24 AM
Solved: The issue was that this is a Supermicro board with an Intelligent Platform Management Interface (IPMI; HP calls its equivalent Integrated Lights-Out) that was set in BIOS to get its address via DHCP.

So that means that the BIOS sent the IPMI MAC address across the WAN interface (igb0) even though there was no cable attached to the IPMI NIC. I found a discussion about that here: https://forum.netgate.com/topic/64335/tip-if-you-have-an-ipmi-motherboard-and-constantly-pull-an-internal-ip-on-wan On top of that, the ISP set up their system to give the SAME IP address to all machines that connect through that fiber connection. That meant that the suggestion from WN1X of changing the MAC address would not work.

I changed the IPMI in BIOS to use a static IP, and that stopped DHCP from dying aperiodically.

-----

Future work: There needs to be some way for OPNsense to alert that there is an IP address conflict on the WAN network, rather than just dying without any warning, without web-based alerts or logs, and without log messages in /var/log/system or /var/log/routing.