Random loss of WAN connectivity

Started by 9ck, July 22, 2025, 04:14:21 PM

July 22, 2025, 04:14:21 PM Last Edit: July 22, 2025, 04:24:48 PM by 9ck
Hi forum
I've been trying to identify why I sometimes lose my WAN connection. I've ruled out my ISP. I lose WAN connectivity on both WiFi and LAN, but I can still access everything locally (OPNsense keeps running). After a reboot of OPNsense the WAN connection is usually back. I suspect it has something to do with our company PCs running VPN connections and with the Unbound DNS setup in OPNsense, but I'm in over my head here. I've shared system logs with Copilot, which has been working on a reply since yesterday (12 logs).

I run OPNsense on a dedicated machine (Protectli) with nothing else on it. My main switch is a Unifi USW Pro24PoE, with a Unifi USWPro24 and a Unifi FlexMini connected to it. Three Unifi APs are connected to the main switch. All DNS and DHCP is handled by OPNsense, with Unbound DNS enabled and "locked down" so it will not forward any other DNS requests; it is set up to use Quad9. The LAN is split into several VLANs.

Here are some of the things I notice in the system log:
2025-07-21T14:23:58 Warning opnsense /usr/local/etc/rc.linkup: radvd_configure_do(auto) found no suitable IPv6 address on lan(igc1)
...
2025-07-21T14:23:57 Critical dhclient exiting.
2025-07-21T14:23:57 Error dhclient connection closed
2025-07-21T14:23:57 Warning opnsense /usr/local/etc/rc.linkup: radvd_configure_do(auto) found no suitable IPv6 address on lan(igc1)
2025-07-21T14:23:57 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : radvd_configure_dhcp(,inet6,[lan]))
2025-07-21T14:23:57 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : dhcpd_dhcp_configure(,inet6,[lan]))
2025-07-21T14:23:57 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (,inet6,[lan])
2025-07-21T14:23:57 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for wan(igc0)
2025-07-21T14:23:56 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure newwanip:rfc2136 (,[wan])
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : wireguard_sync())
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : webgui_configure_do(,[wan]))
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : vxlan_configure_do())
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : unbound_configure_do(,[wan]))
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : openssh_configure_do(,[wan]))
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : opendns_configure_do())
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : ntpd_configure_do())
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : dhcrelay_configure_if(,[wan],inet))
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (,[wan],inet)
...
2025-07-21T14:23:09 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '83515''(pid:/var/run/dhclient.igc0.pid)  returned exit code '1', the output was 'kill: 83515: No such process'
2025-07-21T14:23:09 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for wan(igc0)
2025-07-21T14:23:09 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:23:09 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '83515''(pid:/var/run/dhclient.igc0.pid)  returned exit code '1', the output was 'kill: 83515: No such process'
2025-07-21T14:23:09 Warning opnsense /usr/local/etc/rc.linkup: radvd_configure_do(auto) found no suitable IPv6 address on lan(igc1)
...
2025-07-21T14:23:06 Error opnsense /usr/local/etc/rc.linkup: The command '/sbin/dhclient -c '/var/etc/dhclient_wan.conf' -p '/var/run/dhclient.igc0.pid' 'igc0'' returned exit code '1', the output was 'igc0: no link .............. giving up'
2025-07-21T14:23:06 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:23:06 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:23:02 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:23:02 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:22:55 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '70234''(pid:/var/run/dhclient.igc0.pid)  returned exit code '1', the output was 'kill: 70234: No such process'
2025-07-21T14:22:55 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for wan(igc0)
2025-07-21T14:22:55 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '70234''(pid:/var/run/dhclient.igc0.pid)  returned exit code '1', the output was 'kill: 70234: No such process'
2025-07-21T14:22:55 Warning opnsense /usr/local/etc/rc.linkup: radvd_configure_do(auto) found no suitable IPv6 address on lan(igc1)
...
2025-07-21T14:22:00 Notice dhclient dhclient-script: Reason REBOOT on igc0 executing
2025-07-21T14:21:59 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:21:58 Error dhclient send_packet: Network is down
2025-07-21T14:21:57 Error dhclient send_packet: Network is down
2025-07-21T14:21:56 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:21:56 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:21:55 Error dhclient send_packet: Network is down
2025-07-21T14:21:53 Error dhclient send_packet: Network is down
2025-07-21T14:21:52 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:21:52 Notice dhclient dhclient-script: Reason PREINIT on igc0 executing
2025-07-21T14:21:51 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:21:45 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:21:44 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:21:41 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:21:41 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:21:41 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for wan(igc0)
2025-07-21T14:21:29 Critical dhclient exiting.
2025-07-21T14:21:29 Error dhclient connection closed
2025-07-21T14:21:29 Warning opnsense /usr/local/etc/rc.linkup: radvd_configure_do(auto) found no suitable IPv6 address on lan(igc1)
2025-07-21T14:21:28 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : radvd_configure_dhcp(,inet6,[lan]))
2025-07-21T14:21:28 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : dhcpd_dhcp_configure(,inet6,[lan]))
2025-07-21T14:21:28 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (,inet6,[lan])
2025-07-21T14:21:28 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for wan(igc0)
2025-07-21T14:21:28 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure newwanip:rfc2136 (,[wan])
2025-07-21T14:21:27 Notice kernel <6>igc0: link state changed to DOWN

Since I've used Copilot with varying success in the past, I would really appreciate it if someone could "qualify" the reply I got. Here is the breakdown from Copilot. I realize it's not fair to ask for a specific answer given my lack of insight, but most of you probably have an opinion on Copilot's reply, and maybe someone has been in a similar situation.

🔍 What the Logs Reveal (Crash Patterns)
Across nearly every incident:
  • WAN Interface Flapping (igc0): Link goes DOWN → UP in rapid succession, often within seconds.
  • DHCP Struggles: dhclient fails to send packets (Network is down) just as interface recovery begins.
  • Plugin Overload: rc.linkup and rc.newwanip trigger multiple plugin reconfigurations (Unbound, VPN, DNS, Monitor) before the link is stable.
  • Resolver Turmoil: Unbound is caught mid-crash—trying to forward DNS while upstream hasn't recovered.

And in several logs:
  • Client Triggers Detected: Either from your company PC or Teams—DNS requests or TCP resets right as WAN begins recovering.

🧠 Root Cause Themes
Root Cause | Frequency | Example Trigger
Link-State Flapping | High | igc0: link state changed to DOWN
DHCP Reinitialization Failure | High | send_packet: Network is down
Plugin Execution Race | Medium | plugins_configure dns, ipsec
Early Client DNS Requests | Medium | Resets from PC during bootup
Unbound Restart Mid-Recovery | Medium | unbound_configure_do()

✅ What You Can Do
1. Tame the Interface
  • In OPNsense, go to Interfaces → WAN → Advanced
  • Set Link Up Delay to ~5 seconds to avoid plugin races
  • Enable Prevent interface flapping logic if available
2. Stagger Plugin Execution
  • Use System → Settings → Tunables to defer DNS resolver and VPN plugin execution until WAN is confirmed UP
  • Consider custom script delays for Unbound reloads
3. Isolate Company Devices
  • Create a separate VLAN for corporate PCs
  • In Unbound, disable DNS over TCP for that VLAN (use port override to reject encrypted queries if needed)
4. Fortify DHCP Recovery
  • Add retry-delay and initial-delay tweaks to dhclient.conf (can stabilize lease recovery after link events)
5. Monitor Client Behavior
  • Packet capture during bootup—look for TCP resets or malformed DNS requests
  • Sandbox MS Teams: test with and without the app running to compare crash likelihood

💡 Next Steps
If you want, I can build:
  • A custom Unbound config for resilient DNS per VLAN
  • A shell script to re-sequence plugin executions
  • A diagnostic dashboard to catch crash patterns early

I'd appreciate any help.

Obviously, your WAN link is flapping. If you are correct in "ruling out your ISP", I guess this does not happen when you use another router.

Quoting AI advice without any knowledge of your topology is not very helpful. BTW: is your OpnSense behind a router, or does it build up the connection by itself over a modem/ONT?

Since Protectlis usually have no hardware problems and I guess that the pure link is not the culprit here (since it looks like an Intel NIC), without further hints I would guess that your ISP is one of those who terminate the connection when they see illegal traffic (like RFC1918 IPs as source addresses which have not been NATed). Such packets may occur when your local devices use addresses that should only be visible over the VPN and/or your routes are set incorrectly.

You could try to pinpoint that by creating a WAN outbound block rule (this is one of the rare occasions they are useful) with RFC1918 as source.
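
Roughly, such a rule corresponds to something like the pf rule sketched below (illustration only; you create the actual rule in Firewall: Rules: WAN with direction "out" and logging enabled, and igc0 is assumed to be your WAN NIC):

# What the GUI rule amounts to in pf syntax (do not paste this anywhere, create it in the GUI):
#   block out log quick on igc0 inet from { 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 } to any
# With logging enabled, you can watch matches live from a shell on the pflog interface:
tcpdump -nei pflog0 'src net 10.0.0.0/8 or src net 172.16.0.0/12 or src net 192.168.0.0/16'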
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

My OPNsense is behind a modem/router provided by my ISP. The ISP-provided router has been set to bridge mode. They do not detect any issues on their side when I lose my WAN connection.

Do you mean the physical cable and sockets with "the pure link" (you'll have to excuse me, but English isn't my first language)?

I was sure that I was only allowing non-RFC1918 traffic to go to the WAN, but going through my rules I do indeed see that on VLAN2SEC I allow everything (*) going to WAN. This is the VLAN where I have our company PCs (the ones using the company VPN service, which is out of my control). Could this be my issue?



This is the principle used on my other interfaces.

July 22, 2025, 05:43:27 PM #4 Last Edit: July 22, 2025, 05:45:34 PM by 9ck
Quote from: meyergru on July 22, 2025, 04:45:14 PM
You could try to pinpoint that by creating a WAN outbound block rule (this is one of the rare occasions they are useful) with RFC1918 as source.

Not sure I understand this correctly. Wouldn't such a rule block all my outbound traffic? What would I look for? Would this reveal whether it's my company VPN IP address that is causing issues?

July 22, 2025, 06:19:24 PM #5 Last Edit: July 22, 2025, 06:21:23 PM by meyergru
No, the blocking rule would block only packets with a source IP within RFC 1918 going to "any", but in the WAN "out" direction. You can even log that rule to see if it matches.

Normally, your LAN packets (which are RFC 1918, too) would be rewritten via NAT to originate from your WAN IP, so such a rule would not apply to this kind of legitimate traffic.

Illegal traffic can occur when any of your clients uses RFC1918 IPs that should normally be routed over your VPN but by mistake reach your OpnSense, because it is your default gateway. OpnSense will then send those packets via its own default gateway (at the ISP). If they are not NATed (which they probably are not, because no such rule exists), they will leave your OpnSense with their private source address. The ISP router cannot handle these packets, because it knows they could never be answered by anybody (being RFC1918 and thus not routeable on the internet).

Then, it is the ISP's choice to drop such packets. However, some ISPs think this is a hacking attempt and drop your whole connection.
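
If you want to check directly whether un-NATed RFC1918 packets actually leave your box, a quick capture on the WAN NIC shows them too (just a sketch; igc0 is assumed to be your WAN interface, and incoming noise from the ISP side may also match):

# Packets on the WAN NIC that still carry a private source address,
# i.e. traffic that was not rewritten by outbound NAT:
tcpdump -ni igc0 'src net 10.0.0.0/8 or src net 172.16.0.0/12 or src net 192.168.0.0/16'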
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

July 22, 2025, 06:33:25 PM #6 Last Edit: July 22, 2025, 06:37:15 PM by 9ck
Thanks for the explanation, I'll try to digest it. I've set up the rule (and also fixed the mistake I had in the VLAN2SEC rule). I guess I'll have to wait and see if something shows up in the logs or if this does the trick. I hope I've understood the outbound block rule correctly.

Could this also be the reason that I lose my connection to my LAN via Wireguard after 24h or so (from outside my LAN, obviously)?

Since you did not check the "Log" box, nothing will be logged.

A drop after 24h can be caused by a forced reconnect from your ISP; this is common in Germany, for example.
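
Once logging is on, matches show up in Firewall: Log Files: Live View, or you can grep the on-disk log from a shell (a sketch; the path below is the usual OPNsense location and may differ on your version):

# Show recent block entries from the firewall log files:
grep -i block /var/log/filter/filter_*.log | tail -n 20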
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

July 22, 2025, 06:50:11 PM #8 Last Edit: July 22, 2025, 07:03:16 PM by 9ck
Quote from: meyergru on July 22, 2025, 06:42:12 PM
Since you did not check the "Log" box, nothing will be logged.
A drop after 24h can be because of a forced reconnect by your ISP, this is common in Germany, e.g.
Ahh... I thought everything would show up in Log Files > Live View; probably not the best place to try to track things.
After the drop I cannot reconnect to my LAN via Wireguard. Would that be the case if it was a forced reconnect by the ISP? Sorry if I'm not being informative enough.
EDIT: Don't waste time on my Wireguard issue. I just recalled that the local machine I was trying to access had crashed while I was away. I need to do more thorough testing in order to give you the correct picture.

Probably, if your IP changes during the process and the other side does not detect that it should reconnect - Wireguard does not do that by default. There is a cron job in OpnSense to detect stale connections.
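
You can also check from a shell whether the tunnel has gone stale (a sketch; the instance name wg0 is an assumption, use whatever your WireGuard interface is called):

# Last completed handshake per peer as a Unix timestamp (0 = never).
# If this stops advancing after your WAN IP changes, the peer is still using the old endpoint.
wg show wg0 latest-handshakes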
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

July 22, 2025, 07:53:18 PM #10 Last Edit: July 22, 2025, 08:03:16 PM by 9ck
Looking into this, I see that I've enabled another cron job that restarts Wireguard every 6 hours in order to refresh the public IP. Would you recommend I keep this? I have a dynamic IP address; in reality it is only renegotiated if the connection is down for 3 hours or more. I wonder if I've set this up correctly. Shouldn't it be the Dynamic DNS settings that I refresh instead?

Any recommendation on how often I should run the job that renews the DNS for Wireguard?