OPNsense Forum

English Forums => 25.1, 25.4 Series => Topic started by: 9ck on July 22, 2025, 04:14:21 PM

Title: Random loss of WAN connectivity
Post by: 9ck on July 22, 2025, 04:14:21 PM
Hi forum
I've been trying to identify why I sometimes lose my WAN connection. I've ruled out my ISP. I'm losing WAN connectivity on both WiFi and LAN, but I can still access everything locally (OPNsense keeps running). After rebooting OPNsense, the WAN connection usually comes back. I suspect it has something to do with our company PCs running VPN connections and with the Unbound DNS I've set up in OPNsense, but I'm in over my head here. I've shared the system logs (12 logs) with Copilot, which has been working on a reply since yesterday.

I run OPNsense on a dedicated machine (Protectli) as the only thing on it. I have a Unifi USW Pro24PoE as the main switch, with a Unifi USWPro24 and a Unifi FlexMini connected to it. Three Unifi APs are connected to the main switch. All DNS and DHCP is handled by OPNsense, with Unbound DNS enabled and "locked down" so it will not forward any other DNS requests. It is set up to use Quad9. The LAN is split up into several VLANs.

Some of the things that I notice in the system log:
2025-07-21T14:23:58 Warning opnsense /usr/local/etc/rc.linkup: radvd_configure_do(auto) found no suitable IPv6 address on lan(igc1)
...
2025-07-21T14:23:57 Critical dhclient exiting.
2025-07-21T14:23:57 Error dhclient connection closed
2025-07-21T14:23:57 Warning opnsense /usr/local/etc/rc.linkup: radvd_configure_do(auto) found no suitable IPv6 address on lan(igc1)
2025-07-21T14:23:57 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : radvd_configure_dhcp(,inet6,[lan]))
2025-07-21T14:23:57 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : dhcpd_dhcp_configure(,inet6,[lan]))
2025-07-21T14:23:57 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (,inet6,[lan])
2025-07-21T14:23:57 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for wan(igc0)
2025-07-21T14:23:56 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure newwanip:rfc2136 (,[wan])
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : wireguard_sync())
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : webgui_configure_do(,[wan]))
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : vxlan_configure_do())
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : unbound_configure_do(,[wan]))
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : openssh_configure_do(,[wan]))
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : opendns_configure_do())
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : ntpd_configure_do())
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : dhcrelay_configure_if(,[wan],inet))
2025-07-21T14:23:55 Notice opnsense /usr/local/etc/rc.newwanip: plugins_configure newwanip (,[wan],inet)
...
2025-07-21T14:23:09 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '83515''(pid:/var/run/dhclient.igc0.pid)  returned exit code '1', the output was 'kill: 83515: No such process'
2025-07-21T14:23:09 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for wan(igc0)
2025-07-21T14:23:09 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:23:09 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '83515''(pid:/var/run/dhclient.igc0.pid)  returned exit code '1', the output was 'kill: 83515: No such process'
2025-07-21T14:23:09 Warning opnsense /usr/local/etc/rc.linkup: radvd_configure_do(auto) found no suitable IPv6 address on lan(igc1)
...
2025-07-21T14:23:06 Error opnsense /usr/local/etc/rc.linkup: The command '/sbin/dhclient -c '/var/etc/dhclient_wan.conf' -p '/var/run/dhclient.igc0.pid' 'igc0'' returned exit code '1', the output was 'igc0: no link .............. giving up'
2025-07-21T14:23:06 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:23:06 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:23:02 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:23:02 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:22:55 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '70234''(pid:/var/run/dhclient.igc0.pid)  returned exit code '1', the output was 'kill: 70234: No such process'
2025-07-21T14:22:55 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for wan(igc0)
2025-07-21T14:22:55 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '70234''(pid:/var/run/dhclient.igc0.pid)  returned exit code '1', the output was 'kill: 70234: No such process'
2025-07-21T14:22:55 Warning opnsense /usr/local/etc/rc.linkup: radvd_configure_do(auto) found no suitable IPv6 address on lan(igc1)
...
2025-07-21T14:22:00 Notice dhclient dhclient-script: Reason REBOOT on igc0 executing
2025-07-21T14:21:59 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:21:58 Error dhclient send_packet: Network is down
2025-07-21T14:21:57 Error dhclient send_packet: Network is down
2025-07-21T14:21:56 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:21:56 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:21:55 Error dhclient send_packet: Network is down
2025-07-21T14:21:53 Error dhclient send_packet: Network is down
2025-07-21T14:21:52 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:21:52 Notice dhclient dhclient-script: Reason PREINIT on igc0 executing
2025-07-21T14:21:51 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:21:45 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:21:44 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:21:41 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:21:41 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:21:41 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for wan(igc0)
2025-07-21T14:21:29 Critical dhclient exiting.
2025-07-21T14:21:29 Error dhclient connection closed
2025-07-21T14:21:29 Warning opnsense /usr/local/etc/rc.linkup: radvd_configure_do(auto) found no suitable IPv6 address on lan(igc1)
2025-07-21T14:21:28 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : radvd_configure_dhcp(,inet6,[lan]))
2025-07-21T14:21:28 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : dhcpd_dhcp_configure(,inet6,[lan]))
2025-07-21T14:21:28 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (,inet6,[lan])
2025-07-21T14:21:28 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for wan(igc0)
2025-07-21T14:21:28 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure newwanip:rfc2136 (,[wan])
2025-07-21T14:21:27 Notice kernel <6>igc0: link state changed to DOWN
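
To get a feel for how often the link actually bounces, the kernel lines in an excerpt like the one above can be tallied with a short script. This is only a minimal sketch: it assumes the log has been exported as plain text in exactly the format shown here, and the function name and the sample text are illustrative, not part of OPNsense.

```python
import re
from collections import Counter

# Matches kernel link-state lines in the format shown in the log excerpt.
LINK_RE = re.compile(
    r"^(?P<ts>\S+) Notice kernel <6>(?P<nic>\w+): "
    r"link state changed to (?P<state>UP|DOWN)$"
)


def count_link_changes(log_text):
    """Return a Counter keyed by (interface, state) for each link-state change."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            counts[(match["nic"], match["state"])] += 1
    return counts


# A few lines copied from the excerpt; the last one is not a kernel
# link-state line and is therefore ignored.
sample = """\
2025-07-21T14:23:06 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:23:06 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:23:02 Notice kernel <6>igc0: link state changed to DOWN
2025-07-21T14:23:02 Notice kernel <6>igc0: link state changed to UP
2025-07-21T14:22:55 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for wan(igc0)
"""

if __name__ == "__main__":
    for (nic, state), n in sorted(count_link_changes(sample).items()):
        print(f"{nic} {state}: {n}")
```

Several DOWN/UP pairs within a few seconds on the same interface, as in the excerpt, point to a flapping link rather than a single clean disconnect.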

Since I've used Copilot with varying success in the past, I would really appreciate it if someone could "qualify" the reply I've got. Here is the breakdown from Copilot. I realize it's not fair to ask for a specific reply given the lack of insight, but most of you probably have an opinion on the reply from Copilot, and maybe someone has been in a similar situation.

🔍 What the Logs Reveal (Crash Patterns)
Across nearly every incident:

And in several logs:

🧠 Root Cause Themes
Root Cause | Frequency | Example Trigger
Link-State Flapping | High | igc0: link state changed to DOWN
DHCP Reinitialization Failure | High | send_packet: Network is down
Plugin Execution Race | Medium | plugins_configure dns, ipsec
Early Client DNS Requests | Medium | Resets from PC during bootup
Unbound Restart Mid-Recovery | Medium | unbound_configure_do()

✅ What You Can Do
1. Tame the Interface
2. Stagger Plugin Execution
3. Isolate Company Devices
4. Fortify DHCP Recovery
5. Monitor Client Behavior

💡 Next Steps
If you want, I can build:

I will appreciate any help provided.
Title: Re: Random loss of WAN connectivity
Post by: meyergru on July 22, 2025, 04:45:14 PM
Obviously, your WAN link is flapping. If you are correct in ruling out your ISP, I guess this does not happen when you use another router.

Quoting AI advice without any knowledge about your topology is not very helpful. BTW: Is your OPNsense behind a router, or does it build up the connection by itself over a modem/ONT?

Since Protectli devices usually have no hardware problems and I guess that the pure link is not the culprit here (since it looks like an Intel NIC), without further hints I would guess that your ISP is one of those that terminate the connection when they see illegal traffic (like RFC1918 IPs as source addresses which have not been NATed). Such packets may occur when your local devices use addresses that should be visible only over the VPN and/or your routes are set incorrectly.

You could try to pinpoint that by creating a WAN outbound block rule (this is one of the rare occasions they are useful) with RFC1918 as source.
Title: Re: Random loss of WAN connectivity
Post by: 9ck on July 22, 2025, 05:32:55 PM
My OPNsense is behind a modem/router provided by my ISP. The ISP-provided router has been set to bridge mode. They do not detect any issues on their side when I lose my WAN connection.

Do you mean the physical cable and sockets by "the pure link" (you'll have to excuse me, but English isn't my first language)?

I was sure that I was allowing only non-RFC1918 traffic to go to the WAN, but going through my rules I do indeed see that in VLAN2SEC I allow all (*) going to WAN. This is the VLAN where I have our company PCs (the ones using the company VPN service, which is out of my control). Could this be my issue?


Title: Re: Random loss of WAN connectivity
Post by: 9ck on July 22, 2025, 05:36:54 PM
This is the principle used on my other interfaces.
Title: Re: Random loss of WAN connectivity
Post by: 9ck on July 22, 2025, 05:43:27 PM
Quote from: meyergru on July 22, 2025, 04:45:14 PM
You could try to pinpoint that by creating a WAN outbound block rule (this is one of the rare occasions they are useful) with RFC1918 as source.

Not sure I understand this correctly. Wouldn't such a rule block all my outbound traffic? What would I look for? Should this maybe reveal if it's my company VPN IP address causing issues?
Title: Re: Random loss of WAN connectivity
Post by: meyergru on July 22, 2025, 06:19:24 PM
No, the blocking rule would block only packets with a source IP within RFC 1918 to "any", but on WAN "out" direction. You can even log that rule to see if it matches.

Normally, your LAN packets (which have RFC 1918 source addresses, too) would be rewritten via NAT to originate from your WAN IP, so such a rule would not apply to this kind of legitimate traffic.

Illegal traffic can occur when any of your clients use RFC1918 IPs that should normally be routed over your VPN, but by mistake reach your OPNsense, because it is your default gateway. OPNsense will then use its own default gateway (at the ISP) to send those packets to. If the latter are not NATed (which they probably are not, because no such rule exists), they would leave your OPNsense. The ISP router cannot handle these packets, because it knows that they could never be answered by anybody (being RFC1918, thus not routable on the internet).

Then, it is the ISP's choice to drop such packets. However, some ISPs think this is a hacking attempt and drop your whole connection.
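
The matching logic of such a rule can be illustrated in a few lines. This is a minimal sketch, not how OPNsense evaluates rules internally: it only models the question "is the source address still inside RFC 1918 when the packet leaves on WAN?". The function names are made up for illustration, and 203.0.113.7 is a documentation address standing in for a public WAN IP.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(addr):
    """Return True if addr lies in one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETS)


def wan_out_rule_blocks(source_after_nat):
    """Model of the rule: block on WAN, direction out, source in RFC 1918.

    The argument is the source address the packet carries when it leaves
    the WAN interface, i.e. after outbound NAT had its chance to rewrite it.
    """
    return is_rfc1918(source_after_nat)


if __name__ == "__main__":
    # Properly NATed LAN traffic leaves with the public WAN IP: not blocked.
    print(wan_out_rule_blocks("203.0.113.7"))   # False
    # A packet that escaped NAT still carries its private source: blocked.
    print(wan_out_rule_blocks("192.168.20.15"))  # True
    # 172.32.0.0 is just outside 172.16.0.0/12: not blocked.
    print(wan_out_rule_blocks("172.32.0.1"))     # False
```

With logging enabled on the real rule, every hit corresponds to the second case: a packet that would otherwise have reached the ISP with a private source address.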
Title: Re: Random loss of WAN connectivity
Post by: 9ck on July 22, 2025, 06:33:25 PM
Thanks for the explanation - I'll try to digest it. I've set the rule up (and also corrected the mistake I had in the VLAN2SEC rule). I guess I'll have to wait and see if something shows up in the logs or if this does the trick. I hope I've understood the outbound block rule correctly.

Could this also be the reason that I lose my connection to my LAN via WireGuard after 24h or so (from outside my LAN, obviously)?
Title: Re: Random loss of WAN connectivity
Post by: meyergru on July 22, 2025, 06:42:12 PM
Since you did not check the "Log" box, nothing will be logged.

A drop after 24h can be because of a forced reconnect by your ISP, this is common in Germany, e.g.
Title: Re: Random loss of WAN connectivity
Post by: 9ck on July 22, 2025, 06:50:11 PM
Quote from: meyergru on July 22, 2025, 06:42:12 PM
Since you did not check the "Log" box, nothing will be logged.
A drop after 24h can be because of a forced reconnect by your ISP, this is common in Germany, e.g.

Ahh... I thought everything would show in Log Files > Live View - probably not the best place to try to track things.
After the drop I cannot reconnect to my LAN via WireGuard. Would that be the case if it was a forced reconnect issue? Sorry if I'm not being informative enough.
EDIT: Don't waste time on my WireGuard issue. I just recalled that the local machine I was trying to access had crashed while I was away. I need to do more thorough testing in order to give you the correct picture.
Title: Re: Random loss of WAN connectivity
Post by: meyergru on July 22, 2025, 06:59:27 PM
Probably, if your IP changes during the process and the other side does not detect that it should reconnect - WireGuard does not do that by default. There is a cron job in OPNsense to detect stale connections.
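
What "stale" can mean in practice is sketched below. This is not the implementation of the OPNsense cron job, just a hedged illustration: it parses text in the format printed by `wg show <interface> latest-handshakes` (one peer public key and one Unix timestamp per line) and flags peers whose last handshake is older than a threshold. The 180-second threshold, the function name and the peer names are assumptions chosen for the example.

```python
import time

# Assumed threshold for this sketch; not an OPNsense or WireGuard default.
STALE_AFTER_SECONDS = 180


def stale_peers(handshake_output, now=None, max_age=STALE_AFTER_SECONDS):
    """Return the public keys of peers whose last handshake is too old.

    handshake_output is text in the format of
    `wg show <interface> latest-handshakes`: "<public-key> <unix-timestamp>".
    """
    now = time.time() if now is None else now
    stale = []
    for line in handshake_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        peer, stamp = parts[0], int(parts[1])
        # A timestamp of 0 means no handshake has completed yet.
        if stamp == 0 or now - stamp > max_age:
            stale.append(peer)
    return stale


if __name__ == "__main__":
    # Placeholder peer names instead of real public keys.
    sample = "peerA\t999950\npeerB\t999000\npeerC\t0\n"
    print(stale_peers(sample, now=1_000_000))
```

A peer flagged this way is a candidate for re-resolving its endpoint hostname, which matters when the remote side sits behind a dynamic IP.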
Title: Re: Random loss of WAN connectivity
Post by: 9ck on July 22, 2025, 07:53:18 PM
Looking into this, I see that I've enabled another cron job that restarts WireGuard after 6 hours in order to refresh the public IP. Would you recommend I keep this? I have a dynamic IP address. In reality it is only renegotiated if the connection is down for 3 hours or more. I wonder if I've done this correctly. Shouldn't it be the Dynamic DNS settings that I refresh?

Any recommendation as to how often I should run the job that will renew the DNS for Wireguard?