Lose WAN connectivity every few days

Started by Rob_H, May 22, 2022, 02:26:32 PM

Hi all,

I'm having an issue in which I lose WAN connectivity seemingly at random every few days. The OPNsense web GUI and SSH are still accessible, and the WAN interface link is up with both IPv4 and IPv6 addresses. In fact, everything looks fine in the web console, except that no traffic gets in from or out to the Internet. Pinging google.com by name and by IP address from interface diagnostics fails. I also tried releasing/renewing the WAN IP and restarting various services, with no effect. Eventually I have to reboot OPNsense to get connectivity back.
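A minimal sketch of the ping check described above, written as a shell function so it can be run from SSH while the outage is happening. The function names and the target 8.8.8.8 are assumptions for illustration and are not from the original post. Pinging by IP and by name separately helps tell a DNS failure from a complete loss of forwarding.

```shell
#!/bin/sh
# Hypothetical helper: turn two ping exit codes into a short diagnosis.
#   $1 = exit status of the ping by IP address
#   $2 = exit status of the ping by host name
classify_wan() {
    ip_status=$1
    name_status=$2
    if [ "$ip_status" -eq 0 ] && [ "$name_status" -eq 0 ]; then
        echo "ok"            # both pings worked
    elif [ "$ip_status" -eq 0 ]; then
        echo "dns-only"      # IP reachable, name lookup or name ping failed
    else
        echo "no-route"      # nothing gets out by IP at all
    fi
}

# Hypothetical wrapper: run both pings and print a timestamped result.
# 8.8.8.8 and google.com are example targets; substitute your own.
check_wan() {
    ping -c 3 8.8.8.8 >/dev/null 2>&1
    ip_status=$?
    ping -c 3 google.com >/dev/null 2>&1
    name_status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(classify_wan "$ip_status" "$name_status")"
}

# Usage (not run automatically):
#   check_wan >> /tmp/wan-check.log
```

Running check_wan periodically and appending to a file would give a rough timestamp for when the outage starts, which can then be compared against the system logs.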

I looked at various logs under /var/system and found nothing obvious except in dmesg where I see a bunch of messages like:

cannot forward from 2601:190:402:89d1:9465:5aff:fee3:9bd4 to fe80:b::c4c7:78ff:fe35:1b3b nxt 58 received on bridge0
cannot forward from fe80:b::603a:46ff:fefd:b9e4 to fe80:b::6892:2aff:fe98:7c53 nxt 58 received on bridge0
cannot forward from fe80:b::603a:46ff:fefd:b9e4 to fe80:b::6892:2aff:fe98:7c53 nxt 58 received on bridge0
cannot forward from 2601:190:402:89d1:9465:5aff:fee3:9bd4 to fe80:b::c4c7:78ff:fe35:1b3b nxt 58 received on bridge0
cannot forward from fe80:b::9465:5aff:fee3:9bd4 to fe80:b::c4c7:78ff:fe35:1b3b nxt 58 received on bridge0
[...]


Seems like this is a symptom, though, not a cause.
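To see which hosts are behind these messages, the lines can be grouped by source and destination. The following is a rough sketch that assumes the message format shown above (source address in field 4, destination in field 6); the function name is made up for illustration. For reference, "nxt 58" is the IPv6 next-header value for ICMPv6.

```shell
#!/bin/sh
# Hypothetical helper: read dmesg-style text on stdin and print
# "count source destination" for each "cannot forward" pair,
# most frequent first.
summarize_forward_errors() {
    grep 'cannot forward from' |
        awk '{ print $4, $6 }' |      # $4 = source, $6 = destination
        sort |
        uniq -c |
        sort -rn |
        awk '{ print $1, $2, $3 }'    # normalize uniq's padding
}

# Usage (not run automatically):
#   dmesg | summarize_forward_errors
```

If one or two source addresses dominate the list, those devices on the bridge are a reasonable place to start looking.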

Any suggestions on things to try or places to look next?

Do you have a diagram available? Can you explain your configuration? Do you have interfaces bridged in OPNsense? If so, do they plug into the same VLAN on your switch? (If your switch doesn't support VLANs, that may be your problem right there.)


It's a pretty simple setup: LAN is a bridge consisting of 5 physical interfaces (em1-em5). WAN is em0 and uses DHCPv4 and v6 to get network config from the upstream ISP. I am not using VLANs.

You could test with not using a bridge but a single LAN interface and, if you need the ports, an external switch.  TPLink makes inexpensive switches that work extremely well for things like that.

If the problem goes away, it would seem to be some issue with the bridge.  If it comes back, at least we know that it's probably not due to the bridge and you can try one interface after the other to see if a particular interface is faulty.

It could be a problem with the gateway of your ISP, but when rebooting OPNsense fixes the problem, it seems somewhat unlikely.  You could try with another router ...

Have you tried to unplug the cable that goes to your ISP?  Does that fix the problem instead of rebooting?

I am having the same problem as @rob_h. Every few days there is no internet, but everything appears to be working. It's been rock solid for months until I updated to the latest version, 22.1.7. A reboot brings it back up. Not sure what logs etc. would be best to provide.


Single WAN and single LAN interface (followed by a TP-Link Omada switch and EAPs).
DHCP
NAT forwarding of DNS to Adguard (Plugin) and then to unbound on :5353

Would love to identify the culprit as we both work from home.

Maybe you are using similar hardware?

Quote from: defaultuserfoo on May 25, 2022, 12:29:47 PM
Maybe you are using similar hardware?

Hardware is a Hunsn fanless mini-PC like this one. Brand new. I can't say if the upgrade to 22.1.7 was a factor in my case because I've only had it for a month, so I was probably on that version from the beginning.

I will try unplugging/reconnecting the WAN cable the next time it happens. I'd prefer not to change the bridge config, since I don't really want another piece of hardware there, but assuming it's the same root cause for @BillyMcSkintos it sounds like that won't make a difference anyway.

May 31, 2022, 09:17:47 PM #7 Last Edit: May 31, 2022, 09:22:41 PM by BillyMcSkintos
Mine is a similar Qotom Device but not the same.

Just went down again. "Problem Files" attached.

Do you use a PPPoE connection? If so, maybe there's something in the PPPoE logs ...


@defaultuserfoo: I believe so. Are the logs on the firewall or the modem?
@tracerrx: I don't understand how the referenced thread relates to the drivers. However, I did disable IPS. I am not MAC spoofing.

I have been running OPNsense for years with no problems. Recent changes include the latest version and a shift to running AdGuard as a plugin instead of on a Raspberry Pi. I have not found evidence that AdGuard is the cause, but it's on the list...

@BillyMcSkintos you need to read through the entire thread, there are multiple possible causes... I was referring to this reply in particular however: https://forum.opnsense.org/index.php?topic=27299.msg137350#msg137350

@tracerrx Thank you for the direct guidance, much appreciated. I followed the steps from your post (quoted below) but get this error when executing the make command:
make: "/usr/share/mk/bsd.sysdir.mk" line 15: Unable to locate the kernel source tree. Set SYSDIR to override.
Quote from: tracerrx on May 09, 2022, 04:36:01 PM
From the command line of your opnsense box:
pkg install git
pkg install wget
cd /usr
git clone https://github.com/opnsense/plugins
git clone https://github.com/opnsense/ports
git clone https://github.com/opnsense/src
cd src/
git checkout
git checkout stable/22.1
cd /tmp
wget https://downloadmirror.intel.com/682705/igb-2.5.21.tar.gz
tar xzf igb-2.5.21.tar.gz
cd igb-2.5.21/src
make
cp if_igb.ko /boot/modules/if_igb_updated.ko


From the opnsense GUI:
System=>Settings=>Tunables
Tunable => if_igb_updated_load
Value => YES


You need to reboot the opnsense box for the changes to take effect. Afterwards, when you run sysctl -a | grep dev.igb, you should see the new driver version.

Make sure the branch you git checkout (stable/22.1 here) matches the version that you are running.
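Regarding the "Unable to locate the kernel source tree. Set SYSDIR to override." error: the message itself suggests passing SYSDIR to make. The sketch below is only a guess at how that could be done. It assumes that a usable kernel source directory contains conf/kmod.mk and that the src repository was cloned either to /usr/src or to a src directory under the home directory; both paths and the function name are assumptions, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate directory that looks
# like a kernel source tree (assumed marker: conf/kmod.mk exists).
find_sysdir() {
    for dir in "$@"; do
        if [ -f "$dir/conf/kmod.mk" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1    # no candidate matched
}

# Usage (not run automatically), from the igb-2.5.21/src directory:
#   SYSDIR=$(find_sysdir /usr/src/sys "$HOME/src/sys") && make SYSDIR="$SYSDIR"
```

If neither candidate matches, the clone of the src repository probably ended up somewhere else, and its sys subdirectory is the path to pass as SYSDIR.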


Confirmed.
xxxxxxx@opnsense:~/src # git checkout stable/22.1