Sudden intermittent NAT(?) issues

Started by nakedcreep, November 11, 2024, 09:44:32 AM

Hello,

I'm new to OPNsense and recently set up a simple failover multi-WAN configuration on a Minisforum MS-01. I'm running XCP-ng and passing the SFP ports on the MS-01 through to the OPNsense VM. One of them is in use at the moment, with multiple VLANs to split LANs, WANs, etc.

Everything worked flawlessly for over a month. I have a handful of rules and some port forwarding for the 4 servers behind it; everything else is pretty much the default configuration. However, after a brief outage with my main ISP, I now have intermittent connection issues. Some examples:

- SSH from the internet into my machines times out for some of them, and I only get through after retrying 1-2 times
- a simple curl -4 ifconfig.me/ip only works some of the time: the connection to the server is always established, but sometimes no reply is received
- IPv6 has completely stopped working behind NAT and on the OPNsense box itself, even though it gets addresses from all 3 of my WANs
- pinging some external hosts works from some machines but not from others
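
To put numbers on symptoms like the SSH timeouts above, one option is to repeat a TCP connection attempt and count the failures. The following is only a rough sketch: the target address 192.0.2.10 and port 22 are placeholders, and the function names are made up for illustration. It only checks whether a connection can be opened, so it would not catch the curl case where the connection is established but no reply arrives.

```python
import socket
import time


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


def run_probes(probe, attempts=20, delay=1.0):
    """Call probe() repeatedly and return the list of boolean results."""
    results = []
    for i in range(attempts):
        results.append(bool(probe()))
        if i < attempts - 1:
            time.sleep(delay)
    return results


def summarize(results):
    """Return (successes, failures, longest run of consecutive failures)."""
    longest = current = 0
    for ok in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    successes = sum(results)
    return successes, len(results) - successes, longest


if __name__ == "__main__":
    # Placeholder target: replace with one of the affected servers and ports.
    results = run_probes(lambda: tcp_probe("192.0.2.10", 22), attempts=20)
    ok, failed, longest = summarize(results)
    print(f"{ok} ok, {failed} failed, longest failure streak {longest}")
```

Running the same probe from outside and from inside the LAN, and before and after a reboot, would at least show whether the failure rate changes.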

Where I'm lost: I started by checking whether the packets pass through the OPNsense box, and they do. Take ICMP, for example: I can see the requests from both server1 and server2 going through perfectly fine, but only one of them gets a reply while the other doesn't. (No firewall is enabled on the servers themselves, and the gateways/netmasks are set correctly and haven't changed in forever.) This happens for pings to external hosts as well as to the gateway itself.
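
One way to narrow down the ICMP case described above is to save a text capture on a single interface and list the echo requests that never got a matching reply. This is a sketch under assumptions: it expects lines in the shape that `tcpdump -n icmp` normally prints for IPv4 echo traffic (shown in the comment), and the function name is hypothetical. Because NAT rewrites addresses between the LAN and WAN sides, it only makes sense within one capture from one interface at a time.

```python
import re
import sys

# Assumed line shape, as printed by `tcpdump -n icmp`:
#   12:00:01.000000 IP 10.0.0.11 > 1.1.1.1: ICMP echo request, id 7, seq 1, length 64
ECHO_RE = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>[^:\s]+): ICMP echo (?P<kind>request|reply), "
    r"id (?P<id>\d+), seq (?P<seq>\d+)"
)


def unanswered_requests(lines):
    """Return sorted (client, target, id, seq) tuples for requests with no reply."""
    requests, replies = set(), set()
    for line in lines:
        m = ECHO_RE.search(line)
        if not m:
            continue  # Ignore anything that is not an IPv4 echo request/reply.
        ident, seq = int(m["id"]), int(m["seq"])
        if m["kind"] == "request":
            requests.add((m["src"], m["dst"], ident, seq))
        else:
            # A reply travels target -> client, so swap the addresses.
            replies.add((m["dst"], m["src"], ident, seq))
    return sorted(requests - replies)


if __name__ == "__main__":
    # Reads capture text from standard input, one packet per line.
    for client, target, ident, seq in unanswered_requests(sys.stdin):
        print(f"no reply: {client} -> {target} id={ident} seq={seq}")
```

Comparing the output for a LAN-side capture with a WAN-side one would show whether the replies are missing upstream or are arriving at the WAN interface and getting lost on the way back to the server.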


Any suggestions on how to debug this further would be much appreciated!

Did you disable hardware checksum offloading, both in OPNsense (Interfaces: Settings) and in XCP-ng?

Yes, they're disabled, by default I believe. XCP-ng shouldn't be interfering, as the interfaces are passed through.

Just an update: I found something in the logs about a "Malicious Driver Detection event". I rebooted the box and got a lot of "tracing" messages spammed before it actually rebooted (never seen those before), and now it seems to work better...

I can't find much about this. Is there any way to disable this Malicious Driver Detection at all?

I've tried various sysctl settings and haven't seen the "Malicious Driver Detection event" in a while, but I'm still losing connection every now and then and it's really frustrating. The event seems to be triggered by constant high traffic, but that doesn't explain the other issues. It might just be that it doesn't play well with DDR5 (or my particular kit), as my DDR4 machine has never had a single issue and I set both up at the same time.

Still hoping for suggestions on how to improve this, maybe someone runs a similar setup successfully?