[Resolved] DNS and WEBgui intermittent despite reinstalls and hardware change

Started by novalocke, April 30, 2025, 12:01:59 AM

Hello,

I've been having what seem to be very niche issues with my OPNsense router and firewall. At random, on average a few times a day, DNS resolution stops working and I can no longer access the web GUI for roughly 30 seconds to 15 minutes at a time. Some days it does not happen at all; on other days the outages add up to several hours. During an outage, pings to external addresses such as 8.8.8.8 usually still work, though sometimes they fail as well, and dig returns "Connection Refused" when querying the DNS server that the router hands out over DHCP. Even when ping works, DNS queries to servers other than Unbound fail too.

I am using a Protectli V1410 Vault with an SSD installed, running OPNsense 25.1.5_5. I have a second subnet on a different port for some servers, and the issue seems to affect one port at a time. What I mean is, when my main LAN is having issues, the other network is still working completely fine. The issue occurs much less often on my other network, but when it does, it is independent of my main LAN. I therefore tried moving my LAN to a different port, but that made no difference.

The issue starts occurring at random immediately after a fresh installation, before I have made any configuration changes (including before configuring DNS or my second subnet), so it appears to be completely independent of configuration. I have attached OPNsense to my logging stack, and I was getting no errors corresponding to the outages until I enabled logging of SERVFAILs in Unbound in OPNsense. Attached is a picture of common error patterns in Grafana over the past 7 days.

I have tried reinstalling OPNsense many times, as well as swapping Ethernet cables, trying three separate network switches, disconnecting the TP-Link router I'm using for Wi-Fi (with DHCP disabled), replacing the SSD, running memtest86, and completely replacing the firewall after troubleshooting with Protectli. I have also tried swapping the power supply and the outlet used, and when monitoring the power supplied to the Vault I see no abnormality when the issue occurs. I have gone through a few updates since this started, and all audits come back fine in OPNsense.

I am very confused, and any help or ideas would be appreciated.

Thank you!


I am using a 2008 Atom mini PC with 2 GB of DDR2 and having no issues whatsoever. Your box may be defective. Is this a cheapo made-in-China soapbox router? I am not familiar with this class of devices, so pardon my ignorance. I prefer discontinued brand-name equipment to new made-in-China stuff any time of day.

OP said Protectli V1410 Vault, so not exactly no-name.
OP - This one seems to have Intel interfaces, so that rules out Realtek problems straight away.
So although a hardware fault could be the cause, it is IMO more likely to be on the configuration/application side. That means diagnostics are required, and we can't guess your setup and its configuration.
The part I see of interest to begin with is "trying three separate network switches, disconnecting the TP-link router I'm using for wifi with DHCP disabled, replacing the SSD, running memtest86, and completely replacing the firewall after troubleshooting with Protectli."
Please describe this setup in detail.
In any case, check dmesg and tell us the DNS setup in your network, including the OPNsense settings related to it.

Didn't think to check dmesg; it turns out the TrueNAS server on my main network was trying to use the same IP address as my router for some reason.
arp: [MAC Address] is using my IP address 192.168.1.1 on igc1!
Haven't had any problems since resolving this, thank you very much!
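For anyone who wants to watch for this automatically, here is a minimal Python sketch that scans log text for the kernel warning quoted above. The regex is built only from that one quoted line, and the function name `find_arp_conflicts` is my own invention, so treat it as an assumption rather than an official format or tool.

```python
import re

# Matches the kernel warning quoted in this thread, e.g.
# "arp: <MAC> is using my IP address 192.168.1.1 on igc1!"
# The pattern is an assumption derived from that single example line.
ARP_CONFLICT = re.compile(
    r"arp: (?P<mac>[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}) "
    r"is using my IP address (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"on (?P<iface>\w+)!"
)


def find_arp_conflicts(log_text):
    """Return a list of (mac, ip, iface) tuples, one per conflict line found."""
    return [
        (m["mac"], m["ip"], m["iface"])
        for m in ARP_CONFLICT.finditer(log_text)
    ]
```

You would feed it the saved output of dmesg (or the system log) as a string; each returned tuple names the offending MAC address, the contested IP address, and the interface it was seen on.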

Nice. Yes, I tend to start with dmesg when diagnosing.
IP address conflicts usually come from DHCP dynamic or static assignments overlapping with addresses configured directly on clients.
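A quick way to check for that kind of overlap is to compare the addresses you have set by hand against the DHCP pool. The following Python sketch uses only the standard `ipaddress` module; the function names and the example addresses in the usage note are hypothetical and not taken from anyone's actual configuration.

```python
import ipaddress


def static_in_pool(static_ip, pool_start, pool_end):
    """Return True if static_ip lies inside the inclusive DHCP pool range."""
    ip = ipaddress.ip_address(static_ip)
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    return start <= ip <= end


def find_overlaps(static_ips, pool_start, pool_end):
    """Return the manually assigned addresses that fall inside the pool."""
    return [
        ip for ip in static_ips
        if static_in_pool(ip, pool_start, pool_end)
    ]
```

For example, with a hypothetical pool of 192.168.1.100 to 192.168.1.200, calling `find_overlaps(["192.168.1.10", "192.168.1.150"], "192.168.1.100", "192.168.1.200")` flags only 192.168.1.150. This only catches pool overlaps; a client configured with the router's own address, as happened here, still needs the dmesg check.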