Random freeze of the firewall

Started by golfvert, January 03, 2019, 01:44:41 PM

Hello,
I have been running OPNsense (18.7.9) for a couple of months now, and the box freezes at random. It can work for a day or less, or for a week, and then all of a sudden the system just hangs: no routing or firewalling, no response to ping. Only a hard reboot "solves" the issue. At first I suspected a hardware problem. However, I have run a stress test on the box for more than a week without any issue.
I am running a rather simple configuration with very few active plugins (I have removed everything that is not essential for my setup). I am running the latest version on a https://b2b.gigabyte.com/Embedded-Computing/EL-20-3050-32GB-rev-20 with one LAN and one WAN interface.
OPNsense is my DHCP server.
I have looked at various log files without any luck.
When I do a "top" on the box, it runs at 40% CPU and the heaviest processes are a couple of "php-cgi".
Is there any debug mode I can activate in order to understand what's going on?
I can post log files if it helps...
Thank you for any clues you may have!
GV

Hello there,

Out of curiosity, can you SSH into the box when it freezes? It might not actually have frozen. A stress test doesn't tell the whole picture; could it be a hard disk failure?

Regards

Hi,
I'm not 100% sure I tried SSH, but ping definitely fails, so I doubt SSH will work. I will wait for the next freeze and (re)check. When I reboot, the eMMC is clean and works just fine.

I have had disk failures before with Linux; Linux is happy to continue until. Could it be OOM?

You mean Out Of Memory? I don't know for sure, as I can't see anything in the log before the freeze. At the moment, I have more than 3M available... I have tried using telegraf to monitor any trend in CPU/MEM/..., with no luck either. It just seems to "happen" randomly and I don't know where to look next. The log files that were active at freeze time appear to be overwritten by new ones at reboot.
Thanks for helping.
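
Since the logs do not seem to survive the reboot, one option is a tiny heartbeat logger that forces every line to disk, so the last line written shows roughly when the box stopped. This is only a rough sketch, not something tested on OPNsense: the file path and the 10-second interval are made-up values, the function names are hypothetical, and the path would have to sit on persistent storage rather than a RAM disk.

```python
import os
import time


def snapshot_line(now=None):
    """Build one log line: UTC timestamp plus 1/5/15-minute load averages."""
    now = time.time() if now is None else now
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    load1, load5, load15 = os.getloadavg()
    return f"{stamp} load={load1:.2f},{load5:.2f},{load15:.2f}\n"


def append_durably(path, line):
    """Append a line and fsync it so it is on disk before a hard reset."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())


def run(path="/root/freeze_trace.log", interval=10):
    """Write a heartbeat forever (path and interval are assumptions)."""
    while True:
        append_durably(path, snapshot_line())
        time.sleep(interval)
```

Calling `run()` starts the loop. After the next freeze, the timestamp of the final line can be compared with whatever else was happening at that moment (DHCP renewals, cron jobs, traffic peaks).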

Not a FreeBSD guy, but is there nothing in dmesg before the freeze? You have to consider hardware failure. You could also try different FreeBSD kernels, but that's beyond my knowledge. Disclaimer: this is beyond my know-how at this point.

Nothing significant in dmesg.
Regarding hardware issues: having run heavy stress tests for days and having fsck'ed the disk, I can (almost) rule this out.
Changing the FreeBSD kernel is beyond my knowledge too!


January 03, 2019, 02:38:17 PM #8 Last Edit: January 03, 2019, 02:45:49 PM by golfvert
Already done. Memory just fine. Ran the stress test for a week without any issue!

How about connecting a monitor once it freezes to see what's going on? Perhaps it's the NIC drivers that are "locked up", Realtek is known for that.

Memory test takes 2 days at least, 48 hours. Bit flips don't happen in that small time span

Quote from: mimino on January 03, 2019, 02:54:03 PM
How about connecting a monitor once it freezes to see what's going on? Perhaps it's the NIC drivers that are "locked up", Realtek is known for that.

Good idea. Not going to be easy (not a lot of space where the firewall is installed) but worth a try!
If Realtek is known for that, is there a fix?

Quote from: bugsmanagement on January 03, 2019, 02:54:41 PM
Memory test takes 2 days at least, 48 hours. Bit flips don't happen in that small time span

I ran the memory test for a full week. No issue. So I am rather confident the memory is OK.

Quote from: golfvert on January 03, 2019, 03:09:56 PM
Good idea. Not going to be easy (not a lot of space where the firewall is installed) but worth a try!
If Realtek is known for that, is there a fix?

The only fix I know of is to avoid it like the plague.
Try to find the exact chipset and google for known problems in FreeBSD; that might give you a clue.

Quote from: mimino on January 03, 2019, 03:41:40 PM
The only fix I know of is to avoid it like the plague.
Try to find the exact chipset and google for known problems in FreeBSD; that might give you a clue.

Looking at past posts, it seems that the newest driver version (1.95) is or was OK-ish. Do you know if there is a way to restart the Ethernet interface without rebooting? I could run a watchdog on the interfaces and, if something is wrong, do a "service netif restart" or similar. That would be faster than rebooting... My FreeBSD skills are close to nil.
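
To make the watchdog idea concrete, here is a rough sketch of what I have in mind. It pings a host and, after a few misses in a row, runs the restart command. Everything here is an assumption: the function names, the thresholds and the target address are made up, and I have not verified that "service netif restart" is the right command on OPNsense. It can also only help if the kernel is still alive and just the NIC is stuck; if the whole box hangs, the script hangs with it.

```python
import subprocess
import time


def host_reachable(host, timeout=5):
    """Return True if a single ping to host succeeds within timeout seconds."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def restart_network():
    """Run the restart command from the post (assumed, untested on OPNsense)."""
    subprocess.run(["service", "netif", "restart"], check=False)


def watchdog_step(failures, reachable, threshold=3):
    """Return (new_failure_count, should_restart) for one probe result."""
    if reachable:
        return 0, False
    failures += 1
    if failures >= threshold:
        return 0, True
    return failures, False


def watch(host, interval=30, threshold=3, probe=host_reachable,
          restart=restart_network, sleep=time.sleep, max_rounds=None):
    """Probe host every interval seconds; restart after threshold misses in a row."""
    failures = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        failures, should_restart = watchdog_step(failures, probe(host), threshold)
        if should_restart:
            restart()
        sleep(interval)
        rounds += 1
```

Something like `watch("192.0.2.1")` would start it, with the address replaced by a host that should always answer, for example the upstream gateway. Requiring several consecutive misses avoids restarting the interface on a single lost packet.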