Primary firewall crashes in HA setup

Started by shade73, September 15, 2016, 04:40:21 PM

We have two OPNsense boxes in an HA setup, and every 6-7 days the primary firewall "crashes" (it does not generate a crash report).

When it happens, we notice it because our VPN users cannot get access: the primary has stopped processing traffic and does not accept connections (GUI/SSH cannot be accessed).

On the secondary firewall, "Firewall / Virtual IPs / Status" shows that it has taken control of all 3 CARP interfaces as master, so all normal traffic flows.

The CLI/console on the primary firewall still works, so the problem can be fixed by selecting reboot in the menu. The primary firewall then comes up again and resumes the CARP interfaces as master, and the secondary becomes backup.

6-7 days later, it starts over again.

Hi Shade,

That doesn't sound normal. This would require a peek into the system: something like "top" to look at memory usage, dmesg for out-of-memory kills, and the process list to check against.
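
If it helps, here is a rough sketch of how the dmesg part of that check could be scripted. It is only an illustration: the function names are made up for this example, and the pattern assumes FreeBSD-style kernel messages such as "pid 1234 (php), uid 0, was killed: out of swap space". The exact wording differs between releases, so adjust the pattern to whatever your dmesg actually prints.

```python
import re
import subprocess

# Assumed message shape (FreeBSD-style, wording may vary by release):
#   pid 1234 (php), uid 0, was killed: out of swap space
KILL_PATTERN = re.compile(r"pid (\d+) \(([^)]+)\).*was killed: (.+)")


def find_killed_processes(dmesg_text):
    """Return (pid, name, reason) tuples for kernel kill messages in the text."""
    hits = []
    for line in dmesg_text.splitlines():
        match = KILL_PATTERN.search(line)
        if match:
            pid = int(match.group(1))
            name = match.group(2)
            reason = match.group(3).strip()
            hits.append((pid, name, reason))
    return hits


def read_dmesg():
    """Run dmesg and return its output as text (needs console or SSH access)."""
    result = subprocess.run(
        ["dmesg"], stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    return result.stdout
```

Running find_killed_processes(read_dmesg()) from the console before rebooting would list any processes the kernel killed. An empty list does not rule out memory pressure, so the top output is still worth saving alongside it.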

7 days sounds suspiciously like a Cron job of some yet unknown variety. :)


Cheers,
Franco

7 days on average; it can happen after 4 days, but also not until after 10.

I will get a top and dmesg output next time it happens.

It has now happened again. We have been "lucky" with the OPNsense updates: each one reboots the firewall, which resets the cycle.

Logs attached.