Memory/swap issues

Started by sesquipedality, December 13, 2020, 11:17:29 AM

December 13, 2020, 11:17:29 AM Last Edit: December 13, 2020, 11:19:48 AM by sesquipedality
Hello all. I am running into a nasty problem with the current OpnSense, which probably started sometime during the 20.7 release cycle (I certainly don't remember this happening before the 20.7 upgrade). It is difficult to debug because the symptoms are that the router becomes unresponsive and throws errors about processes being killed because it is out of swap.

It looks like some sort of memory leak, but that does not make much sense. The router runs in a VM on a Dell R710 1U server, so I can easily monitor its memory usage from outside, and of the 2G I have allocated, only about 1.2G is in use by the VM even while the problem is occurring. Memory usage as reported by OpnSense itself (not directly comparable to the usage reported outside the VM by qemu) is about 500M in normal operation and 1.9G just after boot. CARP is enabled but does not fail over, presumably because the kernel-level CARP interface remains functional even though the router itself does not.

I have checked the logs, and unfortunately the out-of-swap errors do not get logged. In fact, there seems to be nothing logged at all for at least 30 minutes before the router became unresponsive. The only reference to swap in the logs is the following:

2020-12-13T00:41:01 kernel warning: increase kern.maxswzone or reduce amount of swap.
2020-12-13T00:41:01 kernel warning: total configured swap (2097152 pages) exceeds maximum recommended amount (2002392 pages).


I thought that there was no swap configured on OpnSense, so I am confused by this.
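The page counts in those two warnings convert directly to sizes. A small Python sketch, assuming the 4 KiB page size that FreeBSD uses on amd64 (`sysctl hw.pagesize` would confirm it on the router), shows that the kernel apparently sees about 8 GiB of configured swap:

```python
# Assumed FreeBSD/amd64 page size; verify with `sysctl hw.pagesize`.
PAGE_SIZE = 4096


def pages_to_gib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count, as printed in the kernel warning, to GiB."""
    return pages * page_size / 2**30


configured = 2097152   # "total configured swap" from the log line
recommended = 2002392  # "maximum recommended amount" from the log line

print(f"configured swap: {pages_to_gib(configured):.2f} GiB")
print(f"recommended max: {pages_to_gib(recommended):.2f} GiB")
```

If the page-size assumption holds, this prints 8.00 GiB and 7.64 GiB, so some swap device does seem to be active. Running `swapinfo` on the console would list which device it is.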

The unresponsiveness usually occurs overnight, so if there is an intensive system job running overnight, this could be the cause.
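Since nothing is logged in the 30 minutes before the hang, one option is to record memory and swap state yourself at a fixed interval to a file on disk. After the next overnight failure, the last lines would then show whether free memory and swap drained gradually and which process grew. Below is a minimal Python sketch. It assumes a Python 3 interpreter is available on the router, that the FreeBSD sysctls `hw.pagesize` and `vm.stats.vm.v_free_count` and the `swapinfo -k` output layout are as described in the comments, and that `ps` honours empty `-o name=` headers. The log path and interval are hypothetical choices:

```python
import os
import subprocess
import sys
import time
from datetime import datetime
from typing import List, Tuple

LOG_PATH = "/root/memwatch.log"  # hypothetical path; choose one on persistent disk
INTERVAL_S = 60                  # seconds between samples (arbitrary choice)


def run(cmd: List[str]) -> str:
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, stdout=subprocess.PIPE,
                          universal_newlines=True, check=True).stdout


def parse_swapinfo(text: str) -> Tuple[int, int]:
    """Sum the 1K-blocks and Used columns of `swapinfo -k` output, in KiB.

    Expected layout (one row per swap device, plus a Total row if several):
    Device          1K-blocks     Used    Avail Capacity
    /dev/ada0p3       8388608   524288  7864320     6%
    """
    total = used = 0
    for line in text.splitlines()[1:]:  # skip the column header
        fields = line.split()
        # The Total row repeats the per-device sums, so skip it.
        if len(fields) < 5 or fields[0] == "Total":
            continue
        total += int(fields[1])
        used += int(fields[2])
    return total, used


def top_rss(text: str, n: int = 5) -> List[Tuple[int, str]]:
    """Return the n largest (rss_kib, command) pairs from headerless ps output."""
    procs = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            procs.append((int(parts[0]), parts[1].strip()))
    return sorted(procs, reverse=True)[:n]


def format_sample(stamp: str, free_mib: float, swap: Tuple[int, int],
                  procs: List[Tuple[int, str]]) -> str:
    """Build one log line: free RAM, swap used/total, largest processes."""
    total_kib, used_kib = swap
    top = ", ".join(f"{cmd}={rss // 1024}M" for rss, cmd in procs)
    return (f"{stamp} free={free_mib:.0f}M "
            f"swap={used_kib // 1024}/{total_kib // 1024}M top: {top}")


def sample() -> str:
    """Collect one sample using FreeBSD-specific sysctls and tools."""
    page_size = int(run(["sysctl", "-n", "hw.pagesize"]))
    free_pages = int(run(["sysctl", "-n", "vm.stats.vm.v_free_count"]))
    swap = parse_swapinfo(run(["swapinfo", "-k"]))
    procs = top_rss(run(["ps", "-ax", "-o", "rss=", "-o", "comm="]))
    stamp = datetime.now().isoformat(timespec="seconds")
    return format_sample(stamp, free_pages * page_size / 2**20, swap, procs)


def main() -> None:
    while True:
        with open(LOG_PATH, "a") as log:
            log.write(sample() + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before a possible hang
        time.sleep(INTERVAL_S)


# Only start the loop on FreeBSD, where the sysctl names above exist.
if __name__ == "__main__" and sys.platform.startswith("freebsd"):
    main()
```

The script could be left running in the background overnight. The monitor itself may be killed once swap runs out, but the lines it wrote before that point should remain on disk because each one is synced as it is written.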

Can anyone give me any pointers on how to work out what is going wrong? I have not tried to log in on the console while the router is unresponsive, but the web interface does not respond at all.

Does anyone have any pointers on this, please? Thanks.