19.7 Legacy Series / Unresponsive every 2 weeks, fixed
« on: January 02, 2020, 04:38:01 pm »
This is a problem that started fairly recently following an upgrade. I've resolved it (a workaround rather than a fix as such), so I'm sharing it with the community.
I'm running 19.7.8-amd64 on a Decisio appliance.
Recently it started becoming unresponsive every 2 weeks or so unless we proactively rebooted it. The firewall would fail to respond to all direct connections to it, including DNS, HTTPS, SSH and ping, but would continue to pass traffic between networks as normal (so long as no DNS lookup was required). This pointed to a resource leak as the likely root cause.
Looking at /var/log/system.log after such a failure event, there were a lot of messages like these:
kernel: swap_page_getswapspace(): failed
kernel: swap_page_getswapspace(): failed
kernel: swap_page_getswapspace(): failed
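To check quickly whether a given log shows this symptom, something like the following minimal sketch could count the failure lines. It assumes the log can be read as plain text lines on stdin; how you get it into that form on your release is left open. The function name is hypothetical and not part of OPNsense.

```python
import sys

# Message text copied from the kernel log lines quoted above.
SWAP_FAIL = "swap_page_getswapspace(): failed"


def count_swap_failures(lines):
    """Return how many lines contain the swap-allocation failure message."""
    return sum(1 for line in lines if SWAP_FAIL in line)


if __name__ == "__main__":
    # Log lines are read from stdin, one per line.
    print(count_swap_failures(sys.stdin))
```

A count that keeps rising between reboots fits the memory-exhaustion picture described here.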
Clearly the OS was running out of memory. Further monitoring of memory and processes in Reporting > Health > System showed fairly constant growth in the number of processes, along with a steady increase in network latency.
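One way to put a number on "fairly constant growth" is to fit a least-squares slope to periodic process-count samples. This is a minimal sketch, assuming you record the counts yourself at a fixed interval (for example hourly). The function name and the sample values are hypothetical and were not taken from the graphs mentioned above.

```python
from typing import Sequence


def growth_per_sample(counts: Sequence[float]) -> float:
    """Least-squares slope of counts against sample index (0, 1, 2, ...).

    A persistently positive slope over a long window is consistent with
    processes being started but never exiting, i.e. a leak.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(counts))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


if __name__ == "__main__":
    # Hypothetical hourly process counts, for illustration only.
    print(growth_per_sample([100, 110, 120, 130]))
```

A slope fitted over a long window is less sensitive to short bursts than comparing just two readings.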
Running top from the console showed a large number of 'pinger' processes running under the 'squid' user account. I stopped the squid service and killed all the 'pinger' processes, and normal, reliable service appears to have resumed.
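Rather than eyeballing top, a short script could tally processes per user and command and report the squid-owned 'pinger' count directly. This is a minimal sketch and is not from the original post. It assumes `ps` accepts `-ax -o user,comm` the way FreeBSD's does and prints one header line first. The helper names are hypothetical.

```python
import subprocess
from collections import Counter


def count_by_user_and_command(ps_output: str) -> Counter:
    """Count processes per (user, command) from `ps -o user,comm` output.

    The first line is assumed to be the column header and is skipped.
    """
    counts = Counter()
    for line in ps_output.splitlines()[1:]:
        parts = line.split(None, 1)
        if len(parts) == 2:
            counts[(parts[0], parts[1].strip())] += 1
    return counts


def main() -> None:
    # -a and -x include other users' processes and those without a terminal.
    out = subprocess.run(
        ["ps", "-ax", "-o", "user,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    counts = count_by_user_and_command(out)
    print("squid pinger processes:", counts[("squid", "pinger")])
    # Show the five most common (user, command) pairs.
    for (user, cmd), n in counts.most_common(5):
        print(f"{n:6d}  {user:<12} {cmd}")


if __name__ == "__main__":
    main()
```

The script only reports counts. It does not stop squid or kill anything, so it is safe to run periodically to see whether the 'pinger' count creeps up again.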