No access to webui and ssh after 12-48 hours

Started by aschild, June 03, 2022, 03:04:07 PM

Hello,

We had a 21.x FW running at a client site, and it was stable.

The customer then moved to a new location, and we upgraded the FW to 22.1 and also added a VLAN 100 on the internal network (LAN) port.

This works fine at first, but after 12-48 hours it is no longer possible to connect to the webui or ssh, and incoming OpenVPN connections no longer work either.

We then replaced the hardware and restored the config from the other box.
Unfortunately, the problem is also present on the new hardware.
Any idea how to track down this problem?
The logs are usually empty after a reboot.

If we log in via ssh after a reboot, everything looks fine. Once the webui connections stop working, the already-open ssh session stays connected: I can hit Enter and get a new prompt line, but starting a command like top just hangs the prompt.

----------------------------------------------
|      Hello, this is OPNsense 22.1          |         @@@@@@@@@@@@@@@
|                                            |        @@@@         @@@@
| Website:      https://opnsense.org/        |         @@@\\\   ///@@@
| Handbook:     https://docs.opnsense.org/   |       ))))))))   ((((((((
| Forums:       https://forum.opnsense.org/  |         @@@///   \\\@@@
| Code:         https://github.com/opnsense  |        @@@@         @@@@
| Twitter:      https://twitter.com/opnsense |         @@@@@@@@@@@@@@@
----------------------------------------------

*** firewall.xxxxxx.local: OPNsense 22.1.8_1 (amd64/OpenSSL) ***

LAN (re0)       -> v4: 10.0.0.1/24
WAN (re2)       -> v4/DHCP4: xxxxxxx24
                    v6/DHCP6: xxxxxxxxxxxx/64
WLANGuest (vlan01) -> v4: 192.168.132.1/24


  0) Logout                              7) Ping host
  1) Assign interfaces                   8) Shell
  2) Set interface IP address            9) pfTop
  3) Reset the root password            10) Firewall log
  4) Reset to factory defaults          11) Reload all services
  5) Power off system                   12) Update from console
  6) Reboot system                      13) Restore a backup

Enter an option: 8

root@firewall:~ #
root@firewall:~ #
root@firewall:~ # top
^C



Ctrl+C does not do anything here either...
The old box used the UFS filesystem; on the new box we used ZFS...


We have found the probable cause: there seems to be a memory leak eating up all memory.
I will search the forum for that issue, which seems to occur in some cases.


Looks like it's related to the /var file system being mounted as a RAM file system.
We have now disabled most logging and told the system to keep /var on the SSD instead.
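
For anyone who wants to watch for the same symptom before the box locks up, here is a rough sketch of a check that flags nearly full filesystems in `df -k` output. It assumes the RAM-backed /var shows up in `df` as its own entry (for example as tmpfs), which I have not verified on this box. The function names, the 90% threshold and the sample numbers are made up for illustration, not taken from our firewall.

```python
from typing import List, NamedTuple


class Mount(NamedTuple):
    filesystem: str
    used_kb: int
    avail_kb: int
    percent: int      # the "Capacity" column, without the % sign
    mountpoint: str


def parse_df(text: str) -> List[Mount]:
    """Parse `df -k` output: one header line, then one line per filesystem."""
    mounts = []
    for line in text.strip().splitlines()[1:]:
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # wrapped or malformed line, skip it
        fs, _blocks, used, avail, cap, mountpoint = parts
        try:
            mounts.append(
                Mount(fs, int(used), int(avail),
                      int(cap.rstrip("%")), mountpoint.strip())
            )
        except ValueError:
            continue  # e.g. capacity shown as "-"
    return mounts


def nearly_full(mounts: List[Mount], threshold: int = 90) -> List[Mount]:
    """Return the mounts whose capacity is at or above threshold percent."""
    return [m for m in mounts if m.percent >= threshold]


# Hypothetical sample, NOT real output from the affected firewall.
SAMPLE = """\
Filesystem          1024-blocks    Used    Avail Capacity  Mounted on
zroot/ROOT/default     20000000 1500000 18500000     8%    /
tmpfs                   2000000 1960000    40000    98%    /var
"""

if __name__ == "__main__":
    for m in nearly_full(parse_df(SAMPLE)):
        print(f"{m.mountpoint} is {m.percent}% full ({m.filesystem})")
```

On a live system the text could come from `subprocess.run(["df", "-k"], capture_output=True, text=True).stdout` instead of `SAMPLE`, run from cron or a monitoring agent so that a filling /var is noticed before memory runs out.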

Thanks for figuring that out. We are aware of the issue and I'm just working on https://github.com/opnsense/core/issues/5727 to be able to handle this better in 22.7.


Cheers,
Franco