Hello OPNsense,
I have two DEC2752 units configured in HA for a new remote office network build. The units were purchased less than a year ago and the configuration on them is quite basic: no IPv6 and no Unbound/dnsmasq configuration. I have an IPsec VPN connection from the VIP addresses to my primary site. DHCP and DNS are (at this time) handled at my primary DC.
So the OPNsense firewall cluster is just acting as a secure gateway with a site-to-site tunnel. Nothing fancy.
Last week, I tried to connect to both units. The master (10.103.0.1) responds fine, but the secondary (10.103.0.2) is sluggish and its web GUI fails to load properly.
The IPSec VPN tunnel is functional.
When I started to investigate what was going on with the secondary unit, I saw a ton of errors on the CLI:
root@FW02:~ # swap_pager: out of swap space
swp_pager_getswapspace(10): failed
swp_pager_getswapspace(3): failed
swap_pager: out of swap space
swp_pager_getswapspace(4): failed
swap_pager: out of swap space
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(4): failed
swp_pager_getswapspace(6): failed
swap_pager: out of swap space
swp_pager_getswapspace(1): failed
swap_pager: out of swap space
swp_pager_getswapspace(2): failed
swap_pager: out of swap space
swp_pager_getswapspace(22): failed
swp_pager_getswapspace(20): failed
If I reboot the secondary firewall, the OS loading process seems slow. Once I get in and press 8 for CLI, I don't have much time before it starts to bog down.
Running df -h, I get:
root@FW02:~ # df -h
Filesystem Size Used Avail Capacity Mounted on
zroot/ROOT/default 222G 10G 212G 5% /
devfs 1.0K 0B 1.0K 0% /dev
/dev/gpt/efifs 256M 645K 255M 0% /boot/efi
zroot/tmp 212G 224K 212G 0% /tmp
zroot 212G 96K 212G 0% /zroot
zroot/var/log 212G 164M 212G 0% /var/log
zroot/var/audit 212G 96K 212G 0% /var/audit
zroot/usr/home 212G 96K 212G 0% /usr/home
zroot/usr/ports 212G 96K 212G 0% /usr/ports
zroot/usr/src 212G 96K 212G 0% /usr/src
zroot/var/crash 212G 96K 212G 0% /var/crash
zroot/var/mail 212G 144K 212G 0% /var/mail
zroot/var/tmp 212G 96K 212G 0% /var/tmp
devfs 1.0K 0B 1.0K 0% /var/dhcpd/dev
When I take a look at top -o res, I see high swap usage:
root@FW02:~ # top -o res
last pid: 13953; load averages: 4.39, 2.72, 1.50 up 0+00:10:36 15:54:20
63 processes: 25 running, 38 sleeping
CPU: 53.4% user, 0.0% nice, 44.4% system, 2.2% interrupt, 0.0% idle
Mem: 5721M Active, 702M Inact, 195M Laundry, 754M Wired, 2056K Buf, 493M Free
ARC: 257M Total, 184M MFU, 66M MRU, 610K Anon, 1329K Header, 5222K Other
224M Compressed, 324M Uncompressed, 1.44:1 Ratio
Swap: 8418M Total, 6101M Used, 2317M Free, 72% Inuse, 314M In
swap_pager: out of swap space
swp_pager_getswapspace(10): failed
swp_pager_getswapspace(48): failed
swap_pager: out of swap space
swp_pager_getswapspace(48): failed
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
85082 ? ? ? ? 824M 555M RUN 2 0:04 32.18% php
7616 ? ? ? ? 1069M 490M RUN 2 0:14 30.59% php-cgi
5271 ? ? ? ? 488M 393M CPU0 0 0:02 24.03% php-cgi
28393 root 1 21 0 548M 392M select 2 0:15 0.01% php-cgi
5177 root 1 48 0 494M 391M RUN 1 0:02 31.91% php
9260 root 1 24 0 584M 386M select 2 0:06 0.00% php-cgi
91350 root 1 50 0 516M 368M RUN 0 0:03 6.38% php
2079 root 1 42 0 440M 338M RUN 2 0:03 43.99% php-cgi
6068 root 1 24 0 538M 291M RUN 3 0:02 9.70% php-cgi
2260 root 1 44 0 751M 270M RUN 3 0:09 9.23% php-cgi
63351 root 1 24 0 726M 266M RUN 3 0:03 7.69% php
28926 root 1 20 0 634M 236M select 3 0:16 0.00% php-cgi
8504 root 1 20 0 584M 236M select 2 0:06 0.00% php-cgi
1583 root 1 24 0 792M 224M CPU1 1 0:07 23.80% php-cgi
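To put a number on how much memory the PHP processes hold, here is a minimal sketch that totals the RES column for php and php-cgi rows. The helper name sum_php_res is hypothetical, and it assumes the top layout shown above (RES as the 7th field, COMMAND as the last field, sizes ending in "M"). The sample rows are copied from the output above, using only rows whose columns are fully legible.

```shell
#!/bin/sh
# Hypothetical helper: total the RES column for php / php-cgi rows
# read from standard input. Assumes RES is field 7, COMMAND is the
# last field, and RES values carry an "M" suffix.
sum_php_res() {
    awk '$NF == "php" || $NF == "php-cgi" {
             res = $7
             sub(/M$/, "", res)   # strip the unit suffix
             total += res
             n++
         }
         END { printf "%d processes, %dM resident\n", n, total }'
}

# Rows copied from the top output above (fully legible rows only).
sample='28393 root 1 21 0 548M 392M select 2 0:15 0.01% php-cgi
5177 root 1 48 0 494M 391M RUN 1 0:02 31.91% php
9260 root 1 24 0 584M 386M select 2 0:06 0.00% php-cgi
91350 root 1 50 0 516M 368M RUN 0 0:03 6.38% php
2079 root 1 42 0 440M 338M RUN 2 0:03 43.99% php-cgi
6068 root 1 24 0 538M 291M RUN 3 0:02 9.70% php-cgi
2260 root 1 44 0 751M 270M RUN 3 0:09 9.23% php-cgi
63351 root 1 24 0 726M 266M RUN 3 0:03 7.69% php
28926 root 1 20 0 634M 236M select 3 0:16 0.00% php-cgi
8504 root 1 20 0 584M 236M select 2 0:06 0.00% php-cgi
1583 root 1 24 0 792M 224M CPU1 1 0:07 23.80% php-cgi'

printf '%s\n' "$sample" | sum_php_res
# prints: 11 processes, 3398M resident
```

On the firewall itself, the same function could be fed from batch mode, for example top -b -o res 30 | sum_php_res, with the caveat that rows reported in units other than "M" would need extra handling.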
I have tried cleaning up some logs in /var/log and rebooting, but that didn't help.
These are the only packages I have installed:
root@FW02:~ # pkg info | grep os-
os-OPNBEcore-1.7_3 OPNsense Business Edition add-ons
os-OPNcentral-1.12_2 OPNsense central management
os-dmidecode-1.2 Display hardware information on the dashboard
os-etpro-telemetry-1.8 ET Pro Telemetry Edition
What I'm struggling to understand is why my primary unit would be working just fine while my secondary has this issue. I have been evaluating OPNsense for use at our remote site(s), and my configurations are fairly light.
I do enable logging on my firewall rules, but I don't have many rules at all.
I have a total of 11 VHIDs. On my primary unit at this time, swap is at 0.0%, memory used is 1047 MB, ARC is 1103 MB, and disk utilization is 1%.
When checking my boot environments, I see that bectl list shows default as 10.4G:
root@FW02:~ # bectl list
BE Active Mountpoint Space Created
default NR / 10.4G 2025-04-17 09:23
When I compare that to my primary/active unit, it shows 1.29G.
My concern here is that this is some kind of hardware failure, but I'm not sure how to confirm or check that.
The web interface is so unresponsive that I can't even go in and create a recent backup; the backups page won't load. I should have a recent backup, but I'm pointing this out to show how locked up the interface is.
I can't recall exactly what was done in the past 1-3 weeks, but it wouldn't be much. These firewalls are waiting for me to rebuild a new IPsec VPN connection from my primary location, so to my knowledge I haven't made any recent configuration changes to them.
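One way to see where the extra space on the secondary's root dataset comes from is to rank its top-level directories by size. The following is a minimal sketch, not an OPNsense-specific tool: the function name largest_dirs is hypothetical, and it relies only on du (with -x to stay on one filesystem, -k for kilobytes, -d 1 for one level of depth), sort, and head.

```shell
#!/bin/sh
# Hypothetical helper: list the largest first-level directories under a
# starting point, staying on a single filesystem (-x), sizes in KiB (-k).
# The first line of the result is the total for the starting point itself.
largest_dirs() {
    start=${1:-/}      # directory to inspect (default: /)
    count=${2:-10}     # number of lines to show (default: 10)
    du -x -k -d 1 "$start" 2>/dev/null | sort -rn | head -n "$count"
}

# Example call, left commented out because scanning / can take a while:
# largest_dirs / 10
```

Because -x keeps du on one filesystem, separately mounted datasets such as /var/log are left out, so running largest_dirs / on each unit would compare only what is stored on / itself, the filesystem df reports at 10G used on the secondary.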
I've captured screenshots and I can get further logs from the POST sequence and OS bootup if it helps.
Thank you,