PFstates constantly increasing, when viewed through the GUI - system freezes

Started by naetimus, October 30, 2025, 10:57:21 PM

I am having a state table issue on my current OPNsense setup, but only on one of the two systems in a cluster. They are identical Dell R620s, except that one has 128 GB of RAM and the other has 96 GB. The one that is currently primary is running perfectly. The one that is currently secondary freezes once every 9 hours of operation. The only fix is to reboot, something this box has done at least 8 times in the last week. They are running about 100 Mbps of sustained traffic to the internet, using less than 5% of the available RAM, and run at about 7% CPU load, most of which is Unbound.
The one that freezes shows a "(zone: pf states) PFstates limit reached" error across the frozen VGA connection.
The system is configured with adaptive timeouts: timeouts start shortening when the state table exceeds 3 million entries, with a maximum of 5 million.
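For reference, pf.conf(5) describes adaptive scaling as a linear factor, (adaptive.end - states) / (adaptive.end - adaptive.start), applied to all timeouts once the state count passes adaptive.start. Below is a rough sketch of that arithmetic. It assumes the 5 million figure above is adaptive.end, which may not be how the setting is actually applied here, and the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch of pf's adaptive timeout scaling, per the formula in pf.conf(5).
# adaptive_percent is a hypothetical helper, not part of pf or OPNsense.
# Usage: adaptive_percent STATES ADAPTIVE_START ADAPTIVE_END
# Prints the percentage of the configured timeout that would apply.
adaptive_percent() {
    ap_states=$1
    ap_start=$2
    ap_end=$3
    if [ "$ap_states" -le "$ap_start" ]; then
        echo 100    # below adaptive.start: timeouts are not scaled
    elif [ "$ap_states" -ge "$ap_end" ]; then
        echo 0      # at or above adaptive.end: timeouts scale to zero
    else
        echo $(( (ap_end - ap_states) * 100 / (ap_end - ap_start) ))
    fi
}

# Assumption: adaptive.start = 3,000,000 and adaptive.end = 5,000,000.
adaptive_percent 4000000 3000000 5000000    # prints 50
```

Under those assumptions, timeouts would be untouched at the normal 80,000 to 200,000 states, halved at 4 million, and scaled to zero at 5 million and beyond.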
When viewing the state table through the terminal (pfctl -s states | wc -l), it shows the same number of states as the primary firewall, somewhere between 80,000 and 200,000. However, the GUI (Firewall->Diagnostics->Statistics:info->state-table->current-entries) shows an ever-increasing number. This last time, it showed over 7 million before the system became inaccessible and froze. My system has a maximum state table size of over 13 million, which is the default given the RAM.
The primary (working) firewall does not show a discrepancy between the GUI and the terminal on state table sizes.
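To put the two numbers side by side from the terminal, something like the following could work. It is only a sketch: it assumes the GUI's current-entries value corresponds to the "current entries" line of pfctl -s info, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the number of listed states with pf's own counter.
# Helper names are hypothetical; only pfctl -s states and pfctl -s info are real commands.

# Count state lines from `pfctl -s states` output on stdin (one state per line).
count_state_lines() {
    wc -l | tr -d ' '
}

# Pull the "current entries" value out of `pfctl -s info` output on stdin.
current_entries() {
    awk '/current entries/ { print $3; exit }'
}

# Only query pf when pfctl is actually present (i.e. on the firewall itself).
if command -v pfctl >/dev/null 2>&1; then
    listed=$(pfctl -s states 2>/dev/null | count_state_lines)
    counter=$(pfctl -s info 2>/dev/null | current_entries)
    echo "listed=${listed:-0} counter=${counter:-0} difference=$(( ${counter:-0} - ${listed:-0} ))"
fi
```

Run as root on each firewall, a difference near zero would match what the primary shows, while a large and growing difference would match the behavior on the secondary.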
So far I have rebuilt the firewall from ISO and replaced the hardware with known working hardware. It does not change the behavior.
Any help to better troubleshoot this issue would be greatly appreciated.

I am running:
OPNsense 25.7.6-amd64
FreeBSD 14.3-RELEASE-p4
OpenSSL 3.0.18

If I look at the active states through the GUI (Firewall->Diagnostics->States), it shows the state totals that I expect (88k right now), which agrees with a constantly refreshing script I'm running that prints the current state table size.
When I click "reset state table", I can see the state table drop to 0 and then start to rebuild back to the normal amount (about 100k). However, the reset does not clear whatever counter the GUI reads for Firewall->Diagnostics->Statistics:info->state-table->current-entries. That value continues to climb (1.6m right now).

Further weird behavior: While I was watching the pfctl state totals, they switched from not matching the GUI (Firewall->Diagnostics->Statistics) to matching that number. Here is the output of the simple script that prints the datetime before printing the state table size:
2025-10-30 18:31:49 states:  67414
2025-10-30 18:31:59 states:  68477
2025-10-30 18:32:09 states:  74562
2025-10-30 18:32:20 states:  77916
2025-10-30 18:32:30 states:  73089
2025-10-30 18:32:40 states:  72530
2025-10-30 18:32:50 states:  98496
2025-10-30 18:33:01 states:  125197
2025-10-30 18:33:14 states:  2822507
2025-10-30 18:33:27 states:  2831250
2025-10-30 18:33:40 states:  2837438
2025-10-30 18:33:54 states:  2852378
2025-10-30 18:34:07 states:  2865346
2025-10-30 18:34:20 states:  2876912
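For anyone wanting to reproduce this, here is a minimal sketch of a logger that writes lines in the format above, plus a helper that finds the first sudden jump in such a log. It is not necessarily the exact script used here; the function names, the 10-second interval, and the 10x threshold are assumptions.

```shell
#!/bin/sh
# Sketch only: log_states and find_jump are hypothetical helper names.

# Print a timestamped state count every 10 seconds (runs until interrupted).
log_states() {
    while :; do
        printf '%s states:  %s\n' \
            "$(date '+%Y-%m-%d %H:%M:%S')" \
            "$(pfctl -s states | wc -l | tr -d ' ')"
        sleep 10
    done
}

# Read log lines on stdin and print the first sample whose count is more
# than FACTOR times the previous one (FACTOR defaults to 10).
find_jump() {
    awk -v factor="${1:-10}" '
        prev > 0 && $4 > prev * factor {
            print $1, $2, prev, "->", $4
            exit
        }
        { prev = $4 }
    '
}
```

For example, running log_states > states.log in one session and later find_jump < states.log would report the 18:33:14 sample for the log above, where the count goes from 125197 to 2822507.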

What do the graphs look like in Reporting: Health when you select Category: System, Subject: states?
OPNsense 25.7.5-amd64 running on ESXi 6.7 U2 VM, 4Gbytes RAM, 2 x vCPU
frr OSPF + eBGP, IDS, AdGuard Home, sftp-backup plugins. limited kea DHCP server deployment.

(I put in a cron job to reboot every 30 minutes so I won't lose access as it is a remote-to-me system, so more recent numbers stay much lower.)
This matches what I see with the pfctl state check: the totals match the states that the primary FW has until it starts to count the ones that are initially hidden but do show up in Firewall->Diagnostics->Statistics:info->state-table->current-entries.
There is a blip at 4:45pm Oct 30 where I manually cleared the states. It briefly lowered the pfctl state tracker level, but never lowered the Firewall->Diagnostics->Statistics:info->state-table->current-entries level.