PFstates constantly increasing, when viewed through the GUI - system freezes

Started by naetimus, October 30, 2025, 10:57:21 PM

I am having a state table issue on my current OPNsense install, but on only one of the two systems in a cluster. They are identical Dell R620s, except that one has 128 GB of RAM and the other has 96 GB. The one that is currently primary is running perfectly. The one that is currently secondary freezes every 9 hours of operation. The only fix is a reboot, which this box has needed at least 8 times in the last week. They carry about 100 Mbps of sustained traffic to the internet, use less than 5% of the available RAM, and run at about 7% CPU load, most of which is Unbound.
The one that freezes shows a "(zone: pf states) PFstates limit reached" error across the frozen VGA connection.
The system is configured with adaptive timeouts: it starts shortening timeouts when the state table exceeds 3 million entries, with a maximum of 5 million.
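For anyone following along, here is a rough sketch of what those adaptive settings should do, based on the linear scaling rule described in pf.conf(5). It assumes the 3 million figure maps to adaptive.start and the 5 million figure maps to adaptive.end; the function name is made up for illustration.

```shell
#!/bin/sh
# Percentage of each configured timeout that pf would apply at a given state
# count, following the linear scaling rule documented in pf.conf(5):
#   factor = (adaptive.end - states) / (adaptive.end - adaptive.start)
# Arguments: current state count, adaptive.start, adaptive.end.
# adaptive_timeout_percent is a hypothetical helper, not part of pfctl.
adaptive_timeout_percent() {
    states="$1"; start="$2"; end="$3"
    if [ "$states" -le "$start" ]; then
        echo 100    # at or below adaptive.start: timeouts are not scaled
    elif [ "$states" -ge "$end" ]; then
        echo 0      # at or above adaptive.end: timeouts scale to zero
    else
        echo $(( (end - states) * 100 / (end - start) ))
    fi
}

# Assumed values: adaptive.start = 3000000, adaptive.end = 5000000.
adaptive_timeout_percent 2800000 3000000 5000000   # below start
adaptive_timeout_percent 4000000 3000000 5000000   # halfway between start and end
adaptive_timeout_percent 7000000 3000000 5000000   # past end
```

If that mapping is right, a counter above 5 million would mean timeouts have already been scaled to zero, so normal entries should be expiring as fast as pf can purge them.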
When viewing the state table through the terminal (pfctl -s states | wc -l), it shows the same number of states as the primary firewall, somewhere between 80,000 and 200,000. However, the GUI (Firewall->Diagnostics->Statistics:info->state-table->current-entries) shows an ever-increasing number. This last time, it showed over 7 million before the system became inaccessible and froze. My system has a maximum state table size of over 13 million, which is the default given the RAM.
The primary (working) firewall does not show a discrepancy between the GUI and the terminal on state table sizes.
So far I have rebuilt the firewall from ISO and replaced the hardware with known working hardware. It does not change the behavior.
Any help to better troubleshoot this issue would be greatly appreciated.

I am running:
OPNsense 25.7.6-amd64
FreeBSD 14.3-RELEASE-p4
OpenSSL 3.0.18

If I look at the active states through the GUI (Firewall->Diagnostics->States), it shows the state totals that I expect (88k right now), which I know because I'm running a constantly refreshing script that prints the current state table size.
When I click "reset state table", I can see the state table drop to 0 and then start to rebuild back to the normal amount (about 100k). However, the reset did not clear whatever table the GUI counter (Firewall->Diagnostics->Statistics:info->state-table->current-entries) gets its data from. That counter continues to climb (1.6m right now).

Further weird behavior: While I was watching the pfctl state totals, they switched from not matching the GUI (Firewall->Diagnostics->Statistics) to matching that number. Here is the output of the simple script that prints the datetime before printing the state table size:
2025-10-30 18:31:49 states:  67414
2025-10-30 18:31:59 states:  68477
2025-10-30 18:32:09 states:  74562
2025-10-30 18:32:20 states:  77916
2025-10-30 18:32:30 states:  73089
2025-10-30 18:32:40 states:  72530
2025-10-30 18:32:50 states:  98496
2025-10-30 18:33:01 states:  125197
2025-10-30 18:33:14 states:  2822507
2025-10-30 18:33:27 states:  2831250
2025-10-30 18:33:40 states:  2837438
2025-10-30 18:33:54 states:  2852378
2025-10-30 18:34:07 states:  2865346
2025-10-30 18:34:20 states:  2876912
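For reference, a minimal sketch of a logging loop that would produce output in this format. The function names are made up for illustration, and it assumes pfctl -s states prints one line per state (the same assumption as pfctl -s states | wc -l).

```shell
#!/bin/sh
# Count lines on stdin and strip the padding some wc implementations add.
count_states() {
    wc -l | tr -d ' '
}

# Print one timestamped sample: "YYYY-MM-DD HH:MM:SS states:  N".
log_states() {
    printf '%s states:  %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(pfctl -s states | count_states)"
}

# Take a sample every INTERVAL seconds (default 10) until interrupted.
watch_states() {
    interval="${1:-10}"
    while :; do
        log_states
        sleep "$interval"
    done
}
```

Run it as root with something like watch_states 10 after sourcing the file. The sleep only starts after pfctl returns, so samples drift further apart when listing the table takes longer.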

What do the graphs look like in Reporting: Health (select Category: System, Subject: states)?
OPNsense 25.7.5-amd64 running on ESXi 6.7 U2 VM, 4Gbytes RAM, 2 x vCPU
frr OSPF + eBGP, IDS, AdGuard Home, sftp-backup plugins. limited kea DHCP server deployment.

(I put in a cron job to reboot every 30 minutes so I won't lose access as it is a remote-to-me system, so more recent numbers stay much lower.)
This matches what I see with the pfctl state check: the totals match the states that the primary FW has until pfctl starts to count the ones that are initially hidden but already show up in Firewall->Diagnostics->Statistics:info->state-table->current-entries.
There is a blip at 4:45pm Oct 30 where I manually cleared the states. It briefly lowered the pfctl state tracker level, but never lowered the Firewall->Diagnostics->Statistics:info->state-table->current-entries level.

Here is the output for a script that checks the pfctl states and the current entries in pfctl info from the shell:
2025-11-02 11:34:27 pfctl states:  52309     pfctl info current entries:  2700328
2025-11-02 11:34:38 pfctl states:  50585     pfctl info current entries:  2705780
2025-11-02 11:34:48 pfctl states:  51355     pfctl info current entries:  2711810
2025-11-02 11:34:58 pfctl states:  53868     pfctl info current entries:  2718267
2025-11-02 11:35:09 pfctl states:  52325     pfctl info current entries:  2723348
2025-11-02 11:35:19 pfctl states:  54837     pfctl info current entries:  2732943
2025-11-02 11:35:29 pfctl states:  55378     pfctl info current entries:  2741223
2025-11-02 11:35:39 pfctl states:  59680     pfctl info current entries:  2753400
2025-11-02 11:35:50 pfctl states:  64250     pfctl info current entries:  2766531
2025-11-02 11:36:00 pfctl states:  64567     pfctl info current entries:  2776886
2025-11-02 11:36:10 pfctl states:  66137     pfctl info current entries:  2788349
2025-11-02 11:36:24 pfctl states:  2797652     pfctl info current entries:  2797336
2025-11-02 11:36:37 pfctl states:  2804701     pfctl info current entries:  2806653
2025-11-02 11:36:50 pfctl states:  2815419     pfctl info current entries:  2817855
2025-11-02 11:37:04 pfctl states:  2822113     pfctl info current entries:  2822839
2025-11-02 11:37:17 pfctl states:  2832972     pfctl info current entries:  2851494
2025-11-02 11:37:30 pfctl states:  2884384     pfctl info current entries:  2889268
2025-11-02 11:37:44 pfctl states:  2918638     pfctl info current entries:  2937305
2025-11-02 11:37:57 pfctl states:  2954439     pfctl info current entries:  2955651
2025-11-02 11:38:11 pfctl states:  2965034     pfctl info current entries:  2963103
2025-11-02 11:38:24 pfctl states:  2980040     pfctl info current entries:  2976971
2025-11-02 11:38:38 pfctl states:  2984842     pfctl info current entries:  2987501
2025-11-02 11:38:51 pfctl states:  2987838     pfctl info current entries:  2991103
2025-11-02 11:39:05 pfctl states:  2995979     pfctl info current entries:  2997730
2025-11-02 11:39:18 pfctl states:  3004199     pfctl info current entries:  3007805
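A sketch of how a comparison like this can be scripted, in case anyone wants to reproduce it. The function names are hypothetical, and the awk pattern assumes the "current entries" row of pfctl -s info has the value in the third whitespace-separated field, as it does in the State Table section on FreeBSD.

```shell
#!/bin/sh
# Pull the "current entries" value out of `pfctl -s info` output on stdin.
info_current_entries() {
    awk '/current entries/ { print $3; exit }'
}

# Difference between the info counter and the number of listed states.
# Arguments: listed states, info counter.
state_gap() {
    echo $(( $2 - $1 ))
}

# Print one timestamped sample with both counts side by side.
compare_counts() {
    listed="$(pfctl -s states | wc -l | tr -d ' ')"
    counter="$(pfctl -s info | info_current_entries)"
    printf '%s pfctl states:  %s     pfctl info current entries:  %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$listed" "$counter"
}
```

Calling compare_counts in a loop with a 10 second sleep gives one line per sample. state_gap can be fed the two numbers from any line to track how many entries are counted but not listed.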

The pfctl states count does not match the pfctl info count until pfctl info current entries exceeds 2.7m. This happens consistently. My sense is that two things are going on:
1) The system is collecting some junk and entering it into the state table. Something stops the system from listing these entries as regular state table entries, but once they build up to a large enough total, the system stops suppressing them and starts listing them as active states.
2) There is an underlying bug in FreeBSD that prevents it from flushing these entries. They don't relate to any actual traffic streams, so they should be temporary at best, but instead they are never flushed and collect until they cause the system to freeze.