OPNsense Forum

English Forums => 25.1, 25.4 Production Series => Topic started by: CanIKipThis on February 04, 2025, 02:58:25 PM

Title: Large Alias Causing CPU spikes and ping latency
Post by: CanIKipThis on February 04, 2025, 02:58:25 PM
Hey everyone,

I tracked down a problem that seems to have been there at least since 24.19.  If you have "large" alias groups, the firewall will have periodic CPU spikes and periods of erratic, elevated latency through it.  I have now seen this on 3 different firewalls, all initially configured with MaxMind GeoIP blocks.  I had configured an alias containing the US, Canada and GB.  All three firewalls behaved similarly: latency both to the firewall and through it to internet resources was very high for periods at a time.  Here are graphs showing latency to the firewall and through it to an endpoint while this condition occurred:

 Firewall A:

https://imgur.com/a/oy4HB9t

Firewall A to 1.1.1.1:

https://imgur.com/a/BKk7h8F

Here is firewall B:

https://imgur.com/a/IoSYz4K

To troubleshoot, I deleted all of the GeoIP aliases.  You can see how Firewall A responded: both ping times evened out (notice the red square).

https://imgur.com/a/hx9jOj3

I even did a control experiment where I enabled CrowdSec (which creates an alias), and you can see the latency started to creep back up (marked by the red arrow in that picture).

I checked crontab, and there is a job that runs every minute with update_tables.py in it. It seems some other people are reporting somewhat similar issues:

https://forum.opnsense.org/index.php?topic=41759.60#msg211036

Like I said, it's happening across 3 different firewalls, with 3 different hardware setups at 3 different locations.  It seems to be related to OPNsense.  As a test I swapped out OPNsense for pfSense and it did not have the same latency spikes.

Any ideas or help?
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: CanIKipThis on February 10, 2025, 10:33:32 PM
Just trying to bump this up. 
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: jphylips on February 11, 2025, 06:13:43 PM
Hi,

This behavior has been present for a long time now; please see this:

https://forum.opnsense.org/index.php?topic=31662.msg153060#msg153060

I ended up with a workaround because I could not find the root of the problem.
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: CanIKipThis on March 12, 2025, 07:42:47 PM
Thanks, can you share what your workaround was?
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: franco on March 12, 2025, 07:58:11 PM
We have this one in the pipeline for other reasons, but it could help?

# opnsense-patch https://github.com/opnsense/core/commit/81ec98007d


Cheers,
Franco
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: jphylips on March 13, 2025, 04:19:47 PM
Thanks for the patch. Processing time dropped from somewhere between 15 and 20 seconds to under 7 seconds of user CPU time (about 12 seconds elapsed):

root@OPNsense:/usr/local/opnsense/scripts/filter # time /usr/local/opnsense/scripts/filter/update_tables.py
{"status": "ok"}
6.810u 4.850s 0:12.08 96.5%   159+171k 0+2io 0pf+0w
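
For anyone comparing runs before and after the patch, here is a minimal sketch that splits a csh/tcsh `time` summary line like the one above into its parts. The function name `parse_csh_time` is made up for illustration, and the sketch assumes the default tcsh layout (user seconds, system seconds, elapsed time, CPU percentage).

```python
import re

def parse_csh_time(line: str) -> dict:
    """Split a csh/tcsh `time` summary line into user, system and elapsed seconds.

    Assumes the default layout: <user>u <system>s [h:]m:ss.cc <pct>% ...
    """
    m = re.match(
        r"\s*([\d.]+)u\s+([\d.]+)s\s+(?:(\d+):)?(\d+):([\d.]+)\s+([\d.]+)%",
        line,
    )
    if m is None:
        raise ValueError("not a csh-style time summary: %r" % line)
    user, system, hours, minutes, seconds, pct = m.groups()
    elapsed = int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)
    return {
        "user": float(user),        # CPU seconds spent in user mode
        "system": float(system),    # CPU seconds spent in the kernel
        "elapsed": elapsed,         # wall-clock seconds
        "cpu_pct": float(pct),      # (user + system) / elapsed, as reported
    }

if __name__ == "__main__":
    result = parse_csh_time("6.810u 4.850s 0:12.08 96.5%   159+171k 0+2io 0pf+0w")
    for key, value in result.items():
        print(key, value)
```

The first field is user CPU time and the third is wall-clock time, so it is worth stating which one a "seconds" figure refers to when comparing results.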

As for the workaround: it depends on the presence of a temporary file called /tmp/refreshaliases.
This file is created by a custom script called /opt/local/bin/refreshaliases.sh.
The contents of the script are:

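#!/bin/sh

# If the stock update_tables.py is in place (the dummy is smaller than 100 bytes),
# move it aside and install the do-nothing dummy so the every-minute cron job
# becomes a no-op. Keep a copy of the script's lib directory next to the real script.
if [ $(wc -c /usr/local/opnsense/scripts/filter/update_tables.py | awk '{print $1}') -gt 100 ]
then
   mv /usr/local/opnsense/scripts/filter/update_tables.py /opt/local/bin
   cp /opt/local/bin/update_tables.py_new /usr/local/opnsense/scripts/filter/update_tables.py
   /usr/local/bin/rsync -a --delete /usr/local/opnsense/scripts/filter/lib /opt/local/bin/
fi

# Only refresh when DNS resolution works (drill returns an answer for www.google.com).
if [ $(drill www.google.com | grep ^www.google.com | wc -l) -ne 0 ]
then
   # Run the real script under the same lock the cron job uses, then leave the marker file.
   /usr/local/bin/flock -n -E 0 -o /tmp/filter_update_tables.lock /opt/local/bin/update_tables.py > /dev/null
   touch /tmp/refreshaliases
fi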

Furthermore, a new Python script that does nothing was created, called /opt/local/bin/update_tables.py_new.
The contents of the script are:

#!/usr/local/bin/python3

"""
    dummy
"""

Then a monit job was created that checks whether a config change has occurred and calls the /opt/local/bin/refreshaliases.sh script.
At boot the /opt/local/bin/refreshaliases.sh script must be run as well since the /tmp/refreshaliases file is not present at boot time.

Result: no more CPU spikes but aliases are refreshed at any config change. Hope this helps.
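
The gating idea behind the workaround (refresh only when the marker file is missing or a config change has happened) can be sketched roughly as follows. This is not the monit job itself, whose exact check is not shown in this thread. Comparing modification times is an assumption, and `needs_refresh`, `refresh_if_changed` and the `refresh` callback are hypothetical names.

```python
import os

def needs_refresh(config_path: str, marker_path: str) -> bool:
    """True when the marker is missing (e.g. after boot) or older than the config."""
    if not os.path.exists(marker_path):
        return True
    return os.path.getmtime(config_path) > os.path.getmtime(marker_path)

def refresh_if_changed(config_path: str, marker_path: str, refresh) -> bool:
    """Call refresh() only when needed, then touch the marker. Returns True if it ran."""
    if not needs_refresh(config_path, marker_path):
        return False
    refresh()  # hypothetical callback, e.g. something that runs the real update_tables.py
    with open(marker_path, "a"):
        pass                        # create the marker if it does not exist yet
    os.utime(marker_path, None)     # bump its mtime to "now"
    return True
```

On a real box the two paths would presumably be /conf/config.xml and /tmp/refreshaliases, with `refresh` wrapping the flock call from the shell script above.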
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: franco on March 13, 2025, 04:35:18 PM
Apparently this still lacks a bit of context: type of box, number of aliases, total size of them?


Thanks,
Franco
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: jphylips on March 13, 2025, 07:28:31 PM
Hi Franco,

You are absolutely right. Please find the answers to your questions below:

It's Protectli:
# dmidecode | grep "Product Name"|uniq
   Product Name: VP2420
According to their website: Intel Celeron® J6412 Quad Core at 2 GHz (Burst up to 2.6 GHz)

There are about 161 aliases:
# grep "alias uuid" /conf/config.xml|wc -l
     161

In total, all aliases sum up to about 5.5 million entries. The larger ones are based on IP addresses from AbuseIP, FireHOL and about 5 large GeoIP-based alias lists.

If you want me to trace anything please let me know, I will be more than happy to assist.
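
As a possible alternative to the grep above, the alias count can be broken down by type. This is only a sketch: it assumes each alias is an `<alias uuid="...">` element with a `<type>` child, and the sample XML is invented for illustration rather than taken from a real config.xml.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_aliases(xml_text: str) -> Counter:
    """Count <alias uuid="..."> elements, grouped by their <type> child."""
    root = ET.fromstring(xml_text)
    return Counter(
        (el.findtext("type") or "unknown")
        for el in root.iter("alias")
        if "uuid" in el.attrib
    )

if __name__ == "__main__":
    # Invented sample; the real layout of /conf/config.xml is assumed, not verified here.
    sample = """
    <opnsense><aliases>
      <alias uuid="a1"><name>geo_block</name><type>geoip</type></alias>
      <alias uuid="a2"><name>firehol</name><type>urltable</type></alias>
      <alias uuid="a3"><name>lan_hosts</name><type>host</type></alias>
    </aliases></opnsense>
    """
    counts = count_aliases(sample)
    print(sum(counts.values()), dict(counts))
```

Note that this counts alias definitions only. The number of addresses behind a GeoIP or URL table alias lives in the loaded tables, not in the config file.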
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: CanIKipThis on March 18, 2025, 01:57:10 PM
I am happy to help as well.

Box is Protectli VP240
Celeron J4125
Currently on 25.1

Here is my alias list (the majority are in the one GEOIP blocklist)

https://imgur.com/a/lf8ccph

For the patch, if I install it, and then upgrade to 25.1.3, do I have to re-install each time?



Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: franco on March 18, 2025, 03:08:07 PM
At first glance Celeron CPUs are underwhelming for this task. A stretch to 500k entries maybe, but I wouldn't trust it with managing more entries than this.


Cheers,
Franco
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: meyergru on March 18, 2025, 03:26:37 PM
The running time is reduced by ~40% by the patch...
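
As a rough check of that figure, here is the arithmetic using the numbers posted earlier in this thread. Which "before" and "after" values the ~40% refers to is an assumption: the sketch pairs the reported 15 to 20 seconds with the 12.08 seconds elapsed from the `time` output.

```python
def reduction_pct(before_s: float, after_s: float) -> float:
    """Percentage by which the running time dropped."""
    return (before_s - after_s) / before_s * 100.0

if __name__ == "__main__":
    after = 12.08  # elapsed seconds from the posted `time` output
    for before in (15.0, 20.0):
        print(before, "->", after, "s:", round(reduction_pct(before, after), 1), "%")
```

Starting from 20 seconds this gives roughly 40%, and starting from 15 seconds roughly 20%.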
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: franco on March 18, 2025, 03:46:52 PM
While that is nice, it mainly works around kernel crashes caused by pfctl reading table contents that suddenly change while being read under a lock.


Cheers,
Franco
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: CanIKipThis on March 18, 2025, 04:35:53 PM
What does this scheduled task actually do? 
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: franco on March 18, 2025, 04:40:57 PM
Manage alias updates. Downloading, comparing, making sure the data is up to date. More or less what you would expect it to do.


Cheers,
Franco
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: jphylips on March 18, 2025, 05:38:35 PM
OK, my 'amount' of aliases is a bit too much. Would it be possible to get the behavior of my workaround (using monit to trigger the alias update when a config change occurs) whenever a modification is made on the aliases screen in the UI?
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: franco on March 18, 2025, 05:44:21 PM
We were testing with this regarding the kernel panic issue:

# opnsense-patch https://github.com/opnsense/core/commit/c8497ac14603

It needs a cron apply or reboot.  It will only reload the aliases twice per hour at the expense of the update interval. It has downsides, but we were discussing making it configurable for some edge cases like this.


Cheers,
Franco
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: guenti_r on March 24, 2025, 08:05:44 AM
Hi franco,

it works! Many thanks! Please make the reload configurable in future.
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: CanIKipThis on March 24, 2025, 03:54:10 PM
Thanks, how do you do a cron apply after the install?  I have installed the patch, yet the cron entry for that job is still configured for every minute.
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: guenti_r on March 24, 2025, 03:58:59 PM
System -> Settings -> Cron -> "apply"
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: CanIKipThis on March 24, 2025, 06:24:28 PM
Thanks! Did you see a difference for the update_tables.py job in cron after the patch was installed?
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: guenti_r on March 25, 2025, 07:22:19 AM
Yes, the minute latency spikes are gone.
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: jphylips on March 25, 2025, 12:49:16 PM
But I guess they now occur every half hour?
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: franco on March 25, 2025, 01:26:39 PM
That would be the working theory. ;)
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: CanIKipThis on March 25, 2025, 01:56:31 PM
That I get, but I'm just wondering where the logic changed because my crontab still says every minute:

(https://imgur.com/a/JVpIQAp)

This was after installing the patch and then rebooting.  Did the same thing again and did the crontab apply.
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: guenti_r on March 25, 2025, 02:04:01 PM
No, you are wrong.

1,31    *       *       *       *       (/usr/local/bin/flock -n -E 0 -o /tmp/filter_update_tables.lock /usr/local/opnsense/scripts/filter/update_tables.py) > /dev/null

Use crontab -e  and not the GUI
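
To illustrate what the `1,31` minute field changes compared with `*`, here is a small sketch that expands a cron minute field and reports the runs per hour and the longest gap between runs. It handles only `*`, lists, ranges and `/step`, which is a simplification of real cron syntax. The function names are made up for this example.

```python
def expand_minute_field(field: str) -> list:
    """Expand a cron minute field such as '*', '1,31', '0-59/15' into sorted minutes."""
    minutes = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            low, high = 0, 59
        elif "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
        else:
            low = high = int(part)
        minutes.update(range(low, high + 1, step))
    return sorted(minutes)

def longest_gap_minutes(field: str) -> int:
    """Longest wait between two consecutive runs, including the wrap to the next hour."""
    mins = expand_minute_field(field)
    gaps = [later - earlier for earlier, later in zip(mins, mins[1:])]
    gaps.append(mins[0] + 60 - mins[-1])
    return max(gaps)

if __name__ == "__main__":
    for field in ("*", "1,31"):
        print(field, len(expand_minute_field(field)), "runs/hour,",
              "longest gap", longest_gap_minutes(field), "min")
```

With `1,31` the job runs twice per hour, so a change can wait up to 30 minutes before the next run picks it up.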
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: CanIKipThis on March 25, 2025, 02:32:03 PM
Thanks, that's where I was checking from.  I had installed it at least two times before.  I just did it again, followed by a cron apply, and now it is showing the change.

Thanks for following up!
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: jphylips on March 26, 2025, 02:11:37 PM
Hi Franco,
Thanks for the patch. Great work (as always, and I mean that in a very positive way!!!).
Would it be possible to make the execution moments of the cron job customizable?
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: guenti_r on March 27, 2025, 03:56:26 PM
@franco

is this patch included in 25.1.4 now?
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: jphylips on March 28, 2025, 02:48:35 PM
Judging by this output, I don't think so:

root@OPNsense:~ # crontab -l|grep update_tables
*   *   *   *   *   (/usr/local/bin/flock -n -E 0 -o /tmp/filter_update_tables.lock /usr/local/opnsense/scripts/filter/update_tables.py) > /dev/null
root@OPNsense:~ #
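
The same check can be scripted. Below is a minimal sketch that pulls the five schedule fields of the update_tables.py entry out of `crontab -l` output passed in as text. The function name is hypothetical, and it assumes the entry is a plain five-field cron line like the ones shown in this thread.

```python
def update_tables_schedule(crontab_text: str):
    """Return the (minute, hour, day, month, weekday) fields of the
    update_tables.py entry, or None if no such entry is found."""
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "update_tables.py" not in line:
            continue
        fields = line.split(None, 5)   # five schedule fields, then the command
        if len(fields) < 6:
            continue
        return tuple(fields[:5])
    return None

if __name__ == "__main__":
    stock = ("*   *   *   *   *   (/usr/local/bin/flock -n -E 0 -o "
             "/tmp/filter_update_tables.lock "
             "/usr/local/opnsense/scripts/filter/update_tables.py) > /dev/null")
    schedule = update_tables_schedule(stock)
    print("every minute" if schedule and schedule[0] == "*" else schedule)
```

A minute field of `*` means the stock every-minute schedule is active, while `1,31` means the twice-per-hour change has been applied.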
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: franco on March 28, 2025, 02:54:43 PM
No, because it breaks MAC table updates by taking up to 30 minutes to render these changes. We're likely going to make this configurable, but it's not an immediate concern. Just change the values on your end for now.

What ships in 25.1.4 is the performance improvement of the task previously discussed here.


Cheers,
Franco
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: guenti_r on April 18, 2025, 09:00:18 AM
Quote from: franco on March 18, 2025, 05:44:21 PM
We were testing with this regarding the kernel panic issue:

# opnsense-patch https://github.com/opnsense/core/commit/c8497ac14603

It needs a cron apply or reboot.  It will only reload the aliases twice per hour at the expense of the update interval. It has downsides, but we were discussing making it configurable for some edge cases like this.


Cheers,
Franco

Is this patch compatible with 25.4 ?
We really need this to be made configurable. In the meantime, the 1,31 schedule would help us a lot.
One of our customers has problems with VoIP because of the constant per-minute latency spikes.

Here we have similar aliases (about 150) with an average of 3 million entries.
Reducing these to 50% does not help; the latency spikes are still there.

OPNsense 25.4 / HP DL 360 G9 / Xeon E5 2690v4 256 GB RAM
Title: Re: Large Alias Causing CPU spikes and ping latency
Post by: jphylips on April 18, 2025, 09:28:49 AM
Hi,
I've put my hack in a PDF for others to use. Please see attached. It works for me and the CPU load is now totally gone.