Large Alias Causing CPU spikes and ping latency

CanIKipThis · February 04, 2025, 02:58:25 PM

Hey everyone,

Tracked down a problem that seems to have been there since at least 24.19. If you have "large" alias groups, the firewall shows periodic CPU spikes and periods of erratic, elevated latency through it. I have now seen this on 3 different firewalls, all initially configured with MaxMind GeoIP blocks. I had configured an alias containing the US, Canada and GB. All three firewalls behaved similarly: latency both to the firewall and through it to internet resources went through periods of very high values. Here are graphs showing latency to the firewall and through it to an endpoint while this condition occurred:

Firewall A:

https://imgur.com/a/oy4HB9t

Firewall A to 1.1.1.1:

https://imgur.com/a/BKk7h8F

Here is firewall B:

https://imgur.com/a/IoSYz4K

To troubleshoot, I deleted the GeoIP aliases. You can see how Firewall A responded: both ping times evened out (note the red square):

https://imgur.com/a/hx9jOj3

I even did a control experiment where I enabled CrowdSec (which creates an alias), and you can see the latency started to creep back up (marked by the red arrow in that picture).

I checked crontab, and there is a job that runs update_tables.py every minute. It seems some other people are reporting somewhat similar issues:

https://forum.opnsense.org/index.php?topic=41759.60#msg211036
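One way to check whether the latency spikes line up with the per-minute cron job is to look at where in the minute each spike falls. The following is only a sketch: the function name, the spike threshold (3x the median RTT) and the sample data are assumptions made for illustration, not measurements from these firewalls.

```python
from statistics import median

def spike_offsets(samples, factor=3.0, period=60):
    """Return the offset (in seconds) within `period` of each latency spike.

    samples: list of (unix_timestamp_seconds, rtt_ms) tuples.
    A sample counts as a spike when its RTT exceeds `factor` times the median RTT.
    """
    baseline = median(rtt for _, rtt in samples)
    return [int(ts) % period for ts, rtt in samples if rtt > factor * baseline]

# Hypothetical data: one ping every 10 s, with a 40 ms spike at each full minute.
samples = [(t, 40.0 if t % 60 == 0 else 2.0) for t in range(0, 300, 10)]
print(spike_offsets(samples))
```

If the offsets cluster around the same second of every minute, that points toward a once-a-minute job; offsets spread evenly over the minute would suggest another cause.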

Like I said, it's happening across 3 different firewalls, with 3 different hardware setups at 3 different locations. It seems to be related to OPNsense: as a test I swapped out OPNsense for pfSense, and it did not have the same latency spikes.

Any ideas or help?

CanIKipThis · February 10, 2025, 10:33:32 PM

Just trying to bump this up.

jphylips · February 11, 2025, 06:13:43 PM

Hi,

This behavior has been present for a long time now; please see this:

https://forum.opnsense.org/index.php?topic=31662.msg153060#msg153060

I ended up with a workaround because I could not find the root of the problem.

CanIKipThis · March 12, 2025, 07:42:47 PM

Thanks, can you share what your workaround was?

franco · March 12, 2025, 07:58:11 PM

We have this one in the pipeline for other reasons, but it could help?

# opnsense-patch https://github.com/opnsense/core/commit/81ec98007d

Cheers,
Franco

jphylips · March 13, 2025, 04:19:47 PM

Thanks for the patch. Processing time dropped from somewhere between 15 and 20 seconds to about 12 seconds of wall-clock time (under 7 seconds of user CPU time):

root@OPNsense:/usr/local/opnsense/scripts/filter # time /usr/local/opnsense/scripts/filter/update_tables.py
{"status": "ok"}
6.810u 4.850s 0:12.08 96.5%   159+171k 0+2io 0pf+0w
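The timing line above looks like csh/tcsh `time` output: user CPU seconds, system CPU seconds, elapsed wall-clock time, and CPU utilisation. Here is a small sketch that parses it and checks the arithmetic; the function name and regular expression are my own and assume this field layout.

```python
import re

def parse_csh_time(line):
    """Parse csh/tcsh `time` output such as '6.810u 4.850s 0:12.08 96.5% ...'."""
    m = re.match(
        r"\s*([\d.]+)u\s+([\d.]+)s\s+(?:(\d+):)?(\d+):([\d.]+)\s+([\d.]+)%", line
    )
    if m is None:
        raise ValueError("unrecognised time output: %r" % line)
    hours, minutes, seconds = m.group(3), m.group(4), m.group(5)
    elapsed = int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)
    return {
        "user": float(m.group(1)),
        "sys": float(m.group(2)),
        "elapsed": elapsed,
        "reported_cpu_pct": float(m.group(6)),
    }

t = parse_csh_time("6.810u 4.850s 0:12.08 96.5%   159+171k 0+2io 0pf+0w")
cpu = t["user"] + t["sys"]
print(f"CPU {cpu:.2f}s, wall clock {t['elapsed']:.2f}s, "
      f"utilisation {100 * cpu / t['elapsed']:.1f}%")
# prints: CPU 11.66s, wall clock 12.08s, utilisation 96.5%
```

So the run took about 12 seconds of wall-clock time and kept one core almost fully busy for that whole period.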

As for the workaround: it depends on the presence of a temporary file, /tmp/refreshaliases.
This file is created by a custom script, /opt/local/bin/refreshaliases.sh.
The contents of the script are:

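#!/bin/sh

# If the stock script is still in place (larger than 100 bytes, i.e. not the stub),
# move it aside, install the no-op stub, and copy its lib directory alongside it.
if [ "$(wc -c /usr/local/opnsense/scripts/filter/update_tables.py | awk '{print $1}')" -gt 100 ]
then
   mv /usr/local/opnsense/scripts/filter/update_tables.py /opt/local/bin
   cp /opt/local/bin/update_tables.py_new /usr/local/opnsense/scripts/filter/update_tables.py
   /usr/local/bin/rsync -a --delete /usr/local/opnsense/scripts/filter/lib /opt/local/bin/
fi

# Only refresh when DNS resolution works: run the real script (unless another
# update already holds the lock) and touch the marker file.
if [ "$(drill www.google.com | grep -c '^www\.google\.com')" -ne 0 ]
then
   /usr/local/bin/flock -n -E 0 -o /tmp/filter_update_tables.lock /opt/local/bin/update_tables.py > /dev/null
   touch /tmp/refreshaliases
fi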

Furthermore, a new Python script that does nothing was created, called /opt/local/bin/update_tables.py_new.
Its contents are:

#!/usr/local/bin/python3

"""
dummy
"""

Then a monit job was created that checks whether a config change has occurred and calls the /opt/local/bin/refreshaliases.sh script.
At boot the /opt/local/bin/refreshaliases.sh script must be run as well since the /tmp/refreshaliases file is not present at boot time.

Result: no more CPU spikes but aliases are refreshed at any config change. Hope this helps.
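The post does not say how the monit job detects a config change. One possible reading, sketched below, is to compare the modification time of the config file with that of the marker file. The function name is hypothetical; on the firewall the two paths would be /conf/config.xml and /tmp/refreshaliases, while the demo uses a temporary directory so it can run anywhere.

```python
import os
import tempfile

def needs_refresh(config_path, stamp_path):
    """True when the marker is missing (e.g. after boot) or older than the config."""
    if not os.path.exists(stamp_path):
        return True
    return os.path.getmtime(config_path) > os.path.getmtime(stamp_path)

# Demo with throwaway files and explicit timestamps.
with tempfile.TemporaryDirectory() as d:
    config = os.path.join(d, "config.xml")
    stamp = os.path.join(d, "refreshaliases")
    open(config, "w").close()
    print(needs_refresh(config, stamp))   # True: marker missing, as at boot
    open(stamp, "w").close()
    os.utime(config, (1000, 1000))
    os.utime(stamp, (2000, 2000))
    print(needs_refresh(config, stamp))   # False: marker newer than config
    os.utime(config, (3000, 3000))
    print(needs_refresh(config, stamp))   # True: config changed after last refresh
```

The missing-marker case covers the boot situation described above, since /tmp/refreshaliases does not exist at that point.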

franco · March 13, 2025, 04:35:18 PM

Apparently this still lacks a bit of context: type of box, number of aliases, total size of them?

Thanks,
Franco

jphylips · March 13, 2025, 07:28:31 PM

Hi Franco,

You are absolutely right. Please find the answers to your questions below:

It's Protectli:
# dmidecode | grep "Product Name"|uniq
Product Name: VP2420
According to their website: Intel Celeron® J6412 Quad Core at 2 GHz (Burst up to 2.6 GHz)

There are about 161 aliases:
# grep "alias uuid" /conf/config.xml|wc -l
161

In total, all aliases sum up to about 5.5 million entries. The larger ones are based on IP addresses from AbuseIP, FireHOL and about 5 large GeoIP-based alias lists.
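As an alternative to the grep above, the aliases could be counted per type with a small XML parse. This is only a sketch: the nesting and the `name`/`type` child elements in the sample are assumptions about the config layout, and the uuid values are placeholders.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_aliases_by_type(xml_text):
    """Count <alias uuid="..."> elements, grouped by their <type> child."""
    root = ET.fromstring(xml_text)
    return Counter(
        el.findtext("type", default="unknown")
        for el in root.iter("alias")
        if "uuid" in el.attrib
    )

# Hypothetical config fragment for illustration only.
sample = """<opnsense>
  <OPNsense><Firewall><Alias><aliases>
    <alias uuid="hypothetical-1"><name>geo_block</name><type>geoip</type></alias>
    <alias uuid="hypothetical-2"><name>firehol</name><type>urltable</type></alias>
    <alias uuid="hypothetical-3"><name>abuse</name><type>urltable</type></alias>
  </aliases></Alias></Firewall></OPNsense>
</opnsense>"""

counts = count_aliases_by_type(sample)
print(sum(counts.values()), dict(counts))
```

On a real box the XML text would be read from /conf/config.xml. Note that this counts alias definitions, not the number of entries each alias expands to.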

If you want me to trace anything please let me know, I will be more than happy to assist.

CanIKipThis · March 18, 2025, 01:57:10 PM

I am happy to help as well.

Box is Protectli VP240
Celeron J4125
Currently on 25.1

Here is my alias list (the majority are in the one GEOIP blocklist)

https://imgur.com/a/lf8ccph

For the patch, if I install it, and then upgrade to 25.1.3, do I have to re-install each time?

franco · March 18, 2025, 03:08:07 PM

At first glance, Celeron CPUs are underwhelming for this task. A stretch to 500k entries maybe, but I wouldn't trust it with managing more entries than this.

Cheers,
Franco

meyergru · March 18, 2025, 03:26:37 PM

The running time is reduced by ~40% by the patch...

franco · March 18, 2025, 03:46:52 PM

While that is nice, it mainly works around kernel crashes caused by pfctl reading table contents that suddenly change while being read under a lock.

Cheers,
Franco

CanIKipThis · March 18, 2025, 04:35:53 PM

What does this scheduled task actually do?

franco · March 18, 2025, 04:40:57 PM

Manage alias updates. Downloading, comparing, making sure the data is up to date. More or less what you would expect it to do.

Cheers,
Franco
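To illustrate the "comparing" step in general terms, here is a minimal sketch of diffing the entries currently loaded for an alias against a freshly downloaded list, so that only the differences need to be applied. It is not how update_tables.py is actually implemented; the function name is hypothetical and the addresses are documentation ranges.

```python
def diff_entries(current, desired):
    """Return (to_add, to_remove) needed to turn `current` into `desired`."""
    current, desired = set(current), set(desired)
    return sorted(desired - current), sorted(current - desired)

loaded = ["192.0.2.0/24", "198.51.100.0/24"]      # what the table holds now
fetched = ["198.51.100.0/24", "203.0.113.0/24"]   # what the source lists now
to_add, to_remove = diff_entries(loaded, fetched)
print(to_add, to_remove)
# prints: ['203.0.113.0/24'] ['192.0.2.0/24']
```

With millions of entries, even building and comparing such sets every minute costs noticeable CPU time, which fits the load reported in this thread.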

jphylips · March 18, 2025, 05:38:35 PM

OK, my number of aliases is a bit too much. Would it be possible to get the behavior of my workaround (using monit to trigger the alias update when a config change occurs) built in, so that the update runs when a modification is made on the aliases screen in the UI?
