very high cpu usage since upgrade to 25.7.3_7

Started by granute, September 28, 2025, 05:37:24 PM

Hey, folks.

I did my periodic update/upgrade this week, and since then I've been seeing very high CPU usage by lighttpd. I noticed because ZenArmor reporting and other reporting interfaces are struggling to render in the admin interface. I've had some odd behavior related to the ZA database in the past, so I reset the database to fresh and rebooted, but I'm still seeing high CPU and a laggy interface.

The system load is around 1.5, so not terrible.

[1] CPU: Intel(R) Core(TM) i3-4360 CPU @ 3.70GHz (3691.56-MHz K8-class CPU)
hw.ncpu: 4
hw.physmem: 8432918528

Thoughts?

There have been 2 performance regressions in corner cases for larger config.xml model data. These are to be fixed in 25.7.4 this week.


Cheers,
Franco

franco,

When you get a chance can you let me know what sizes might qualify as "large"? I'll then confirm fix after upgrade if I'm in the "large" category.

thx

It's not necessarily about size, but the number of model-relation references in the config.xml.

Just let me know if 25.7.4 is back to normal again. It's due later today.


Cheers,
Franco

I just updated and rebooted and lighttpd is still idling north of 90% even without any connections to the management interface.

I'm not at liberty to share a lot of detail but I have 3 10Gb interfaces, relatively few custom fw rules, but 2 outbound VPN tunnels (Wireguard). Also a mostly stock ZenArmor config.

September 30, 2025, 06:37:22 PM #5 Last Edit: September 30, 2025, 06:44:48 PM by Kets_One
The CPU is more than 10 years old. With only 2 physical cores, it is not a performant chip by current standards,
especially when you throw multiple 10Gb connections and a lot of traffic, including Zenarmor, at it.
Could it be that it just can't handle the load?
Deciso dec3840: EPYC Embedded 3101, 16GB RAM, 512GB NVMe

September 30, 2025, 09:25:58 PM #6 Last Edit: September 30, 2025, 09:31:43 PM by granute
4 CPU system. Big pipes but not a lot of traffic. Very edge-environment. I have been thinking about replacing it but that's another conversation. I've been running OPN + ZenArmor on it for years and never seen lighttpd red line the CPU for days.

I'm just digging around now to see if FBSD has an equiv to strace/ltrace/eBPF. Not seeing anything in the logs.

EDIT: 2 physical cores; 4 threads, I guess.

Are you sure lighttpd is not silently processing stray requests? This can happen, e.g., when the admin interface is exposed to the WAN. But then again, we have a few different lighttpd processes, so it would be nice to know which one it is. The slow admin GUI is likely circumstantial.


Cheers,
Franco
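
A minimal sketch for narrowing down which lighttpd instance is busy, assuming a POSIX `sh` and the stock `ps`, `awk` and `sort` tools. The helper name `rank_lighttpd` is made up for illustration. It ranks lighttpd processes by CPU share; the `-f <config>` path in the command column should then show which instance is spinning.

```shell
#!/bin/sh
# Rank lighttpd processes by CPU share (hypothetical helper, not part of OPNsense).
# Reads `ps -axo pid,pcpu,command` style text on stdin (header line optional)
# and prints "PCPU PID COMMAND" for each lighttpd process, busiest first.
rank_lighttpd() {
    # Match on the executable column only, so the awk process itself is never listed.
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /lighttpd$/ {
             cmd = $3
             for (i = 4; i <= NF; i++) cmd = cmd " " $i
             print $2, $1, cmd
         }' | LC_ALL=C sort -k1,1 -rn
}

# Usage on the firewall (run under sh):
#   ps -axo pid,pcpu,command | rank_lighttpd
# To check for stray clients, list connected sockets owned by lighttpd:
#   sockstat -4 -6 -c | grep lighttpd
```

If `sockstat` shows no established connections while one instance still sits near 100%, stray requests are an unlikely cause and a syscall trace of that PID is the natural next step.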

@granute I am a lighttpd developer and would be interested in seeing the `truss -a -f -D -s 1024 -p "<lighttpd_pid>"` output of the lighttpd process.
https://man.freebsd.org/cgi/man.cgi?truss
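
A rough sketch of how the trace could be captured to a file and boiled down before posting, assuming a POSIX `sh`. The helper name `summarize_truss` and the path `/tmp/lighttpd.truss` are made up for illustration, and the parsing assumes each traced call is printed on one line as `name(args) = result`. A process stuck in a busy loop usually shows one or two syscalls dominating the count.

```shell
#!/bin/sh
# Count syscalls in a truss log, most frequent first (hypothetical helper).
# Assumes one traced call per line in the form "... name(args) = result";
# lines without a call (signals, process exit) are skipped.
summarize_truss() {
    awk '{
             # The leftmost "identifier(" on the line is taken as the syscall name.
             if (match($0, /[A-Za-z_][A-Za-z0-9_]*\(/))
                 print substr($0, RSTART, RLENGTH - 1)
         }' "$@" | LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn
}

# Usage (run under sh; stop truss with Ctrl-C after a few seconds):
#   truss -a -f -D -s 1024 -o /tmp/lighttpd.truss -p <lighttpd_pid>
#   summarize_truss /tmp/lighttpd.truss | head
```

The summary is only a first look; the full log is what shows the arguments and return values around the hot call.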

Back in January, there was https://forum.opnsense.org/index.php?topic=44391.msg226740#msg226740, where I posted some troubleshooting steps, so you might take a look at that thread. However, that thread concluded that the CPU issue was due to processing interrupts.

Back in 2021, there was a post about an issue in lighttpd that caused high CPU use, and it was fixed in lighttpd 1.4.59 (released Nov 2020)
https://forum.opnsense.org/index.php?topic=21210.msg101803#msg101803