Hey, folks.
I did my periodic update/upgrade this week, and since then I've been seeing very high CPU usage by lighttpd. I noticed because ZenArmor reporting and other reporting interfaces are struggling to render in the admin interface. I've had some odd behavior related to the ZA database in the past, so I reset the database to fresh and rebooted, but I'm still seeing high CPU and a laggy interface.
The system load is around 1.5, so not terrible.
[1] CPU: Intel(R) Core(TM) i3-4360 CPU @ 3.70GHz (3691.56-MHz K8-class CPU)
hw.ncpu: 4
hw.physmem: 8432918528
Thoughts?
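For readers comparing against their own box, here is a minimal sketch of the arithmetic behind the figures quoted above: the load average spread over the logical CPU count, and `hw.physmem` converted from bytes to GiB. The function names are made up for illustration, and the numbers are just the ones from this post.

```python
def load_per_cpu(load_avg: float, ncpu: int) -> float:
    """Return the load average divided by the number of logical CPUs."""
    if ncpu <= 0:
        raise ValueError("ncpu must be positive")
    return load_avg / ncpu


def bytes_to_gib(nbytes: int) -> float:
    """Convert a byte count (e.g. hw.physmem) to GiB."""
    return nbytes / 2**30


if __name__ == "__main__":
    # Values quoted in the post: load ~1.5, hw.ncpu 4, hw.physmem 8432918528
    print(f"load per logical CPU: {load_per_cpu(1.5, 4):.3f}")  # 0.375
    print(f"physical memory: {bytes_to_gib(8432918528):.2f} GiB")  # 7.85 GiB
```

A load of about 1.5 on 4 logical CPUs is roughly what you would expect if one process is keeping a single core busy most of the time, so the modest system load and a lighttpd process near 90% do not contradict each other.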
There have been 2 performance regressions in corner cases for larger config.xml model data. These are to be fixed in 25.7.4 this week.
Cheers,
Franco
franco,
When you get a chance, can you let me know what sizes might qualify as "large"? I'll then confirm the fix after the upgrade if I'm in the "large" category.
thx
It's not necessarily about size, but the number of model-relation references in the config.xml.
Just let me know if 25.7.4 is back to normal again. It's due later today.
Cheers,
Franco
I just updated and rebooted and lighttpd is still idling north of 90% even without any connections to the management interface.
I'm not at liberty to share a lot of detail, but I have three 10Gb interfaces, relatively few custom fw rules, and 2 outbound VPN tunnels (WireGuard). Also a mostly stock ZenArmor config.
The CPU is more than 10 years old. With only 2 physical cores, this is not a performant chip by current standards, especially when you throw multiple 10Gb connections and a lot of traffic, including Zenarmor, at it.
Could it be that it just can't handle the load?
4 CPU system. Big pipes but not a lot of traffic. Very edge-environment. I have been thinking about replacing it but that's another conversation. I've been running OPN + ZenArmor on it for years and never seen lighttpd red line the CPU for days.
I'm just digging around now to see if FBSD has an equiv to strace/ltrace/eBPF. Not seeing anything in the logs.
EDIT: 2 physical cores; 4 threads, I guess.
Are you sure lighttpd is not silently processing stray requests? This can e.g. happen when the admin interface is exposed to the WAN. But then again we have a few different lighttpd processes so it would be nice to know which one it is. The slow admin GUI is likely circumstantial.
Cheers,
Franco
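To narrow down which lighttpd process is the busy one, something along these lines might help. It is only a sketch: it assumes `ps` accepts `-axo pid,pcpu,command` on the box and that each lighttpd instance shows its config file on the command line (for example via `-f`). The helper names are made up for illustration.

```python
import subprocess


def parse_ps(output: str, name: str = "lighttpd"):
    """Return (pid, cpu_percent, command) tuples for matching processes, busiest first."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)  # pid, %cpu, then the full command line
        if len(parts) < 3:
            continue
        pid, pcpu, command = parts
        if name not in command:
            continue
        try:
            rows.append((int(pid), float(pcpu), command.strip()))
        except ValueError:
            continue  # ignore lines that do not parse as numbers
    return sorted(rows, key=lambda row: row[1], reverse=True)


def lighttpd_processes():
    """Run ps and return the lighttpd processes sorted by CPU usage."""
    result = subprocess.run(
        ["ps", "-axo", "pid,pcpu,command"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)


if __name__ == "__main__":
    for pid, cpu, command in lighttpd_processes():
        print(f"{pid:>7} {cpu:5.1f}% {command}")
```

The PID at the top of the list is the one worth tracing, and its command line should hint at which service it belongs to.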
@granute I am a lighttpd developer and would be interested in seeing the `truss -a -f -D -s 1024 -p "<lighttpd_pid>"` output of the lighttpd process.
https://man.freebsd.org/cgi/man.cgi?truss
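If the trace turns out to be long, a rough summary of which syscalls dominate can make it easier to read. This is only a sketch: it assumes each traced call appears on its own line as `name(args) = result`, possibly prefixed by a PID and a timestamp, and it simply counts the first `name(` it finds on each line. The script name and trace file name in the usage note are hypothetical.

```python
import re
import sys
from collections import Counter

# First identifier immediately followed by "(" on a line, e.g. "kevent(" or "read("
SYSCALL_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\(")


def count_syscalls(lines):
    """Count how often each syscall name appears in an iterable of trace lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Read saved trace output from stdin and print the ten most frequent calls
    for name, count in count_syscalls(sys.stdin).most_common(10):
        print(f"{count:>8} {name}")
```

With the trace saved to a file, it could be run as `python3 summarize_truss.py < truss.out`. A single call repeating thousands of times per second usually points at a busy loop rather than real request handling.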
Back in Jan, there was https://forum.opnsense.org/index.php?topic=44391.msg226740#msg226740 and I posted some steps to troubleshoot, so you might take a look at that thread. However, that thread concluded that the CPU issue was due to processing interrupts.
Back in 2021, there was a post about an issue in lighttpd that caused high CPU use, and it was fixed in lighttpd 1.4.59 (released Nov 2020)
https://forum.opnsense.org/index.php?topic=21210.msg101803#msg101803