24.7 CPU Temps

Started by ProximusAl, July 26, 2024, 03:28:26 PM

> hardwarecooling differences

;)

My proposal https://github.com/opnsense/core/commit/f473d9a5c7 got shot down because showing the "wrong" temperature once every 24 hours is apparently unacceptable. Maybe someone can tell me what is acceptable here.


Cheers,
Franco

One question, though ... sorry if I missed something.

The computationally expensive task that jacks up CPU temp while reporting seems to be sysctl -a | grep ...

Why is this necessary? Is there really no API that gives you the number of cores so you could poll only the existing OIDs? I wonder how other systems that I have in my zoo report temperatures like e.g. ESXi or any appliance I poll via SNMP. It just feels "wrong" to implement it this way.

Kind regards,
Patrick
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

August 28, 2024, 09:30:58 PM #62 Last Edit: August 28, 2024, 09:32:34 PM by doktornotor
Quote from: franco on August 28, 2024, 09:21:47 PM
My proposal https://github.com/opnsense/core/commit/f473d9a5c7 got shot down because showing the "wrong" temperature once every 24 hours is apparently unacceptable. Maybe someone can tell me what is acceptable here.

Well, FWIW, I already posted the ultimate solution...  8)

Quote from: doktornotor on August 28, 2024, 12:37:38 AM
Perhaps removing the widget would be the best course of action here.

Quote from: Patrick M. Hausen on August 28, 2024, 09:26:39 PM
Is there really no API that gives you the number of cores so you could poll only the existing OIDs?

If there was one, you'd still miss the other sensors. There are more sensors than CPU cores.

Or, hear me out, the CPU utilization issue seen here and elsewhere is the problem.


Quote from: doktornotor on August 28, 2024, 09:30:58 PM
If there was one, you'd still miss the other sensors. There are more sensors than CPU cores.
I am aware of that, yet would not necessarily expect a generic firewall appliance running on all kinds of hardware to display more than the CPU core average. Do that in a clean way and call it "feature complete"  ;D

Leave the rest to proper network management systems and SNMP.
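A minimal sketch of the "CPU core average only" idea, in Python. It assumes the per-core sensors show up as dev.cpu.N.temperature with values like "45.0C" (what coretemp/amdtemp typically expose on FreeBSD) and that the core count comes from something like hw.ncpu; the function names are hypothetical and this is not what the OPNsense widget actually does.

```python
import re

# Assumed line format: "dev.cpu.3.temperature: 45.0C"
CORE_TEMP = re.compile(r"^dev\.cpu\.(\d+)\.temperature:\s*(-?\d+(?:\.\d+)?)C$")


def core_temperature_oids(ncpu):
    """Build the per-core OID names for a known core count (e.g. from hw.ncpu)."""
    return [f"dev.cpu.{i}.temperature" for i in range(ncpu)]


def average_core_temperature(sysctl_output):
    """Average all per-core readings in the given sysctl output, ignoring other sensors."""
    temps = []
    for line in sysctl_output.splitlines():
        match = CORE_TEMP.match(line.strip())
        if match:
            temps.append(float(match.group(2)))
    if not temps:
        return None  # no core sensors, e.g. inside a VM
    return sum(temps) / len(temps)
```

Passing the names from core_temperature_oids() to a single sysctl call would avoid walking the whole tree, at the cost of ignoring every non-core sensor.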

Ok, so let's remove useful information because some people don't like the output of sysctl. SMH.

You have a valid point. Let's see what Franco and colleagues come up with. I guess you can do one sysctl -a at boot time, then poll only the sensors you found afterwards. They are not likely to change without a reboot involved one way or another.
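The "discover once, then poll only what you found" approach could be sketched roughly like this in Python. This is only an illustration under assumptions: the class and function names are made up, the "name: 45.0C" line format is assumed, and discovery happens on the first poll rather than strictly at boot. It is not the patch discussed in this thread.

```python
import re
import subprocess

# Assumed line format: "dev.cpu.0.temperature: 45.0C"
TEMP_LINE = re.compile(r"^(\S+\.temperature):\s*(-?\d+(?:\.\d+)?)C\s*$")


def discover_sensors(full_dump):
    """Return the names of all temperature sysctls found in a full dump."""
    names = []
    for line in full_dump.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def parse_readings(output):
    """Map sensor name to degrees Celsius for every temperature line."""
    readings = {}
    for line in output.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            readings[match.group(1)] = float(match.group(2))
    return readings


def run_sysctl(args):
    """Run sysctl with the given arguments and return its stdout."""
    result = subprocess.run(
        ["sysctl", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


class SensorPoller:
    """Hypothetical helper: one expensive walk, cheap targeted polls afterwards."""

    def __init__(self, run=run_sysctl):
        self._run = run
        self._names = None  # filled in by the first poll

    def poll(self):
        if self._names is None:
            # The full "sysctl -a" walk happens only once.
            self._names = discover_sensors(self._run(["-a"]))
        if not self._names:
            return {}
        # Later polls ask only for the sensors found earlier.
        return parse_readings(self._run(self._names))
```

The trade-off is the one mentioned above: a sensor that appears later (say, a driver loaded at runtime) is missed until the cached list is rebuilt.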

To be frank just try the patch:

# opnsense-patch f473d9a5c7 && service configd restart

As I said, if we don't agree it's progress, someone will need to come up with a better way of reading temperatures from the sysctls, which can only be probed at runtime. I cannot spend indefinite amounts of community time on pleasing the demand for lower temperatures.

The sysctl -a was there since the fork. We don't have to argue its downsides in every nuance.


Cheers,
Franco

August 29, 2024, 10:12:36 AM #69 Last Edit: August 29, 2024, 10:14:12 AM by sbellon
Not saying this is related, but I agree that I see a change in the graphs after having upgraded to 24.7.1 and then after 24.7.2 as well. For me it's not the CPU temperature because I'm running that on a Proxmox VE and don't have that available inside OPNsense, but I can see how the usage of the "States" clearly (!) changed with the upgrade from 24.1.10 to 24.7.1 and then again to 24.7.2 as you can see from my attached screenshot (upgrade to 24.7.1 was on 08.08.24 and upgrade to 24.7.2 was on 21.08.24 - both clearly visible in the graph without further explanation).

I am not saying this change is a problem nor worth investigating, I'm just saying that I can clearly see this change in behaviour and this may very well have effects on CPU usage and/or memory usage and perhaps as a result even CPU temperature.

Oh, and yes, configuration has NOT changed AT ALL over this period of time.

Followup:

I've found a workaround that reduces my CPU utilization and temps to pre-24.7 levels. The problem is:
/usr/local/bin/python3 /usr/local/opnsense/scripts/filter/update_tables.py

When I disable the Maxmind Geoblock aliases, the CPU temp drops by 10C and utilization from that process drops by 50%. If I reenable that alias, the temp and utilization jump back up.

I have not altered any of these aliases, and the table entries are consistent with what they were under 24.1. This leads me to believe there is still an underlying problem that needs to be identified (this just helps narrow it down).

Since this is a Python script, and 24.7 brought us Python 3.11, is it possible that Python 3.11 is the underlying problem?

I think that there must be another problem with your setup. That script is run only when the aliases change.

For me, it does not run all the time, so it just cannot be responsible for any ongoing CPU load, even if it were less efficient than before. It may cause some spikes for the time it runs if large amounts of aliases are processed.

So, are you saying that "ps auxwww | fgrep update_tables" shows the script running all the time? I think it clearly should not, and for me it does not, despite the fact that I also use the Maxmind GeoIP database for blocking.
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

It's not running all the time, but approximately once a minute for a few seconds, which is enough to affect temperature readings and the average CPU usage under health reporting.
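To put numbers on "approximately once a minute", one could sample the process list every few seconds and look at when the script starts. A rough Python sketch, assuming the samples come from something like repeated "ps auxwww" calls; the helper names are hypothetical and nothing here is part of OPNsense.

```python
def matching_processes(ps_output, needle="update_tables.py"):
    """Return ps lines that mention the script, skipping the grep itself."""
    return [
        line
        for line in ps_output.splitlines()
        if needle in line and "grep" not in line
    ]


def run_starts(samples):
    """Given ordered (timestamp_seconds, running) samples, return the
    timestamps at which the script went from not running to running."""
    starts = []
    previous = False
    for timestamp, running in samples:
        if running and not previous:
            starts.append(timestamp)
        previous = running
    return starts


def intervals(starts):
    """Seconds between consecutive starts."""
    return [later - earlier for earlier, later in zip(starts, starts[1:])]
```

Each sample would be (time.time(), bool(matching_processes(ps_text))). Intervals clustering around 60 seconds would suggest a scheduled trigger rather than alias changes.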

September 02, 2024, 01:12:29 AM #73 Last Edit: September 02, 2024, 01:15:17 AM by meyergru
It clearly does not do that for me.

Perhaps you have too many aliases? Maybe because you did not follow the tips here, e.g. because you do not have enough firewall states and the alias database never gets fully processed, starting the process over again and again?

Something has to trigger that update...

I only have 5 aliases, and I followed the guides when I did my initial install. There is enough available, there are no errors in my logs, and the problems didn't arise until after the 24.7 install - which is why I could see the noticeable difference in health reporting.

I searched before posting, and there are a lot of people reporting that specific process running about once a minute over the years without anyone calling it abnormal... Disabling the Geoblock alias also does not change how often the process runs, it only brings the CPU utilization back down to pre-24.7 levels.