24.7 CPU Temps

franco · August 28, 2024, 09:21:47 PM

> ~~hardware~~cooling differences

;)

My proposal https://github.com/opnsense/core/commit/f473d9a5c7 got shot down because showing the "wrong" temperature once every 24 hours is apparently unacceptable. Maybe someone can tell me what is acceptable here.

Cheers,
Franco

Patrick M. Hausen · August 28, 2024, 09:26:39 PM

One question, though ... sorry if I missed something.

The computationally expensive task that jacks up CPU temp while reporting seems to be sysctl -a | grep ...

Why is this necessary? Is there really no API that gives you the number of cores so you could poll only the existing OIDs? I wonder how other systems that I have in my zoo report temperatures like e.g. ESXi or any appliance I poll via SNMP. It just feels "wrong" to implement it this way.

Kind regards,
Patrick

doktornotor · August 28, 2024, 09:30:58 PM

Quote from: franco on August 28, 2024, 09:21:47 PM
My proposal https://github.com/opnsense/core/commit/f473d9a5c7 got shot down because showing the "wrong" temperature once every 24 hours is apparently unacceptable. Maybe someone can tell me what is acceptable here.

Well, FWIW, I already posted the ultimate solution... 8)

Quote from: doktornotor on August 28, 2024, 12:37:38 AM
Perhaps removing the widget would be the best course of action here.

Quote from: Patrick M. Hausen on August 28, 2024, 09:26:39 PM
Is there really no API that gives you the number of cores so you could poll only the existing OIDs?

If there was one, you'd still miss the other sensors. There are more sensors than CPU cores.

irrenarzt · August 28, 2024, 10:03:05 PM

Or, hear me out, the CPU utilization issue seen here and elsewhere is the problem.

doktornotor · August 28, 2024, 10:12:16 PM

Quote from: irrenarzt on August 28, 2024, 10:03:05 PM
Or, hear me out, the CPU utilization issue seen here and elsewhere is the problem.

GOTO https://forum.opnsense.org/index.php?topic=41759.msg210341#msg210341

Patrick M. Hausen · August 28, 2024, 10:12:55 PM

Quote from: doktornotor on August 28, 2024, 09:30:58 PM
If there was one, you'd still miss the other sensors. There are more sensors than CPU cores.

I am aware of that, yet would not necessarily expect a generic firewall appliance running on all kinds of hardware to display more than the CPU core average. Do that in a clean way and call it "feature complete" ;D

Leave the rest to proper network management systems and SNMP.

doktornotor · August 28, 2024, 10:18:13 PM

Ok, so let's remove useful information because some people don't like the output of sysctl. SMH.

Patrick M. Hausen · August 28, 2024, 10:26:51 PM

You have a valid point. Let's see what Franco and colleagues come up with. I guess you can do one sysctl -a at boot time, then poll only the sensors you found afterwards. They are not likely to change without a reboot involved one way or another.
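For illustration, the "discover once, then poll only what you found" idea could look roughly like the Python sketch below. This is not the OPNsense implementation. It assumes that temperature OIDs end in `.temperature` (e.g. `dev.cpu.0.temperature`) and that values are printed like `45.0C`; the function names are made up for this example.

```python
import re
import subprocess

# Assumption: temperature sensors are exposed as OIDs ending in ".temperature".
# The actual filter used by the widget is not shown in this thread.
TEMP_LINE = re.compile(r"^([\w.%-]+\.temperature):\s*(\S+)", re.MULTILINE)


def discover_temperature_oids(sysctl_dump: str) -> list[str]:
    """Extract temperature OID names from one full `sysctl -a` dump."""
    return [m.group(1) for m in TEMP_LINE.finditer(sysctl_dump)]


def poll_temperatures(oids, run=subprocess.check_output) -> dict:
    """Query only the known OIDs; far cheaper than walking the whole tree."""
    if not oids:
        return {}
    out = run(["sysctl", *oids], text=True)
    # Output lines look like "name: 45.0C"; strip the unit suffix.
    return {m.group(1): float(m.group(2).rstrip("C"))
            for m in TEMP_LINE.finditer(out)}


if __name__ == "__main__":
    # Expensive walk, done once (e.g. at boot or service start).
    dump = subprocess.check_output(["sysctl", "-a"], text=True)
    cached = discover_temperature_oids(dump)
    # Cheap poll, done on every refresh.
    print(poll_temperatures(cached))
```

The cached list would need to be rebuilt if a sensor driver gets loaded later, which is the part Franco's point about runtime probing touches on.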

franco · August 28, 2024, 11:22:54 PM

To be frank just try the patch:

# opnsense-patch f473d9a5c7 && service configd restart

As I said, if we don't agree it's progress, someone will need to come up with a better solution for reading temperatures from the sysctls, which can only be probed at runtime. I cannot spend indefinite amounts of community time on pleasing the demand for lower temperatures.

The sysctl -a has been there since the fork. We don't have to argue its downsides in every nuance.

Cheers,
Franco

sbellon · August 29, 2024, 10:12:36 AM

Not saying this is related, but I agree that I see a change in the graphs after having upgraded to 24.7.1 and then after 24.7.2 as well. For me it's not the CPU temperature because I'm running that on a Proxmox VE and don't have that available inside OPNsense, but I can see how the usage of the "States" clearly (!) changed with the upgrade from 24.1.10 to 24.7.1 and then again to 24.7.2 as you can see from my attached screenshot (upgrade to 24.7.1 was on 08.08.24 and upgrade to 24.7.2 was on 21.08.24 - both clearly visible in the graph without further explanation).

I am not saying this change is a problem nor worth investigating, I'm just saying that I can clearly see this change in behaviour and this may very well have effects on CPU usage and/or memory usage and perhaps as a result even CPU temperature.

Oh, and yes, configuration has NOT changed AT ALL over this period of time.

irrenarzt · September 02, 2024, 12:06:57 AM

Followup:

I've found a workaround that reduces my CPU utilization and temps to pre-24.7 levels. The problem is:
/usr/local/bin/python3 /usr/local/opnsense/scripts/filter/update_tables.py

When I disable the Maxmind Geoblock aliases, the CPU temp drops by 10C and utilization from that process drops by 50%. If I reenable that alias, the temp and utilization jump back up.

I have not altered any of these aliases, and the table entries are consistent with what they were under 24.1. This leads me to believe there is still an underlying problem that needs to be identified (this just helps narrow it down).

Since this is a python script, and 24.7 brought us python 3.11 - Is it possible that python 3.11 is the underlying problem?

meyergru · September 02, 2024, 01:02:09 AM

I think that there must be another problem with your setup. That script is run only when the aliases change.

For me, it does not run all the time, so it just cannot be responsible for any ongoing CPU load, even if it were less efficient than before. It may cause some spikes for the time it runs if large amounts of aliases are processed.

So, are you saying that "ps auxwww | fgrep update_tables" is running all the time? I think it clearly should not and for me, it does not - despite the fact that I also use the Maxmind geoip database for blocking.

irrenarzt · September 02, 2024, 01:06:17 AM

It's not running all the time, but approximately once a minute for a few seconds, which is enough to affect temperature readings and the average CPU usage under health reporting.
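One way to put a number on "once a minute for a few seconds" would be to sample the process list for a while and count how often the script shows up. A rough Python sketch follows; it is only a measuring aid, and the function names and the one-second interval are arbitrary choices for this example.

```python
import subprocess
import time

SCRIPT = "update_tables.py"


def is_running(ps_output: str, needle: str = SCRIPT) -> bool:
    """True if any process line mentions the script (ignoring grep itself)."""
    return any(needle in line and "grep" not in line
               for line in ps_output.splitlines())


def busy_fraction(samples: int, interval: float = 1.0,
                  run=subprocess.check_output, sleep=time.sleep) -> float:
    """Sample `ps auxwww` repeatedly; return the share of samples with a hit."""
    hits = 0
    for _ in range(samples):
        if is_running(run(["ps", "auxwww"], text=True)):
            hits += 1
        sleep(interval)
    return hits / samples


if __name__ == "__main__":
    # Five minutes at one sample per second.
    print(f"{busy_fraction(300):.1%} of samples saw {SCRIPT}")
```

A result of a few percent would match a short run once a minute; something much higher would point at the script being retriggered.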

meyergru · September 02, 2024, 01:12:29 AM

It clearly does not do that for me.

Perhaps you have too many aliases? Maybe because you did not follow the tips here, e.g. because you do not have enough firewall states and the alias database never gets fully processed, starting the process over again and again?

Something has to trigger that update...

irrenarzt · September 02, 2024, 01:30:14 AM

I only have 5 aliases, and I followed the guides when I did my initial install. There is enough available, there are no errors in my logs, and the problems didn't arise until after the 24.7 install - which is why I could see the noticeable difference in health reporting.

I searched before posting, and there are a lot of people reporting that specific process running about once a minute over the years without anyone calling it abnormal... Disabling the Geoblock alias also does not change how often the process runs; it only brings the CPU utilization back down to pre-24.7 levels.
