OPNsense Forum

English Forums => 24.7, 24.10 Legacy Series => Topic started by: ProximusAl on July 26, 2024, 03:28:26 PM

Title: 24.7 CPU Temps
Post by: ProximusAl on July 26, 2024, 03:28:26 PM
It may just be me, but has anyone noticed the thermals increased with 24.7?

Pre upgrade I was about 38C, and now I'm at 45C. Nothing else has changed.
Title: Re: 24.7 CPU Temps
Post by: RutgerDiehard on July 26, 2024, 03:47:26 PM
I've not seen any difference in temperatures. The attached is what's been captured by Netdata for the last 2 days. The 24.7 upgrade was performed around 15:30 yesterday (indicated by the red mark).
Title: Re: 24.7 CPU Temps
Post by: ProximusAl on July 26, 2024, 05:35:56 PM
This is also odd. Since upgrading, the table reporting CPU temp under Health is recording 0.

(https://i.ibb.co/tpgM4Vc/IMG-0632.jpg)
Title: Re: 24.7 CPU Temps
Post by: Seimus on July 26, 2024, 05:43:31 PM
Works for me fine, and as well I don't see differences in temps.

From perspective of CPU usage & resources I don't see any significant extra resource util by the new Dashboard.

Regards,
S.
Title: Re: 24.7 CPU Temps
Post by: ProximusAl on July 26, 2024, 05:58:02 PM
Interesting.

I reset my RRD data and it is now working.....but temp still about 7C higher
Title: Re: 24.7 CPU Temps
Post by: mgl on July 26, 2024, 07:03:22 PM
Same here. When I remove the widget and add it again, it shows the correct temp for a short time. After a while all cores (0-3) are displayed, and one core always reads too high. Then it jumps back to showing only one core, and the temp is no longer correct.

"configctl system temp" shows approx. 5 degrees difference.
That corresponds exactly to what the "old" dashboard displayed.
Title: Re: 24.7 CPU Temps
Post by: ProximusAl on July 26, 2024, 07:18:11 PM
Hey...you're right.

I ran that command on mine and it says 38C but the widget shows 45C.

I think this is a bug in the widget perhaps?
Title: Re: 24.7 CPU Temps
Post by: jorisvervuurt on July 26, 2024, 07:28:55 PM
I'm seeing the same thing; the widget shows different values than the command above does.
Perhaps the widget shows an average of a specific duration?
Title: Re: 24.7 CPU Temps
Post by: mgl on July 26, 2024, 08:31:35 PM
Strangely, Reporting -> Health shows the right values (39-41 degrees) while the widget shows 45-47.
I had to reset the RRD data first; before that it showed 0.
Title: Re: 24.7 CPU Temps
Post by: ProximusAl on July 26, 2024, 08:57:11 PM
Yes....I can confirm, health data accurate, widget is not.
Title: Re: 24.7 CPU Temps
Post by: ProximusAl on July 26, 2024, 09:05:26 PM
I should add CPU is N100 just in case that matters 😀
Title: Re: 24.7 CPU Temps
Post by: mgl on July 26, 2024, 09:24:58 PM
Same CPU here - N100
Title: Re: 24.7 CPU Temps
Post by: MenschAergereDichNicht on July 26, 2024, 11:06:47 PM
I see the following:

- The widget shows 63°C
- The RRD health data shows an average of ~53°C
- "sysctl -a | grep temperature" shows 50°C
- "sysctl dev.cpu | grep temperature" shows 43°C

Not all executed at the exact same point in time but after several tries the trend is stable.

If i compare the visualization of the CPU widget with the output of "top", it also looks quite different. But maybe this is just the sampling interval.

System: Celeron J6412 and thermal sensor configuration set to use the on-die thermal sensors.
Title: Re: 24.7 CPU Temps
Post by: Baender on July 26, 2024, 11:20:42 PM
I noticed that, too. I have an INTEL N5105. Before 24.7, the temperature was around 48-53°C. On 24.7, the widget shows up to 60°C.
Title: Re: 24.7 CPU Temps
Post by: franco on July 27, 2024, 10:18:20 AM
We've always had the issue that the temperature reading has such a high resolution that a query through the GUI shows higher temps, because of the processing the request requires, than a single command dispatched from an idle shell (while the GUI isn't doing anything).

We may have to write a small tool to fetch the temperatures in the background away from the GUI so when the GUI query comes in it reads the actual value, not the one while the CPU is busy processing the user request..?


Cheers,
Franco
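
A minimal sketch of the background-fetch idea described above, not how OPNsense implements it. It assumes a hypothetical cache path (`/tmp/cpu_temps.cache`) and two hypothetical helper names, `sample_to_cache` and `read_cached_temps`; the 10-second interval in the comment is an arbitrary choice. A background loop publishes the latest reading to a file, and the GUI request only reads that file, so the request itself does not trigger a sensor read while the CPU is busy.

```sh
# Hypothetical cache location; override by exporting CACHE.
CACHE="${CACHE:-/tmp/cpu_temps.cache}"

# sample_to_cache COMMAND [ARGS...]
# Run the reader command and publish its output atomically, so a
# concurrent reader never sees a half-written file.
sample_to_cache() {
    tmp="$CACHE.$$"
    if "$@" > "$tmp"; then
        mv "$tmp" "$CACHE"
    else
        rm -f "$tmp"      # keep the previous sample on failure
        return 1
    fi
}

# What a GUI request would do: return the last published sample
# without querying the sensors itself.
read_cached_temps() {
    cat "$CACHE"
}

# Background loop on the firewall (interval is an assumption):
#   while :; do
#       sample_to_cache sh -c 'sysctl dev.cpu | grep temperature'
#       sleep 10
#   done &
```

The reading is then up to one interval old, which trades freshness for a value that is not influenced by the request that displays it.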
Title: Re: 24.7 CPU Temps
Post by: Seimus on July 27, 2024, 10:34:46 AM
Now that you are talking about the widget, you are right: the widget shows a higher temp. I assumed everyone was talking about temps in RRD or on the CLI.

Franco, yes, fetching it that way would be better, cause the Widget gives a bit misleading info.

Regards,
S.
Title: Re: 24.7 CPU Temps
Post by: franco on July 27, 2024, 11:22:20 AM
Admittedly it's a bit counter-productive when you think about it when these temperature spikes are reported, because you fetch them when they happen, because they happen when you fetch them. ;)


Cheers,
Franco
Title: Re: 24.7 CPU Temps
Post by: MenschAergereDichNicht on July 27, 2024, 07:44:04 PM
Quote from: franco on July 27, 2024, 10:18:20 AM
We may have to write a small tool to fetch the temperatures in the background away from the GUI so when the GUI query comes in it reads the actual value, not the one while the CPU is busy processing the user request..?

Cheers,
Franco

Maybe you could just use the RRD data if it is available.

And if i understood your problem description it might be a good idea to use "sysctl dev.cpu" instead of "sysctl -a" for the RRD data.
Title: Re: 24.7 CPU Temps
Post by: franco on July 29, 2024, 08:49:43 AM
> Maybe you could just use the RRD data if it is available.

Not a great plan because the RRD backend needs a full rewrite.

> And if i understood your problem description it might be a good idea to use "sysctl dev.cpu" instead of "sysctl -a" for the RRD data.

RRD is not even using sysctl so not understood. It's actually using a tool that really really needs to be removed for the same reason that RRD backend needs a full rewrite.

Just trying to give a perspective. Guessing problems into open source is a bit taxing from a dev point of view because now it's not enough to be open it needs to be explained constantly...


Cheers,
Franco
Title: Re: 24.7 CPU Temps
Post by: Seimus on July 29, 2024, 02:09:44 PM
Hearing about rewrites and seeing some.

This raises a question: in the long run, do you plan complete graphical rewrites and overhauls for all aspects of OPNsense? :)

Regards,
S.
Title: Re: 24.7 CPU Temps
Post by: franco on July 29, 2024, 02:47:15 PM
Was initially hoping it would take 10 years so we would be almost done, but it's safe to say it may take up to 5 more years.

This includes API/MVC for everything user-facing as well as full privilege separation for the GUI.


Cheers,
Franco
Title: Re: 24.7 CPU Temps
Post by: Seimus on July 29, 2024, 02:50:24 PM
Great!

It's actually awesome to hear you are still doing this. No matter the time frame, this is still awesome to hear :)

Thanks Franco.

Regards,
S.
Title: Re: 24.7 CPU Temps
Post by: MenschAergereDichNicht on July 29, 2024, 07:57:15 PM
Hi Franco,

i guess i could try harder to explain myself.

> Guessing problems into open source is a bit taxing from a dev point of view because now it's not enough to be open it needs to be explained constantly

First of all i understand that it is sometimes tiring to explain things over and over. In my case i actually think you have a point because i *could* look into the sources and get some insights or be more precise with my statements. But you can't really expect this from every random person.

> RRD is not even using sysctl so not understood. It's actually using a tool that really really needs to be removed for the same reason that RRD backend needs a full rewrite.

Now what i tried to express was that it is probably a good idea to use one common source of truth for such data in general (because it is irritating to have different values inside the GUI). And because i didn't know that RRD was in need of refactoring, i thought that source of truth could be the RRD. But you can abstract that away if you like.

Similar to your idea

> We may have to write a small tool to fetch the temperatures in the background away from the GUI so when the GUI query comes in it reads the actual value, not the one while the CPU is busy processing the user request

The important thing being that there shouldn't be several ways to gather the data (tool and RRD) but only one way (tool), and the other consumers (widget, RRD) should ask the tool service for the values (to avoid different results and to avoid unnecessary load).

The second part about the means of reading the actual values (sysctl dev.cpu) was meant to illustrate that it would be nice to use a more lightweight method for the central data crawler.
I compared that to "sysctl -a" in this context because i compared the RRD values to the output of the command line calls, and "sysctl -a" was close to the RRD values in my case. Therefore i assumed that it is using this command or at least something similar.


Greetings,
Stefan
Title: Re: 24.7 CPU Temps
Post by: beneix on August 09, 2024, 04:47:22 PM
With the updated widget in 24.7.1, is there a way (perhaps by editing the js file) to exclude the temperatures for CPU1, CPU2, etc.? In many cases, they will all be the same so it takes up screen real-estate to show several identical values.
Title: Re: 24.7 CPU Temps
Post by: franco on August 09, 2024, 09:46:16 PM
We will possibly add an option to average across all common sensors in the widget. It's the best of both worlds without trying to do it automatically, which failed because temps from CPUs that report separate temperatures could still match when read from time to time, making the data set jumpy in terms of how many sources it actually has.

As far as temps reading goes here is my take: if we say the GUI temp is wrong we have to assume the idle test temp is wrong as well. The real temp is somewhere in the middle, so the question is how many checks per second do we need to make to get the correct average under light load... because I think the temperatures are closer together under higher load anyway.



Cheers,
Franco
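
The per-sensor averaging mentioned above could look roughly like this. It is only a sketch: `avg_cpu_temp` is a hypothetical helper name, and it assumes sensor lines in the `dev.cpu.N.temperature` format that sysctl prints (separated by either `: ` or `=`). Other sensors such as `hw.acpi.thermal.tz0` are deliberately left out of the average.

```sh
# avg_cpu_temp: read sysctl-style lines on stdin and print the mean of
# all dev.cpu.N.temperature readings with one decimal place.
# Returns non-zero if no CPU sensor line was found.
avg_cpu_temp() {
    awk -F'[:=]' '
        $1 ~ /^dev\.cpu\.[0-9]+\.temperature$/ {
            sum += $2 + 0      # "+ 0" drops the trailing "C"
            n++
        }
        END {
            if (n > 0) printf "%.1f\n", sum / n
            else exit 1
        }
    '
}

# Usage on the firewall:
#   sysctl dev.cpu | avg_cpu_temp
```

Because the average is computed over whatever CPU sensors are present, the number of displayed values stays at one even when individual cores happen to report identical or differing temperatures.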
Title: Re: 24.7 CPU Temps
Post by: dirtyfreebooter on August 12, 2024, 06:17:03 PM
so taking out the OPNsense UI and such i can't really explain this weirdness. in the same command on an idle system:

# sysctl hw.acpi.thermal.tz0.temperature dev.cpu.{0,1,2,3}.temperature && sysctl -e `sysctl -aN | grep temperature`
hw.acpi.thermal.tz0.temperature: 27.9C
dev.cpu.0.temperature: 31.0C
dev.cpu.1.temperature: 31.0C
dev.cpu.2.temperature: 31.0C
dev.cpu.3.temperature: 31.0C
hw.acpi.thermal.tz0.temperature=27.9C
dev.cpu.3.temperature=40.0C
dev.cpu.2.temperature=40.0C
dev.cpu.1.temperature=40.0C
dev.cpu.0.temperature=40.0C


if i look at the sysctl directly, its much lower temps, similar temps if i boot the same machine with debian. if i look at the temps how the UI gets them: /usr/local/opnsense/scripts/system/temperature.sh which does
sysctl -e `sysctl -aN | grep temperature`

for some reason those sysctl calls for the same names return different values, and it's not some sort of thing like the commands themselves causing the CPU temps to rise...
Title: Re: 24.7 CPU Temps
Post by: franco on August 12, 2024, 08:48:46 PM
From what we have learned today this is the observation of heat not being able to get off the CPU quickly enough for whatever reason. It feels counter-productive to report a lower reading just because of the argument that the CPU reading is lower during idle. It is the temperature the CPU is at at the time of the reading.


Cheers,
Franco
Title: Re: 24.7 CPU Temps
Post by: dirtyfreebooter on August 12, 2024, 09:20:58 PM
trying this simple shell script:

#!/usr/local/bin/bash
sysctl dev.cpu.{0,1,2,3}.temperature hw.acpi.thermal.tz0.temperature
sysctl -e `sysctl -aN | grep temperature`


now run it super fast:

gnu-watch -n0.1 /root/temps.sh


Result:

Every 0.1s: /root/temps.sh

dev.cpu.0.temperature: 32.0C
dev.cpu.1.temperature: 32.0C
dev.cpu.2.temperature: 32.0C
dev.cpu.3.temperature: 32.0C
hw.acpi.thermal.tz0.temperature: 27.9C
hw.acpi.thermal.tz0.temperature=27.9C
dev.cpu.3.temperature=43.0C
dev.cpu.2.temperature=43.0C
dev.cpu.1.temperature=43.0C
dev.cpu.0.temperature=43.0C


there is clearly a difference between

sysctl dev.cpu.{0,1,2,3}.temperature hw.acpi.thermal.tz0.temperature

and

sysctl -e `sysctl -aN | grep temperature`


which seems like it may be a bug in sysctl? unless the subprocess of

`sysctl -aN | grep temperature`

can cause the CPU to spike 10+ degrees C, which seems unlikely. and if it did, when running at 0.1s intervals, why doesn't it affect the other sysctl?
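
To put numbers on the gap between the two invocations, one could save each output to a file and compare them per sensor. A rough sketch, assuming a hypothetical helper `temp_delta` and placeholder file names `direct.txt` and `via_all.txt`; it only compares two captures and says nothing about why they differ.

```sh
# temp_delta FIRST SECOND
# Print "name delta" for every sysctl name present in both capture files.
# Accepts both "name: 31.0C" and "name=31.0C" lines.
temp_delta() {
    awk -F'[:=]' '
        NR == FNR { first[$1] = $2 + 0; next }   # readings from FIRST
        ($1 in first) {
            printf "%s %+.1f\n", $1, ($2 + 0) - first[$1]
        }
    ' "$1" "$2"
}

# Usage on the firewall (file names are placeholders):
#   sysctl dev.cpu.0.temperature hw.acpi.thermal.tz0.temperature > direct.txt
#   sysctl -e `sysctl -aN | grep temperature` > via_all.txt
#   temp_delta direct.txt via_all.txt
```

A positive delta means the second capture read hotter than the first for that sensor.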
Title: Re: 24.7 CPU Temps
Post by: franco on August 12, 2024, 09:28:30 PM
Why should the numbers lie? Idle vs. busy should yield a temperature difference, no? Assuming the reading is wrong seems futile... software bug? hardware bug? Not on our end then, we just read it. ;)


Cheers,
Franco
Title: Re: 24.7 CPU Temps
Post by: dirtyfreebooter on August 12, 2024, 09:32:17 PM
i certainly agree that the differences don't really matter, for sure. but it's not because of idle or CPU activity, it seems like a bug. the behavior existed in 24.1 and now in 24.7, and exists for me on intel and amd cpus, so it seems like a sysctl bug, and it seems very subtle, but also not critical
Title: Re: 24.7 CPU Temps
Post by: holocron on August 12, 2024, 09:56:44 PM
No issue with the temps themselves. I'm using an N100 and they are where I expect them to be.

However, there is a formatting issue in the dashboard widget.

(https://i.imgur.com/c6n2D13.png)
Title: Re: 24.7 CPU Temps
Post by: yahyoh on August 13, 2024, 08:48:28 PM
I still think the discrepancy in temp readings makes no sense lol.

Aren't the temps supposed to be read directly from the sensor on the CPU? I'm not that much of a BSD nerd, but I never faced such an issue in Windows or Linux.

Plus, a 10-15C difference sounds like way too much. Furthermore, the external body of my N5105 (which acts as the heat sink) doesn't even feel hot enough to indicate that the CPU is really in the 50s C.

https://i.imgur.com/W5UCYKo.mp4


root@OPNsense:~ # sysctl -a | grep temperature
hw.acpi.thermal.tz0.temperature: 47.1C
dev.cpu.3.temperature: 46.0C
dev.cpu.2.temperature: 46.0C
dev.cpu.1.temperature: 45.0C
dev.cpu.0.temperature: 45.0C
root@OPNsense:~ # sysctl -a | grep temperature
hw.acpi.thermal.tz0.temperature: 47.1C
dev.cpu.3.temperature: 48.0C
dev.cpu.2.temperature: 47.0C
dev.cpu.1.temperature: 46.0C
dev.cpu.0.temperature: 45.0C
root@OPNsense:~ # sysctl -a | grep temperature
hw.acpi.thermal.tz0.temperature: 47.1C
dev.cpu.3.temperature: 47.0C
dev.cpu.2.temperature: 46.0C
dev.cpu.1.temperature: 45.0C
dev.cpu.0.temperature: 47.0C
root@OPNsense:~ # sysctl -a | grep temperature
hw.acpi.thermal.tz0.temperature: 47.1C
dev.cpu.3.temperature: 49.0C
dev.cpu.2.temperature: 48.0C
dev.cpu.1.temperature: 47.0C
dev.cpu.0.temperature: 46.0C
root@OPNsense:~ #
root@OPNsense:~ # sysctl -a | grep temperature
hw.acpi.thermal.tz0.temperature: 47.1C
dev.cpu.3.temperature: 48.0C
dev.cpu.2.temperature: 46.0C
dev.cpu.1.temperature: 46.0C
dev.cpu.0.temperature: 46.0C

Title: Re: 24.7 CPU Temps
Post by: maxus on August 27, 2024, 11:41:27 AM
Hi @all,

I'm new here and facing the same problem: I'm using an N100 Mini-PC which I updated last evening from 24.1 to 24.7. Although the update process seemed to work fine, the GUI shows much higher (~10°C) temperatures than the CLI output:


markus@opnsense:~ % sysctl -a | grep temperature && sysctl dev.cpu | grep temperature
hw.acpi.thermal.tz0.temperature: 27.9C
dev.cpu.3.temperature: 61.0C
dev.cpu.2.temperature: 59.0C
dev.cpu.1.temperature: 58.0C
dev.cpu.0.temperature: 57.0C
dev.cpu.3.temperature: 58.0C
dev.cpu.2.temperature: 58.0C
dev.cpu.1.temperature: 57.0C
dev.cpu.0.temperature: 57.0C


(https://i.postimg.cc/rpM9kJyk/cpu-temp.png) (https://postimg.cc/DSpGQ1bj)

Also tried this solution, but nothing seems to have changed:
https://forum.opnsense.org/index.php?topic=42323.0

Title: Re: 24.7 CPU Temps
Post by: meyergru on August 27, 2024, 11:55:11 AM
@maxus and @yahyoh: Yes, we know all that. You probably should re-read the thread.

Franco already explained in detail what is going on:

The difference between the (current) GUI query and a query from the CLI is that during the processing of the dashboard widgets (which include the temperature readouts), the CPU is being used, which in turn heats it up, resulting in an increased reading. You could probably reduce the difference by de-selecting all but the CPU temperature widget.

The granularity of modern CPU temperature readings is so fine that this matters, because the sensors now reside on the CPU die itself. The temperature can jump a few degrees in a few microseconds.

Franco also told you that this could only be fixed if the time of readout is shifted from the point in time that the GUI processes the widgets (so a background process is probably needed which decouples this).
Title: Re: 24.7 CPU Temps
Post by: maxus on August 27, 2024, 01:02:35 PM
Quote from: meyergru on August 27, 2024, 11:55:11 AM
@maxus and @yahyoh: Yes, we know all that. You probably should re-read the thread.

Franco already explained in detail what is going on:

The difference between the (current) GUI query and a query from the CLI is that during the processing of the dashboard widgets (which include the temperature readouts), the CPU is being used, which in turn heats it up, resulting in an increased reading. You could probably reduce the difference by de-selecting all but the CPU temperature widget.

The granularity of modern CPU temperature readings is so fine that this matters, because the sensors now reside on the CPU die itself. The temperature can jump a few degrees in a few microseconds.

Franco also told you that this could only be fixed if the time of readout is shifted from the point in time that the GUI processes the widgets (so a background process is probably needed which decouples this).

Hi @meyergru,

Thank you for explaining it again :)
Sorry to ask, but were the temperature queries solved differently in the old GUI (24.1)?
Title: Re: 24.7 CPU Temps
Post by: doktornotor on August 27, 2024, 01:08:15 PM
No. There were exactly the same complaints about something being "too high" when compared to not staring at the dashboard. Like here (https://github.com/opnsense/core/issues/6911).

Title: Re: 24.7 CPU Temps
Post by: maxus on August 27, 2024, 09:19:29 PM
Quote from: doktornotor on August 27, 2024, 01:08:15 PM
No. There were exactly the same complaints about something being "too high" when compared to not staring at the dashboard. Like here (https://github.com/opnsense/core/issues/6911).

Hi @doktornotor,

thank you for the information. It's not the case that I stare at the dashboard all day long ;)
This is my CPU overview before and after the upgrade. You can see that the "User" and "System" processes were previously in the milli range and are now 1 and 2 digits respectively. Why? Is it possible that this is why the temperature has risen?

(https://i.postimg.cc/rpHkvMGx/temp-Image-OIh-Ax-B.avif) (https://postimg.cc/XZ9m5MzY)

I also ran several widgets (including the CPU temp graph) in the old GUI and never experienced such temperature spikes (even while staring at the dashboard).

I also removed all widgets except for the CPU temp display this afternoon. In the widget itself it looks like the temperature is no longer rising as high as before (~70-80°C), but nothing really changes in the RRD diagram.

I am certainly no expert, but the phenomenon only occurred after the update from 24.1 to 24.7. So I wonder what has changed? I just want to understand it.

Thank you.
Title: Re: 24.7 CPU Temps
Post by: doktornotor on August 27, 2024, 09:30:10 PM
Quote from: maxus on August 27, 2024, 09:19:29 PM
but the phenomenon only occurred after the update from 24.1 to 24.7.

Well that simply is not true, as documented by the ticket I linked. Whatever, it shows data returned by the on-die sensors and as read and provided by the kernel. I don't really know why people want to see incorrect readings just because they don't like the data shown. Anyway, this is the current dumpspace of these complaints. (https://github.com/opnsense/core/issues/7730).
Title: Re: 24.7 CPU Temps
Post by: meyergru on August 27, 2024, 09:55:54 PM
Maybe part of the difference is from the fact that widget evaluation has changed because of the structural changes (like order of evaluation or complexity of other widgets).

Maybe you have an RRD database update / maintenance running after the upgrade that caused CPU spikes.

Whatever the reason, there is a GitHub issue and it will probably be addressed if no issues exist that have higher priority (of which I know some...).
Title: Re: 24.7 CPU Temps
Post by: maxus on August 27, 2024, 10:12:45 PM
Quote from: doktornotor on August 27, 2024, 09:30:10 PM
Well that simply is not true, as documented by the ticket I linked.
Whatever, it shows data returned by the on-die sensors and as read and provided by the kernel. I don't really know why people want to see incorrect readings just because they don't like the data shown. Anyway, this is the current dumpspace of these complaints. (https://github.com/opnsense/core/issues/7730).

And I wonder if my questions are perhaps really so misleading:
- Why the change in the processes (as in the picture in the last post)?
- Why the increase in temperatures after the update?
- The link you mentioned starts with October 2023. My discrepancies became visible with yesterday's update from 24.1 to 24.7.

If someone tells me: "Yes, we have changed something here and there (e.g. widget) so that the correct temperatures are now logged and more processes are also running", then that's totally ok, because that would be an answer.

I am quite sure that the problem you are talking about (Github link) has been occurring for some time. That doesn't mean that this is the case for me. I have already written that I have not changed anything in the widgets (e.g. number) or in OPNsense itself, nor have I changed my behavior (e.g. viewing the dashboard continuously for 24 hours), but only that I have done the system update. Something must have changed since the update, otherwise we wouldn't be discussing it here.

But maybe I just don't understand it...

Quote from: meyergru on August 27, 2024, 09:55:54 PM
Maybe part of the difference is from the fact that widget evaluation has changed because of the structural changes (like order of evaluation or complexity of other widgets).

Maybe you have an RRD database update / maintenance running after the upgrade that caused CPU spikes.

Whatever the reason, there is a GitHub issue and it will probably be addressed if no issues exist that have higher priority (of which I know some...).

Thank you @meyergru for your answer.

Regarding the RRD database update / maintenance: How long could that really take? The temperature in the RRD hasn't changed since the update.
Do you think "Reset RRD Data" would change something (someone mentioned it before)?

D'accord that there are definitely bigger problems than this  ;D
My mini PC with N100 wasn't really "cold" even before and at first, when another 10 degrees are added on top (according to the RRD), you start to worry.
Title: Re: 24.7 CPU Temps
Post by: meyergru on August 27, 2024, 10:46:24 PM
RRD databases do reconstruct sometimes after a reboot. I have experienced CPU spikes and 100% load after reboots as well. Of course that will raise temps, so resetting/repairing RRD databases often helps.

Even with all things equal: There is a discrepancy between a CLI readout of temps vs. a GUI inspection, because there is a lot more going on in the GUI. And because of structural changes with widgets, there may even be differences between old and new widgets.

That is not a sign of some kind of defect, but probably we need a different approach here to be compatible with the old reporting.

Also, the 10 degrees more are real - but there is no need to worry, you just need to understand how to interpret the reported temperature, as these higher temps are spikes only, which may have been there all the time - only without you noticing them.
Title: Re: 24.7 CPU Temps
Post by: irrenarzt on August 28, 2024, 12:22:42 AM
maxus,

I'm having the exact same issue and observations, and I do not believe the higher temperatures are strictly the result of how they're being reported as explained by others. Instead, as you inferred, I think the temperature is the result of higher CPU utilization.

Under Reporting -> Health my 'user' and 'system' processor utilization is showing as 25 times higher under 24.7.x than under 24.1. The number of existing processes is still the same though. Logically, this explains why I am also seeing temperatures that are 10 to 15 degrees higher and peaking into the maximum range of what my processor's specs allow.

I think this is worth investigating and patching before anyone has hardware failure as a result.
Title: Re: 24.7 CPU Temps
Post by: doktornotor on August 28, 2024, 12:25:43 AM
Quote from: irrenarzt on August 28, 2024, 12:22:42 AM
I think this is worth investigating and patching before anyone has hardware failure as a result.

This is getting borderline absurd. Get a better cooling system if you have such concerns.
Title: Re: 24.7 CPU Temps
Post by: irrenarzt on August 28, 2024, 12:31:11 AM
Quote from: doktornotor on August 28, 2024, 12:25:43 AM
Quote from: irrenarzt on August 28, 2024, 12:22:42 AM
I think this is worth investigating and patching before anyone has hardware failure as a result.

This is getting borderline absurd. Get a better cooling system if you have such concerns.

My cooling system wasn't a problem in 24.1.

My CPU utilization increasing for no apparent reason in 24.7 and producing more heat is though.

Why be so condescending about a problem that is so easily observable and graphed in the system history?
Title: Re: 24.7 CPU Temps
Post by: doktornotor on August 28, 2024, 12:37:38 AM
Your cooling system is not a problem with 24.7 either. The only problem here is in people's heads. Take off the heatsink of your CPU. Nothing will happen. It will underclock itself to the point of being unusable. Eventually it will shut down. That's all. Nothing will burn. No flames. Nothing will be damaged.

Additionally, would suggest reading your CPU thermal specs before bringing claims such as CPUs are damaged when run at 60C.

Sheesh. Perhaps removing the widget would be the best course of action here.
Title: Re: 24.7 CPU Temps
Post by: irrenarzt on August 28, 2024, 12:48:59 AM
You're ignoring the underlying problem of both mine and maxus' concern regarding the higher CPU utilization, and higher temps being a symptom of it.

If it were *just* a widget reporting higher temperatures, I wouldn't be so annoyed with your responses... But a major increase in utilization correlates with what we're seeing.

I'm also well aware of what my specs are as I'm seeing spikes of temps above 70C, which is the maximum for my processor. Right now it's averaging 60C.
Title: Re: 24.7 CPU Temps
Post by: doktornotor on August 28, 2024, 12:52:17 AM
Yeah, I'm ignoring underlying non-existent problems such as the CPU being used for computing and running at a whopping 60C. This is in the same category as complaints about memory being used e.g. for caching, instead of being wasted. That's another evergreen.

Not sure for which CPU there's 70C max but if you have such processor, then indeed your cooling is inadequate and needs improving to prevent unexpected shutdowns. Adequate cooling is such that it keeps CPU stable and within its operating specs at 100% load of all cores, for prolonged time. You can do such tests yourself even on OPNsense:


pkg install stress-ng
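
For example, a load test with temperature logging could look roughly like this. The `stress-ng` options in the comments (`--cpu 0` for one worker per online CPU, `--timeout`) are standard ones, but the 10-minute duration, the 5-second polling interval, the log file name and the `peak_temp` helper are all assumptions made for illustration.

```sh
# peak_temp: read sysctl-style temperature lines on stdin and print
# the highest reading with one decimal place.
# Returns non-zero if no temperature line was found.
peak_temp() {
    awk -F'[:=]' '
        $1 ~ /temperature$/ {
            t = $2 + 0                       # "+ 0" drops the trailing "C"
            if (n == 0 || t > max) max = t
            n++
        }
        END {
            if (n > 0) printf "%.1f\n", max
            else exit 1
        }
    '
}

# On the firewall, in one shell:
#   stress-ng --cpu 0 --timeout 600s
# In a second shell, while the test runs:
#   while :; do sysctl dev.cpu | grep temperature; sleep 5; done > temps.log
# Afterwards (stop the loop with Ctrl-C first):
#   peak_temp < temps.log
```

If the peak stays within the CPU's specified limits for the whole run, the cooling can be considered adequate in the sense described above.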
Title: Re: 24.7 CPU Temps
Post by: irrenarzt on August 28, 2024, 01:01:36 AM
If you think an unexplained 25x increase in CPU utilization resulting from an update is a non-existent problem, then there is no hope in discussing this rationally.

It's very clearly a flaw and it explains why more heat is being generated.
Title: Re: 24.7 CPU Temps
Post by: doktornotor on August 28, 2024, 01:05:58 AM
Perhaps start a new topic providing some real information, using tools such as top (https://man.freebsd.org/cgi/man.cgi?top(1)) about your 25x higher CPU usage (compared to unknown base).
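
As a starting point for such a report, here is a rough sketch that ranks commands by CPU share. It uses `ps` rather than `top` only because its output is easier to post-process; `cpu_by_command` is a hypothetical helper, and it sums `pcpu` per command name, which is a snapshot at one moment rather than a trend.

```sh
# cpu_by_command [N]
# Read "pcpu command" pairs on stdin, sum the CPU share per command
# name, and print the N busiest commands (default 5), highest first.
cpu_by_command() {
    awk '{ use[$2] += $1 }
         END { for (c in use) printf "%.1f %s\n", use[c], c }' |
        sort -rn | head -n "${1:-5}"
}

# Usage on the firewall (the empty "=" headers suppress the header line):
#   ps -ax -o pcpu= -o comm= | cpu_by_command 5
```

Posting this output from before and after an upgrade gives a concrete base to compare against.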
Title: Re: 24.7 CPU Temps
Post by: chemlud on August 28, 2024, 08:58:58 AM
@irrenarzt (how come to choose such a name?!?)

Which services are you using? Suricata or alike maybe?
Title: Re: 24.7 CPU Temps
Post by: wogman on August 28, 2024, 09:52:44 AM
Quote from: irrenarzt on August 28, 2024, 01:01:36 AM
If you think an unexplained 25x increase in CPU utilization resulting from an update is a non-existent problem, then there is no hope in discussing this rationally.

It's very clearly a flaw and it explains why more heat is being generated.

I have an N100 and have seen no increase in CPU usage nor CPU temperature, it's always well under 60 so it's clearly not a flaw and doesn't explain anything.

As Chemlud said, what else is going on with your device?
Title: Re: 24.7 CPU Temps
Post by: doktornotor on August 28, 2024, 10:23:04 AM
Quote from: wogman on August 28, 2024, 09:52:44 AM
I have an N100 and have seen no increase in CPU usage nor CPU temperature, it's always well under 60 so it's clearly not a flaw and doesn't explain anything.

As Chemlud said, what else is going on with your device?

The pattern here seems to be

a/ a new version is released on top of new major FreeBSD version, with tons of rewrites (such as the dashboard). People notice that things changed and scream "oh noes, it uses more CPU/RAM/beer in my fridge vanished and my cat got ill after upgrade! On the previous version it was just fine, CPU was barely being used. The new one sucks!" "640KB of memory ought to be enough for anybody!"

b/ users that have some runaway process running or some quirk in their setup (which may or may not be related to the upgrade at all) jump on the bandwagon and keep adding completely non-specific complaints about their fringe issue to those topics.

Wash, rinse, repeat...

No, 24.7 has NOT increased CPU usage 25 times... Start your own topic for debugging particular issue you are observing and provide some basic info.

No, CPUs don't burn at 60C. They've not been burning for years even without heatsink.

And as also noted above: do not base your cooling on the assumption that your CPU will be unused. You can always hit a situation where some bad process taxes your CPU cores for tens of minutes / hours due to a bug, suboptimal code, or simply you configuring something your HW is not capable of handling reasonably. Things like IPS, netflow, or huge DNS blocklists getting parsed by the python code for hours come to mind.

Title: Re: 24.7 CPU Temps
Post by: chemlud on August 28, 2024, 10:45:57 AM
Quote from: doktornotor on August 28, 2024, 10:23:04 AM
...
The pattern here seems to be

a/ a new version is released on top of new major FreeBSD version, with tons of rewrites (such as the dashboard). People notice that things changed and scream "oh noes, it uses more CPU/RAM/beer in my fridge vanished and my cat got ill after upgrade! On the previous version it was just fine, CPU was barely being used. The new one sucks!" "640KB of memory ought to be enough for anybody!"

b/ users that have some runaway process running or some quirk in their setup (which may or may not be related to the upgrade at all) jump on the bandwagon and keep adding completely non-specific complaints about their fringe issue to those topics.

Wash, rinse, repeat...
....


...and once a week a thread with "out-of-state traffic" and some totally insane newbie fw rules. :-) Daily business. Keep calm and post on. ;-)
Title: Re: 24.7 CPU Temps
Post by: doktornotor on August 28, 2024, 10:49:44 AM
Quote from: chemlud on August 28, 2024, 10:45:57 AM
...and once a week a thread with "out-of-state traffic"

Recall debating this a year or so ago on Github. Put the flags there to be displayed directly with the logs, not hidden under the i. I guess I should revive the request.
Title: Re: 24.7 CPU Temps
Post by: chemlud on August 28, 2024, 10:58:21 AM
Point is: Beginners won't get it, no matter which text you provide. It has already been adapted in the past. As long as they don't know or care about TCP flags and the basics of the protocol, the same question will come up again and again.

Learning curve is steep in the beginning. But one soul saved from the plastic box router faction is worth posting sensible replies to beginner questions. Sticky threads could be linked in "standardized" replies, maybe...
Title: Re: 24.7 CPU Temps
Post by: doktornotor on August 28, 2024, 11:02:18 AM
Quote from: chemlud on August 28, 2024, 10:58:21 AM
Point is: Beginners won't get it, no matter which text you provide.

Indeed, but it makes things obvious without having to ask them to click somewhere. I find this annoying as well, having it visible helps to visually spot / filter out irrelevant noise.
Title: Re: 24.7 CPU Temps
Post by: Mo'Kai on August 28, 2024, 02:00:32 PM
Quote from: chemlud on August 28, 2024, 10:58:21 AM
... But one soul saved from the plastic box router faction is worth posting sensible replies to beginner questions.

LOL that's a good one !

I too do not see any increase in temps on either of my units, the Dell sd_wan Edge 860 or the tiny Thinkcentre m920q. CPU temps hover around 46-50C, as before on v24.1.
Title: Re: 24.7 CPU Temps
Post by: irrenarzt on August 28, 2024, 06:14:07 PM
I am not running Suricata. The only optional services I'm running are CrowdSec and Unbound, neither of which appears to be the crux of the issue, as their utilization is the same as before. There are no runaway processes, nor indicators in my logs. My configuration between 24.1 and 24.7 is identical outside of what changed from the update.

This exact problem is also being noticed and shared outside of this particular forum. Here is another example from a user on Reddit, with a screenshot of the same trend maxus and I have seen after the update:
https://www.reddit.com/r/opnsense/comments/1f13bkk/weird_cpu_utilization_since_2472/

Instead of irrationally deriding the problem as an issue with my forum post count, let's try acknowledging that multiple people are sharing the exact same data point: historical CPU utilization showing steep increases following the 24.7 update with no other configuration changes. In each case CPU utilization skyrockets immediately after the update (as obviously shown in the graphs) and maintains a significantly higher average than under 24.1.

I was seeing other similar posts from people prior to discovering I had the same problem. I didn't notice any obvious performance or operational impacts from normal use, and bought into the initial narrative that higher CPU temps were simply a result of how the new widget is reporting. However, after seeing repeated claims of higher CPU utilization with 24.7 I looked into my own health reports, and realized I have the same vertical cliff face of dramatically increased usage and load averages. Now that I'm seeing multiple people with the same problem, I'm not inclined to believe CPU temps are anything less than a symptom of a greater flaw.
Title: Re: 24.7 CPU Temps
Post by: chemlud on August 28, 2024, 06:29:57 PM
...let's also acknowledge that multiple users have no problem at all. So...
Title: Re: 24.7 CPU Temps
Post by: irrenarzt on August 28, 2024, 06:37:21 PM
I am not ignoring that, as I'm well aware that hardware differences and such can impact differently just like with any other piece of software. Any legitimate testing takes into account different platforms.
Title: Re: 24.7 CPU Temps
Post by: franco on August 28, 2024, 09:21:47 PM
> hardwarecooling differences

;)

My proposal https://github.com/opnsense/core/commit/f473d9a5c7 got shot down because showing the "wrong" temperature once every 24 hours is apparently unacceptable. Maybe someone can tell me what is acceptable here.


Cheers,
Franco
Title: Re: 24.7 CPU Temps
Post by: Patrick M. Hausen on August 28, 2024, 09:26:39 PM
One question, though ... sorry if I missed something.

The computationally expensive task that jacks up CPU temp while reporting seems to be sysctl -a | grep ...

Why is this necessary? Is there really no API that gives you the number of cores so you could poll only the existing OIDs? I wonder how other systems that I have in my zoo report temperatures like e.g. ESXi or any appliance I poll via SNMP. It just feels "wrong" to implement it this way.

Kind regards,
Patrick
Title: Re: 24.7 CPU Temps
Post by: doktornotor on August 28, 2024, 09:30:58 PM
Quote from: franco on August 28, 2024, 09:21:47 PM
My proposal https://github.com/opnsense/core/commit/f473d9a5c7 got shot down because showing the "wrong" temperature once every 24 hours is apparently unacceptable. Maybe someone can tell me what is acceptable here.

Well, FWIW, I already posted the ultimate solution...  8)

Quote from: doktornotor on August 28, 2024, 12:37:38 AM
Perhaps removing the widget would be the best course of action here.

Quote from: Patrick M. Hausen on August 28, 2024, 09:26:39 PM
Is there really no API that gives you the number of cores so you could poll only the existing OIDs?

If there was one, you'd still miss the other sensors. There are more sensors than CPU cores.
Title: Re: 24.7 CPU Temps
Post by: irrenarzt on August 28, 2024, 10:03:05 PM
Or, hear me out, the CPU utilization issue seen here and elsewhere is the problem.
Title: Re: 24.7 CPU Temps
Post by: doktornotor on August 28, 2024, 10:12:16 PM
Quote from: irrenarzt on August 28, 2024, 10:03:05 PM
Or, hear me out, the CPU utilization issue seen here and elsewhere is the problem.

GOTO https://forum.opnsense.org/index.php?topic=41759.msg210341#msg210341
Title: Re: 24.7 CPU Temps
Post by: Patrick M. Hausen on August 28, 2024, 10:12:55 PM
Quote from: doktornotor on August 28, 2024, 09:30:58 PM
If there was one, you'd still miss the other sensors. There are more sensors than CPU cores.
I am aware of that, yet would not necessarily expect a generic firewall appliance running on all kinds of hardware to display more than the CPU core average. Do that in a clean way and call it "feature complete"  ;D

Leave the rest to proper network management systems and SNMP.
Title: Re: 24.7 CPU Temps
Post by: doktornotor on August 28, 2024, 10:18:13 PM
Ok, so let's remove useful information (https://forum.opnsense.org/index.php?topic=42403.msg209820#msg209820) because some people don't like the output of sysctl. SMH.
Title: Re: 24.7 CPU Temps
Post by: Patrick M. Hausen on August 28, 2024, 10:26:51 PM
You have a valid point. Let's see what Franco and colleagues come up with. I guess you can do one sysctl -a at boot time, then poll only the sensors you found afterwards. They are not likely to change without a reboot involved one way or another.
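
A minimal Python sketch of that "discover once, poll often" idea. It is not taken from the OPNsense code: the function names are made up for illustration, and it assumes sensors show up in sysctl output as lines like "dev.cpu.0.temperature: 45.0C". The runner argument only exists so the logic can be tried without a FreeBSD box.

```python
import re
import subprocess

# Matches lines such as "dev.cpu.0.temperature: 45.0C" (output format assumed).
TEMP_LINE = re.compile(
    r"^(?P<oid>[\w.%-]+\.temperature):\s*(?P<val>-?\d+(?:\.\d+)?)C\s*$"
)


def parse_temperatures(text):
    """Return {oid: degrees_celsius} for every temperature line in sysctl output."""
    readings = {}
    for line in text.splitlines():
        match = TEMP_LINE.match(line.strip())
        if match:
            readings[match.group("oid")] = float(match.group("val"))
    return readings


def run_sysctl(args):
    """Run sysctl with the given arguments and return its stdout as text."""
    result = subprocess.run(
        ["sysctl", *args], capture_output=True, text=True, check=False
    )
    return result.stdout


def discover_sensors(runner=run_sysctl):
    """Expensive step: walk the whole tree once and keep only the OID names."""
    return sorted(parse_temperatures(runner(["-a"])))


def poll_sensors(oids, runner=run_sysctl):
    """Cheap step: query only the OIDs found during discovery."""
    if not oids:
        return {}
    return parse_temperatures(runner(list(oids)))
```

Discovery would run once (for example when the service starts) and the cached list would be reused for every refresh. It would still need a re-discovery path for hardware whose sensors only appear later at runtime.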
Title: Re: 24.7 CPU Temps
Post by: franco on August 28, 2024, 11:22:54 PM
To be frank just try the patch:

# opnsense-patch f473d9a5c7 && service configd restart

As I said if we don't agree it's progress someone will need to come up with a better solution reading temperature readings from the sysctls which can only be probed during runtime. I cannot spend indefinite amounts of community time on pleasing the demand for lower temperatures.

The sysctl -a was there since the fork. We don't have to argue its downsides in every nuance.


Cheers,
Franco
Title: Re: 24.7 CPU Temps
Post by: sbellon on August 29, 2024, 10:12:36 AM
Not saying this is related, but I agree that I see a change in the graphs after having upgraded to 24.7.1 and then after 24.7.2 as well. For me it's not the CPU temperature because I'm running that on a Proxmox VE and don't have that available inside OPNsense, but I can see how the usage of the "States" clearly (!) changed with the upgrade from 24.1.10 to 24.7.1 and then again to 24.7.2 as you can see from my attached screenshot (upgrade to 24.7.1 was on 08.08.24 and upgrade to 24.7.2 was on 21.08.24 - both clearly visible in the graph without further explanation).

I am not saying this change is a problem nor worth investigating, I'm just saying that I can clearly see this change in behaviour and this may very well have effects on CPU usage and/or memory usage and perhaps as a result even CPU temperature.

Oh, and yes, configuration has NOT changed AT ALL over this period of time.
Title: Re: 24.7 CPU Temps
Post by: irrenarzt on September 02, 2024, 12:06:57 AM
Followup:

I've found a workaround that reduces my CPU utilization and temps to pre-24.7 levels. The problem is:
/usr/local/bin/python3 /usr/local/opnsense/scripts/filter/update_tables.py

When I disable the Maxmind Geoblock aliases, the CPU temp drops by 10C and utilization from that process drops by 50%. If I reenable that alias, the temp and utilization jump back up.

I have not altered any of these aliases, and the table entries are consistent with what they were under 24.1. This leads me to believe there is still an underlying problem that needs to be identified (this just helps narrow it down).

Since this is a python script, and 24.7 brought us python 3.11 - Is it possible that python 3.11 is the underlying problem?
Title: Re: 24.7 CPU Temps
Post by: meyergru on September 02, 2024, 01:02:09 AM
I think that there must be another problem with your setup. That script is run only when the aliases change.

For me, it does not run all the time, so it just cannot be responsible for any ongoing CPU load, even if it were less efficient than before. It may cause some spikes for the time it runs if large amounts of aliases are processed.

So, are you saying that "ps auxwww | fgrep update_tables" is running all the time? I think it clearly should not and for me, it does not - despite the fact that I also use the Maxmind geoip database for blocking.
Title: Re: 24.7 CPU Temps
Post by: irrenarzt on September 02, 2024, 01:06:17 AM
It's not running all the time, but approximately once a minute for a few seconds, which is enough to affect temperature readings and the average CPU usage under health reporting.
Title: Re: 24.7 CPU Temps
Post by: meyergru on September 02, 2024, 01:12:29 AM
It clearly does not do that for me.

Perhaps you have too many aliases? Maybe because you did not follow the tips here (https://docs.opnsense.org/manual/aliases.html#geoip), e.g. because you do not have enough firewall states and the alias database never gets fully processed, starting the process over again and again?

Something has to trigger that update...
Title: Re: 24.7 CPU Temps
Post by: irrenarzt on September 02, 2024, 01:30:14 AM
I only have 5 aliases, and I followed the guides when I did my initial install. There is enough available, there are no errors in my logs, and the problems didn't arise until after the 24.7 install - which is why I could see the noticeable difference in health reporting.

I searched before posting, and there are a lot of people over the years reporting that specific process running about once a minute, without anyone flagging it as abnormal... Disabling the Geoblock alias also does not change the frequency that the process runs, it only changes the percentage of CPU utilization to pre-24.7 levels.
Title: Re: 24.7 CPU Temps
Post by: meyergru on September 02, 2024, 11:31:01 AM
Sorry, just wanted to help, not fight. Good luck finding the problem.
Title: Re: 24.7 CPU Temps
Post by: meyergru on September 02, 2024, 12:07:45 PM
On a side note: in /usr/local/etc/filter_tables.conf, you will find <ttl> (time-to-live) tags. For me, all GeoIP rhythms seem to be 86400. If they are at 1 minute for you, that would explain it. You can look at the last change times in /var/db/aliastables to find out which alias is causing this.

Also, there is an undocumented tag <updatefreq> in config.xml, which is probably translated into <ttl>, if set. IDK how that was modified if it is not exposed in the GUI, but who knows?

All I can say is that calling update_tables.py manually takes 4 seconds here but modifies only a few aliases. I imagine that processing of geoip aliases takes a lot longer, but in my case, this is done once per day only.
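
A rough Python sketch of those two checks, for anyone who wants to script them. Assumptions: the file simply contains <ttl>number</ttl> tags with the value in seconds (inferred from the 86400 above, not from any documentation), and the helper names are hypothetical.

```python
import os
import re

# Finds <ttl>...</ttl> tags; the surrounding file layout is not assumed.
TTL_TAG = re.compile(r"<ttl>\s*([^<\s]*)\s*</ttl>")


def ttl_values(conf_text):
    """Return every <ttl> value found in the text, in file order, as strings."""
    return TTL_TAG.findall(conf_text)


def short_ttls(conf_text, threshold=3600):
    """Return numeric <ttl> values (assumed to be seconds) below the threshold."""
    values = []
    for raw in ttl_values(conf_text):
        try:
            seconds = float(raw)
        except ValueError:
            continue  # empty or non-numeric tag, skip it
        if seconds < threshold:
            values.append(seconds)
    return values


def newest_files(directory, limit=10):
    """Return (name, mtime) pairs for the most recently modified files."""
    entries = [
        (entry.name, entry.stat().st_mtime)
        for entry in os.scandir(directory)
        if entry.is_file()
    ]
    return sorted(entries, key=lambda item: item[1], reverse=True)[:limit]
```

Feeding the contents of /usr/local/etc/filter_tables.conf to short_ttls() and calling newest_files("/var/db/aliastables") should show at a glance whether any alias is refreshed far more often than once a day.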
Title: Re: 24.7 CPU Temps
Post by: irrenarzt on September 02, 2024, 09:07:07 PM
I apologize if I sounded argumentative, was just trying to relay what I'm seeing. I genuinely only take issue with one person in this thread.

One note, I'm currently limited to what I can access via web GUI...

I checked my config.xml and <updatefreq> was only set for 1 alias (not the Geoblock one). I cleared it so it's no longer set either... That had no impact, however.

Playing with Geoblock further, I have to get the number of current table entries below 100,000 (only 1/10th of the Geoblock list) for there to be any impact to temps and utilization.

I'm also curious about what exactly is happening when update_tables.py runs and which task it's performing, since it's not actually updating the aliases. If it were, I assume I'd see changes to the "Last updated" timestamp every time it runs. However, the timestamp is only updating once a day based on the Cron schedule.
Title: Re: 24.7 CPU Temps
Post by: MenschAergereDichNicht on September 04, 2024, 07:59:30 PM
For me the following somehow works:

Setting the tunable "dev.hwpstate_intel.*.epp" to 80 seems to avoid too much spiking of the CPU and results in a sane behaviour of the temperature widget.

I missed the fact that for newer CPUs the powerd daemon does not work.
As I don't need very much processing power for my setup there is no need for higher frequencies.

Hardware: J6412 Protectli VP2420
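
Since the * in that name stands for a per-CPU instance number, here is a small Python sketch that expands it into one name=value line per core. The helper name is made up, and the 0 to 100 range check reflects my reading of the FreeBSD hwpstate_intel driver documentation (0 leans towards performance, 100 towards energy saving), so treat it as an assumption.

```python
def epp_tunables(core_count, epp=80):
    """Build one 'name=value' line per core for the Intel EPP hint."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    if not 0 <= epp <= 100:
        raise ValueError("epp must be between 0 and 100")
    return [f"dev.hwpstate_intel.{core}.epp={epp}" for core in range(core_count)]
```

For a four-core CPU like the J6412, epp_tunables(4) gives the four entries from dev.hwpstate_intel.0.epp=80 to dev.hwpstate_intel.3.epp=80.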
Title: Re: 24.7 CPU Temps
Post by: irrenarzt on October 25, 2024, 06:13:37 PM
Sorry to bring up an old thread, but this latest 24.7.7 update seems to have helped reduce the CPU utilization problem. I'm still not at pre-24.7 levels, but it's definitely a welcome steep drop I can see in my health reporting charts. Another user in the update thread on Reddit has posted the same positive observation.

Does anyone know what changed that improved this?
Title: Re: 24.7 CPU Temps
Post by: passeri on October 26, 2024, 12:08:56 AM
Holy Heisenberg, Batman.

Sorry, that related to the general problem of measuring temps affecting temps, and I am not sure how to delete it from this context.
Title: Re: 24.7 CPU Temps
Post by: N0ttyT3chy on November 01, 2024, 12:44:29 PM
Did not want to start a new thread on CPU temps if possible. Have read through this thread and am trying to understand as best as is possible. I'm not at all very well educated with opnsense/firewalls and such. After the update to 24.7.7 my protectli unit is 20C higher, going from the high 40s to the high 60s. I have had a usb fan (not drawing power from unit) since first install. I see the spikes, yes, but it never goes down to even the 50s. I also lost my zenarmor cloud connection and haven't been able to reconnect. Need an "opnsense for dummies" type of reply as so much written here is above my level of comprehension. Thanks for your understanding.
Title: Re: 24.7 CPU Temps
Post by: inorx on April 01, 2025, 01:23:18 AM
Deleted, as constructive, solution-oriented contributions do not seem to be respected.
Title: Re: 24.7 CPU Temps
Post by: franco on April 01, 2025, 10:15:26 AM
Maybe it's time to stop obsessing over temperature differences in a known good CPU temperature range. Technology definitely does not care how you like your temperature to be reported anymore.  ;)


Cheers,
Franco
Title: Re: 24.7 CPU Temps
Post by: inorx on April 01, 2025, 09:45:12 PM
N150 running opnsense: UI reports core temperature +15 degrees above cli.
N305 running proxmox: UI reports core temperature with 0 degree difference to cli.

These are facts, not obsessions. Just to put that right.

I have understood you're not interested in improving this and contribution is not welcome.
Title: Re: 24.7 CPU Temps
Post by: passeri on April 02, 2025, 12:05:26 AM
@inorx, temperature measurements may be momentary facts. The obsession is over the display.

If the GUI tells me my routers are running in the 35-70 range (actually 50-54 in my case) then why would I care what it "really" is? I know that number will be lower anyway. If the GUI consistently shows 70 or so while CPU and load factors are low then check the thermal connections, ambient air, ventilation. If load, CPU and thermals are all high then the issues are load and performance.
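
That rule of thumb, written out as a tiny Python sketch. The 70C mark comes from the paragraph above; the 50% "busy" cut-off and the function name are arbitrary placeholders, not recommendations.

```python
def triage(temp_c, cpu_busy_pct, hot_at=70.0, busy_at=50.0):
    """Classify a sustained reading according to the rule of thumb above."""
    hot = temp_c >= hot_at
    busy = cpu_busy_pct >= busy_at
    if hot and busy:
        return "look at load and performance"
    if hot:
        return "check thermal connections, ambient air, ventilation"
    return "nothing to do"
```

The inputs are meant to be values that persist over time, not a single momentary spike.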

Temperature display is a crude measure. Rarely does anyone need it "accurately" and if you do, there is sysctl dev.cpu but something else will also need cross-checking.

I thought your approach to measurement was interesting, by the way, but in my experience it just does not really matter.
Title: Re: 24.7 CPU Temps
Post by: franco on April 02, 2025, 05:53:10 PM
> These are facts, not obessions. Just to put that right.

The important bit is asking to address the skew reported by the hardware in a software that has one single shared way of asking both hardware for their temperatures.  All we can do is obscure the fact that the hardware reports its temperature hotter than the other hardware and I don't want to obscure that any more than I've already done, because in reality I don't even know which of these two platforms needs to apply the skew and which doesn't.


Cheers,
Franco