Temperature: Dashboard Temps differ massively from CLI

Started by fastboot, December 01, 2024, 02:15:53 PM

Hi folks,


I am running a Protectli 6630 with a "12th Gen Intel(R) Core(TM) i3-1215U (6 cores, 8 threads)".

When I reload the dashboard I get the following:
CPU 0 = 81°C
CPU 1 = 81°C
CPU 2 = 58°C
CPU 3 = 58°C
CPU 4 = 79°C
CPU 5 = 79°C
CPU 6 = 79°C
CPU 7 = 79°C
CPU 8 = 79°C


When just idling on the dashboard, it's:
CPU 0 = 58°C
CPU 1 = 58°C
CPU 2 = 54°C
CPU 3 = 54°C
CPU 4 = 58°C
CPU 5 = 58°C
CPU 6 = 58°C
CPU 7 = 58°C
CPU 8 = 58°C


In the CLI I issued two commands, which also differ.

sysctl dev.cpu | grep temperature | sort
dev.cpu.0.temperature: 40.0C
dev.cpu.1.temperature: 40.0C
dev.cpu.2.temperature: 35.0C
dev.cpu.3.temperature: 35.0C
dev.cpu.4.temperature: 40.0C
dev.cpu.5.temperature: 40.0C
dev.cpu.6.temperature: 40.0C
dev.cpu.7.temperature: 40.0C


sysctl -a | grep temperature | sort
dev.cpu.0.temperature: 49.0C
dev.cpu.1.temperature: 49.0C
dev.cpu.2.temperature: 41.0C
dev.cpu.3.temperature: 42.0C
dev.cpu.4.temperature: 50.0C
dev.cpu.5.temperature: 52.0C
dev.cpu.6.temperature: 52.0C
dev.cpu.7.temperature: 52.0C



1. Could you tell me please why the temperatures Dashboard and CLI differ that much?
2. I do not understand the difference between the CLI commands.
3. On which values can I rely?


Thanks a lot in advance.

Cheers,

fb

Use your google-fu and you will find a ton of threads about it.
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

I think meyergru is tired of repeating his answers and he's not wrong.

You are using CPU cycles to run that command, and one is a lot more intensive than the other, which is why the temperatures differ. That answers questions 1 and 2.

To answer your third, I would rely on "sysctl dev.cpu | grep temperature | sort" more, since it shows the temperature with the command itself skewing the result the least.
AhnHEL (Angel)
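To put a number on how much each command skews the reading, the two outputs can be saved and compared per core. This is only a minimal sketch: `temp_delta` is a made-up helper name, and it assumes both files contain lines in the `dev.cpu.N.temperature: 40.0C` format shown above.

```shell
# temp_delta FILE_A FILE_B   (hypothetical helper, not part of OPNsense)
# Prints, per core, how much hotter the reading in FILE_B is than in FILE_A.
temp_delta() {
    awk -F': ' '
        /^dev\.cpu\.[0-9]+\.temperature:/ {
            temp = $2
            sub(/C$/, "", temp)                  # strip the trailing unit
            if (NR == FNR)
                base[$1] = temp                  # first file: remember the baseline
            else if ($1 in base)                 # second file: print the difference
                printf "%s %+.1f\n", $1, temp - base[$1]
        }
    ' "$1" "$2"
}
```

Possible usage: redirect `sysctl dev.cpu | grep temperature` into one file and `sysctl -a | grep temperature` into another, then run `temp_delta` on the two files. With the values from the first post, core 0 would come out as +9.0.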

I did not want to sound rude, but sometimes I wish people would actually use the forum search or the tutorial section before asking questions. For obvious reasons, this particular topic has just made it into here, as point 17.

Oh, I don't mind.

Actually the answers of the other threads do not really answer my questions. But well...
Maybe that is down to my lack of knowledge of Unix systems; I come from the Linux side. There, lm-sensors causes no issue at all, no matter how many times I run it or which dashboard I embed it in. I literally report real-time data from xx servers to Grafana without any tangible or measurable effect.

So I am still wondering why this has such a huge effect. For me it's around 30-40°C for a simple temperature view.
So in fact the data is useless for keeping an eye on the temperatures. Sad but true.

At least the Link: https://forum.opnsense.org/index.php?topic=41759.0 enlightened me a bit. But not fully tbh.

And nice, that I made it in your top #17. Lemme try to go at least to the top three :p

rickyricky does a good job of explaining it.

https://forum.opnsense.org/index.php?topic=36234.msg220563#msg220563

Quote from: rickyricky on November 24, 2024, 10:48:46 PM
you're querying the same values, but one method parses MUCH less data than the other...


Here is the dev.cpu method...
```
root@router-02:~ # sysctl dev.cpu | wc -l
     273
```

Looking for the temperature via dev.cpu only needs to export and grep through 273 lines.



Here is the sysctl -a method...
```
root@router-02:~ # sysctl -a | wc -l
   16497
```

The sysctl -a command has to export 16000+ more lines than the one that only looks at CPU values, and grep then has to go through those 16k lines to find the ones that match.

The 2nd command finishes quickly, but it still causes enough additional cpu load to show the temp has been raised by the time temperature is filtered out by the grep command.
AhnHEL (Angel)
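Taking that reasoning one step further, the temperature OIDs can be requested by name so that nothing has to be exported and grepped at all. The following is a sketch, not what the dashboard does: `core_temp_oids` is a hypothetical helper, and it assumes the cores are numbered from 0 and each exposes `dev.cpu.N.temperature`, as in the outputs earlier in this thread.

```shell
# core_temp_oids NCPU   (hypothetical helper)
# Prints one temperature OID per core, numbered from 0 to NCPU-1.
core_temp_oids() {
    ncpu=$1
    i=0
    while [ "$i" -lt "$ncpu" ]; do
        printf 'dev.cpu.%s.temperature\n' "$i"
        i=$((i + 1))
    done
}
```

Possible usage: `sysctl $(core_temp_oids "$(sysctl -n hw.ncpu)")`, which passes all names to a single sysctl call. Whether this actually lowers the reading on affected hardware would need to be measured.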

Worrying about CPU temperature can be a little obsessive, and a little off the point.

If you are aware of the long run capacity of your system to dissipate heat, indicated by the average difference between stable CPU and ambient temperatures, then the focus should be on the ambient temperature. Take care of that and for all that its reading may bounce around, the CPU will look after itself.
Deciso DEC697
+crowdsec +wireguard
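The "average difference between stable CPU and ambient temperatures" mentioned above can be worked out from a handful of logged readings. A rough sketch, assuming the readings were collected in the `sysctl` output format used in this thread and that the ambient temperature is measured separately and passed in by hand; `mean_over_ambient` is a made-up name.

```shell
# mean_over_ambient AMBIENT_C   (hypothetical helper)
# Reads sysctl-style temperature lines on stdin and prints the mean reading
# minus the given ambient temperature, in degrees Celsius.
mean_over_ambient() {
    awk -F': ' -v ambient="$1" '
        /temperature:/ {
            t = $2
            sub(/C$/, "", t)     # strip the trailing unit
            sum += t
            n++
        }
        END { if (n) printf "%.1f\n", sum / n - ambient }
    '
}
```

For example, readings of 40.0C and 36.0C in a 22°C room give a mean of 38.0, so `mean_over_ambient 22` prints 16.0.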

Some scenarios in which I think this might be not benign:

1) Newcomers to OPNsense, especially those with newly purchased devices, who don't have a prior idea of what a normal baseline temperature is.  They come away with a false impression.

2) Small fanless devices (Intel N-series, for example) with limited thermal margins may be pushed closer to throttle limits.

3) Hardware vendor support channels receiving reports about temperature hikes on devices running recent OPNsense.

Having said that, I can see the great discussions and progress on the GitHub issue links.  Many thanks  :)

> 1) Newcomers to OPNsense, especially those with newly purchased devices, who don't have a prior idea of what a normal baseline temperature is.  They come away with a false impression.

Ok but this is directly tied to how some modern hardware seems to be built. This hasn't been an issue for a decade. My biggest gripe is changing the OS to accommodate how *some* hardware dissipates heat is a hilarious premise in order to match user expectations. But the sensor is really this hot so what reality do we want to live in?

> 2) Small fanless devices (Intel N-series, for example) with limited thermal margins may be pushed closer to throttle limits.

I think this is exactly my point. The N series in particular has been the source of a number of funky issues due to design and build choices. Fixing this in software has become the trend.

> 3) Hardware vendor support channels receiving reports about temperature hikes on devices running recent OPNsense.

But they would on any hardware given the circumstances? After all it *is* the correct temperature? Are we disputing this again?

We're changing this for 25.1, although some people have complained about the approach. There really is no sensible way to fix this just to mask a real-world hardware quirk.


Cheers,
Franco

Why not change the widget title to "CPU temperature" and use dev.cpu exclusively?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: Patrick M. Hausen on December 04, 2024, 09:44:59 AM
Why not change the widget title to "CPU temperature" and use dev.cpu exclusively?

Very good point! I was just thinking to create a widget for me like that. Because at the end of the day I want to rely on data. If the data is not correct, I don't need it.

The other point about new hardware. Yes, my hardware is brand new. But actually I am not really a newcomer. It's just the hardware. I even discussed the temperatures with the vendor, like if they are normal, or if this is an issue with OPNsense itself. 

Meanwhile I did some tests with a Linux distro, lm-sensors and stress-ng. The behavior is absolutely normal. I reach ~80°C when I max out the cores for 30+ minutes. At idle it's in the range of 30-35°C. I do this with any new hardware I build. Well, this one I did not build myself, but you get my point, I suppose. When I stress test the CPU, both fans also speed up and cool the system.

I really do appreciate the effort of the OPNsense team to create this lovely piece of software. I really do. But also do I appreciate accuracy and data I can rely on.

Quote from: fastboot on December 04, 2024, 10:33:08 AM
Because at the end of the day I want to rely on data. If the data is not correct, I don't need it.

[...]

I really do appreciate the effort of the OPNsense team to create this lovely piece of software. I really do. But also do I appreciate accuracy and data I can rely on.

But the data shown is correct. At the moment the dashboard is rendered, the CPU temperature is higher because of the processing taking place to display the dashboard.


> Why not change the widget title to "CPU temperature" and use dev.cpu exclusively?

These are the ones with the "bad" reading? :) You may be thinking of solving an adjacent issue which also plays into the fact the kernel fakes amdtemp-per-CPU temperatures for no apparent reason.


Cheers,
Franco

Quote from: franco on December 04, 2024, 10:53:35 AM
> Why not change the widget title to "CPU temperature" and use dev.cpu exclusively?

These are the ones with the "bad" reading? :) You may be thinking of solving an adjacent issue which also plays into the fact the kernel fakes amdtemp-per-CPU temperatures for no apparent reason.

I suggested this because apparently the processing effort of `sysctl -a | grep` is what increases the temperature in an attempt to catch every sensor that might be present while most people will be satisfied to monitor CPU temperature alone.

Interestingly I notice almost nothing of this effect on my hardware:

root@opnsense:~ # sysctl dev.cpu.0.temperature
dev.cpu.0.temperature: 44.1C
root@opnsense:~ # sysctl -a | grep dev.cpu.0.temperature
dev.cpu.0.temperature: 44.6C

# sysctl dev.cpu | grep temperature

This may be an option, at the cost of losing track of all other sensors, but only if it doesn't heat everything up for the people experiencing this? ;)
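If a single "CPU temperature" figure is all that is wanted, the output of the command above can be reduced to the hottest core. This is a minimal sketch under the assumption that the lines look like the `dev.cpu.N.temperature: 40.0C` output earlier in the thread; `max_cpu_temp` is a hypothetical helper and not how the widget is implemented.

```shell
# max_cpu_temp   (hypothetical helper)
# Reads sysctl-style lines on stdin and prints the highest per-core temperature.
max_cpu_temp() {
    awk -F': ' '
        /^dev\.cpu\.[0-9]+\.temperature:/ {
            t = $2
            sub(/C$/, "", t)                 # strip the trailing unit
            if (!seen || t + 0 > max) {      # first reading or a new maximum
                max = t + 0
                seen = 1
            }
        }
        END { if (seen) printf "%.1f\n", max }
    '
}
```

Possible usage: `sysctl dev.cpu | max_cpu_temp`. It prints nothing when no temperature lines are present, for example when no CPU temperature driver is loaded.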