OPNsense Forum

English Forums => 24.7, 24.10 Legacy Series => Topic started by: fastboot on December 01, 2024, 02:15:53 PM

Title: Temperature: Dashboard Temps differ massively from CLI
Post by: fastboot on December 01, 2024, 02:15:53 PM
Hi folks,


I am running a Protectli 6630 with a "12th Gen Intel(R) Core(TM) i3-1215U (6 cores, 8 threads)".

When I reload the dashboard I get the following:
CPU 0 = 81°C
CPU 1 = 81°C
CPU 2 = 58°C
CPU 3 = 58°C
CPU 4 = 79°C
CPU 5 = 79°C
CPU 6 = 79°C
CPU 7 = 79°C
CPU 8 = 79°C


When just idling on the dashboard, it's:
CPU 0 = 58°C
CPU 1 = 58°C
CPU 2 = 54°C
CPU 3 = 54°C
CPU 4 = 58°C
CPU 5 = 58°C
CPU 6 = 58°C
CPU 7 = 58°C
CPU 8 = 58°C


In the CLI I issued two commands, which also differ.

sysctl dev.cpu | grep temperature | sort
dev.cpu.0.temperature: 40.0C
dev.cpu.1.temperature: 40.0C
dev.cpu.2.temperature: 35.0C
dev.cpu.3.temperature: 35.0C
dev.cpu.4.temperature: 40.0C
dev.cpu.5.temperature: 40.0C
dev.cpu.6.temperature: 40.0C
dev.cpu.7.temperature: 40.0C


sysctl -a | grep temperature | sort
dev.cpu.0.temperature: 49.0C
dev.cpu.1.temperature: 49.0C
dev.cpu.2.temperature: 41.0C
dev.cpu.3.temperature: 42.0C
dev.cpu.4.temperature: 50.0C
dev.cpu.5.temperature: 52.0C
dev.cpu.6.temperature: 52.0C
dev.cpu.7.temperature: 52.0C



1. Could you please tell me why the dashboard and CLI temperatures differ that much?
2. I do not understand the difference between the CLI commands.
3. On which values can I rely?


Thanks a lot in advance.

Cheers,

fb
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: meyergru on December 01, 2024, 04:31:29 PM
Use your google-fu and you will find a ton of threads about it.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: AhnHEL on December 01, 2024, 06:57:35 PM
I think meyergru is tired of repeating his answers and he's not wrong.

You are using cpu cycles to run that command and one is a lot more intensive than the other which is why there is a difference in the temps.  That answers questions 1 and 2.

To answer your third, I would rely on the "sysctl dev.cpu | grep temperature | sort" more, since it's showing the temperature with the command skewing the results the least.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: meyergru on December 01, 2024, 07:42:42 PM
I did not want to sound rude, but sometimes I wished people would actually use the forum search or the tutorial section before asking questions. For obvious reasons, this particular topic has just made it into here, as point 17 (https://forum.opnsense.org/index.php?topic=42985).
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: fastboot on December 03, 2024, 10:09:30 PM
Oh, I don't mind.

Actually the answers of the other threads do not really answer my questions. But well...
Maybe that is down to my lack of knowledge of Unix systems; I come from the Linux side. There is no issue at all with lm-sensors, for instance. It doesn't matter how many times I run it, or whether I have it inside whatever dashboard. I am literally reporting real-time data from xx servers to Grafana without any tangible or measurable effect.

So I am still wondering why this has such a huge effect. For me it's around 30-40°C for a simple temperature view.
So in fact the data is useless then to keep an eye on the temperatures. Sad but true.

At least the Link: https://forum.opnsense.org/index.php?topic=41759.0 enlightened me a bit. But not fully tbh.

And nice, that I made it in your top #17. Lemme try to go at least to the top three :p
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: AhnHEL on December 04, 2024, 02:22:56 AM
rickyricky does a good job of explaining it.

https://forum.opnsense.org/index.php?topic=36234.msg220563#msg220563

Quote from: rickyricky on November 24, 2024, 10:48:46 PM
you're querying the same values, but one method parses MUCH less data than the other...


Here is the dev.cpu method...
```
root@router-02:~ # sysctl dev.cpu | wc -l
     273
```

Looking for the temperature this way only needs to export and grep through 273 lines.



Here is the sysctl -a method...
```
root@router-02:~ # sysctl -a | wc -l
   16497
```

The sysctl -a method has to export 16,000+ more lines than the one that only looks at cpu values, then has to grep through those 16k lines to find the ones that match.

The 2nd command finishes quickly, but it still causes enough additional cpu load to show the temp has been raised by the time temperature is filtered out by the grep command.
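
The line counts quoted above can be reproduced with a small helper. This is only a sketch: `count_lines` is a made-up name, and it simply counts the lines any command writes to stdout.

```sh
#!/bin/sh
# Count how many lines a command writes to stdout.
# Usage: count_lines COMMAND [ARGS...]
count_lines() {
    # tr strips the leading padding that some wc implementations print
    "$@" 2>/dev/null | wc -l | tr -d ' '
}
```

On the firewall, `count_lines sysctl dev.cpu` and `count_lines sysctl -a` should give numbers in the same ballpark as the 273 and 16497 quoted above, though the exact counts depend on hardware and loaded drivers.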
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: passeri on December 04, 2024, 02:40:12 AM
Worrying about CPU temperature can be a little obsessive, and a little off the point.

If you are aware of the long run capacity of your system to dissipate heat, indicated by the average difference between stable CPU and ambient temperatures, then the focus should be on the ambient temperature. Take care of that and for all that its reading may bounce around, the CPU will look after itself.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: OPNenthu on December 04, 2024, 05:36:23 AM
Some scenarios in which I think this might be not benign:

1) Newcomers to OPNsense, especially those with newly purchased devices, who don't have a prior idea of what a normal baseline temperature is.  They come away with a false impression.

2) Small fanless devices (Intel N-series, for example) with limited thermal margins may be pushed closer to throttle limits.

3) Hardware vendor support channels receiving reports about temperature hikes on devices running recent OPNsense.

Having said that, I can see the great discussions and progress on the GitHub issue links.  Many thanks  :)
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: franco on December 04, 2024, 09:36:16 AM
> 1) Newcomers to OPNsense, especially those with newly purchased devices, who don't have a prior idea of what a normal baseline temperature is.  They come away with a false impression.

Ok but this is directly tied to how some modern hardware seems to be built. This hasn't been an issue for a decade. My biggest gripe is changing the OS to accommodate how *some* hardware dissipates heat is a hilarious premise in order to match user expectations. But the sensor is really this hot so what reality do we want to live in?

> 2) Small fanless devices (Intel N-series, for example) with limited thermal margins may be pushed closer to throttle limits.

I think this is exactly my point. The N series in particular has been the source of a number of funky issues due to design and build choices. Fixing this in software has become the trend.

> 3) Hardware vendor support channels receiving reports about temperature hikes on devices running recent OPNsense.

But they would on any hardware given the circumstances? After all it *is* the correct temperature? Are we disputing this again?

We're changing this for 25.1, although some people have complained about the approach. There really is no sensible way to fix this; it only masks a real-world hardware quirk.


Cheers,
Franco
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: Patrick M. Hausen on December 04, 2024, 09:44:59 AM
Why not change the widget title to "CPU temperature" and use dev.cpu exclusively?
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: fastboot on December 04, 2024, 10:33:08 AM
Quote from: Patrick M. Hausen on December 04, 2024, 09:44:59 AM
Why not change the widget title to "CPU temperature" and use dev.cpu exclusively?

Very good point! I was just thinking to create a widget for me like that. Because at the end of the day I want to rely on data. If the data is not correct, I don't need it.

The other point about new hardware. Yes, my hardware is brand new. But actually I am not really a newcomer. It's just the hardware. I even discussed the temperatures with the vendor, like if they are normal, or if this is an issue with OPNsense itself. 

Meanwhile I did some Tests with a Linux Distro, lm-sensors and stress-ng. The behavior is absolutely normal. I reach the ~80°C when I max out the cores for 30+ minutes. In idle it's in the range 30-35°C. I do this with any new hardware I build. Well, this one I did not build myself, but you get my point I suppose. When I stress test the CPU, also both FANs speed up and cool the system.

I really do appreciate the effort of the OPNsense team to create this lovely piece of software. I really do. But also do I appreciate accuracy and data I can rely on.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: Patrick M. Hausen on December 04, 2024, 10:52:30 AM
Quote from: fastboot on December 04, 2024, 10:33:08 AM
Because at the end of the day I want to rely on data. If the data is not correct, I don't need it.

[...]

I really do appreciate the effort of the OPNsense team to create this lovely piece of software. I really do. But also do I appreciate accuracy and data I can rely on.

But the data shown is correct. In the moment the dashboard is rendered the CPU temperature is higher, because of the processing taking place to display the dashboard.

Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: franco on December 04, 2024, 10:53:35 AM
> Why not change the widget title to "CPU temperature" and use dev.cpu exclusively?

These are the ones with the "bad" reading? :) You may be thinking of solving an adjacent issue which also plays into the fact the kernel fakes amdtemp-per-CPU temperatures for no apparent reason.


Cheers,
Franco
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: Patrick M. Hausen on December 04, 2024, 10:58:14 AM
Quote from: franco on December 04, 2024, 10:53:35 AM
> Why not change the widget title to "CPU temperature" and use dev.cpu exclusively?

These are the ones with the "bad" reading? :) You may be thinking of solving an adjacent issue which also plays into the fact the kernel fakes amdtemp-per-CPU temperatures for no apparent reason.

I suggested this because apparently the processing effort of `sysctl -a | grep` is what increases the temperature in an attempt to catch every sensor that might be present while most people will be satisfied to monitor CPU temperature alone.

Interestingly I notice almost nothing of this effect on my hardware:

root@opnsense:~ # sysctl dev.cpu.0.temperature
dev.cpu.0.temperature: 44.1C
root@opnsense:~ # sysctl -a | grep dev.cpu.0.temperature
dev.cpu.0.temperature: 44.6C
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: franco on December 04, 2024, 11:11:20 AM
# sysctl dev.cpu | grep temperature

This may be an option, at the cost of losing track of all other sensors, but only if it doesn't heat everything up for the people experiencing this? ;)
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: meyergru on December 04, 2024, 02:06:54 PM
Yes, that is why I suggested that way of doing it. First an fgrep does not do regex matching, which is faster and second, by limiting the sysctl output to just a few lines instead of ~15000, the effort is limited.

I thought my comments to the suggested patch made this clear?

However, one could also use a textbox to leave the pattern choice for sysctl to the user with a default of dev.cpu.
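
A rough sketch of that idea, assuming nothing about how the widget is actually implemented: the subtree is a parameter that defaults to dev.cpu, and the filter uses fixed-string matching. Both function names are hypothetical.

```sh
#!/bin/sh
# Keep only the temperature OIDs from sysctl-style "name: value" lines on stdin.
# grep -F matches a fixed string, so no regex matching is involved.
filter_temps() {
    grep -F '.temperature:' | sort
}

# Hypothetical wrapper: query one subtree (default dev.cpu) instead of the whole tree.
# Usage: read_temps [SUBTREE]
read_temps() {
    sysctl "${1:-dev.cpu}" | filter_temps
}
```

Calling `read_temps` with no argument would look only at dev.cpu, while `read_temps hw.acpi` would let a user pick another subtree.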
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: franco on December 04, 2024, 02:34:39 PM
To be fair, we did not talk about dialing back the scope of the widget, because that is the key question here, as it also requires chasing the author of the plugin on the subject (which I have now done, and he agreed that it is an option). The fgrep doesn't matter in practice if the sysctl tree is mostly ignored by specifying dev.cpu.


Cheers,
Franco
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: meyergru on December 04, 2024, 02:45:22 PM
Actually, before you pointed me to that, I did not realise there were potentially more temp sensors outside of the dev.cpu and hw.acpi sysctl subtrees, nor did I care for any other temperatures...

Wait - does that make me a climate denier? ;-)
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: OPNenthu on December 04, 2024, 03:27:18 PM
I also had no idea that there were other sensors besides CPU that could appear in the widget.  It seems impressions of widget function are guided in part by the type of hardware one has, rather than by widget documentation.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: franco on December 04, 2024, 03:32:52 PM
Quote from: franco on December 04, 2024, 11:11:20 AM
# sysctl dev.cpu | grep temperature

So, good people, to inch closer to resolution does this fix your temperature worries?


Cheers,
Franco
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: franco on December 04, 2024, 04:27:44 PM
Or let's go with

# configctl system sysctl values dev.cpu

for portability's sake.

So? :)
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: fastboot on December 04, 2024, 05:42:43 PM
Quote from: Patrick M. Hausen on December 04, 2024, 10:52:30 AM
Quote from: fastboot on December 04, 2024, 10:33:08 AM
Because at the end of the day I want to rely on data. If the data is not correct, I don't need it.

[...]

I really do appreciate the effort of the OPNsense team to create this lovely piece of software. I really do. But also do I appreciate accuracy and data I can rely on.

But the data shown is correct. In the moment the dashboard is rendered the CPU temperature is higher, because of the processing taking place to display the dashboard.

No, the data is not correct. I logged it into a file, and there I never reached 80°C. Also, the CPU is an i3. I'd guess reading the temperature should not have such an effect on it.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: Patrick M. Hausen on December 04, 2024, 05:49:17 PM
Quote from: fastboot on December 04, 2024, 05:42:43 PM
No, the data is not correct. I logged it into a file, and there I never reached 80°C. Also, the CPU is an i3. I'd guess reading the temperature should not have such an effect on it.

You logged into a file with the command `sysctl -a | grep temperature`?
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: fastboot on December 04, 2024, 06:02:26 PM
Quote from: Patrick M. Hausen on December 04, 2024, 05:49:17 PM

You logged into a file with the command `sysctl -a | grep temperature`?

No, not that command. I did not test it, but I would assume that even this wouldn't make a huge difference.


cat test.sh
#!/bin/sh
LOGFILE="/var/log/cpu_temp.log"

while true; do
    TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S.%3N")
    TEMP_DATA=$(sysctl dev.cpu | grep temperature | sort)
    echo "$TIMESTAMP - $TEMP_DATA" >> "$LOGFILE"
    sleep 0.5
done


Not fast enough to reach any evidence that the data is just wrong?
And by the way, even this script stresses the CPU and HDD ;) But not really a tangible or measurable effect. Maybe I am blind?

Like mentioned, my Unix "fu" is limited...
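
For a log like the one above, reducing each sample to its hottest core makes spikes easier to spot. A minimal sketch, assuming the input has the `dev.cpu.N.temperature: 40.0C` shape shown earlier in this thread; `max_temp` is a made-up helper name.

```sh
#!/bin/sh
# Read "dev.cpu.N.temperature: 40.0C" lines on stdin and print the highest value.
max_temp() {
    awk -F': ' '
        /\.temperature: / {
            t = $2 + 0                      # "40.0C" + 0 yields the number 40
            if (!seen || t > max) { max = t; seen = 1 }
        }
        END { if (seen) printf "%.1f\n", max }
    '
}
```

Inside a logging loop this could be fed by `sysctl dev.cpu | max_temp`, ideally with a pause of a second or more between samples so the sampling itself adds as little load as possible.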
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: franco on December 04, 2024, 06:03:32 PM
> No, the data is not correct.

I'm not trying to be difficult. It is the reading from the hardware. It is the temperature because the heat sink has had no time to dissipate the heat because it seems to be slower than in the average hardware we see. And if it's not correct, who is faking the temperature? I'm interested to know.  :)


Cheers,
Franco
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: fastboot on December 04, 2024, 06:08:04 PM
Like just mentioned. My Unix skills are very limited.

From reading the other threads, the way to measure is "sysctl dev.cpu | grep temperature | sort" but correct me if I am wrong.

As I never reach 80-82°C with this reading, I assume that the other reading must be wrong. I only reach these specific temperatures when I stress the CPU to 100%, and even that does not happen in just 1 second.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: Patrick M. Hausen on December 04, 2024, 06:09:39 PM
Quote from: fastboot on December 04, 2024, 06:02:26 PM
Quote from: Patrick M. Hausen on December 04, 2024, 05:49:17 PM

You logged into a file with the command `sysctl -a | grep temperature`?

No, not that command. I did not test it, but I would assume that even this wouldn't make a huge difference.


Trust me, it will. This command reads 16,000 OIDs from the kernel and this is what heats up the CPU.

And while it is perfectly fine to argue that this is not a clever way to read the CPU temperature because a side effect of the reading itself raises it significantly, the temperature read is correct at that very moment.

That's the entire point. Finding better ways to read temperatures while not missing sensors some users might consider essential.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: fastboot on December 04, 2024, 06:15:39 PM
Quote from: Patrick M. Hausen on December 04, 2024, 06:09:39 PM
Quote from: fastboot on December 04, 2024, 06:02:26 PM
Quote from: Patrick M. Hausen on December 04, 2024, 05:49:17 PM

You logged into a file with the command `sysctl -a | grep temperature`?

No, not that command. I did not test it, but I would assume that even this wouldn't make a huge difference.


Trust me, it will. This command reads 16,000 OIDs from the kernel and this is what heats up the CPU.

And while it is perfectly fine to argue that this is not a clever way to read the CPU temperature because a side effect of the reading itself raises it significantly, the temperature read is correct at that very moment.

That's the entire point. Finding better ways to read temperatures while not missing sensors some users might consider essential.

Well... give the users the choice what they want to see.

When I build dashboards in Grafana, Home Assistant, .... I choose only what I want to see.
But not sure if this is that easy to implement, as I am far away of being a dev myself.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: franco on December 04, 2024, 06:19:35 PM
To stress Patrick's point: we have been arguing for the better half of the year about why decade-old code is now showing "wrong" temperatures. If the consensus is to only show the dev.cpu temperature subtree that's fine, but eventually **someone** with the issue needs to run the freaking command to confirm... I'm not posting a fourth time about it.


Cheers,
Franco
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: Patrick M. Hausen on December 04, 2024, 06:38:10 PM
Unfortunately I do not have any hardware with that modern a CPU and advanced power saving.

On the Netgate device in Frankfurt I get 43 C for each of:
sysctl dev.cpu.0.temperature
sysctl dev.cpu | fgrep dev.cpu.0.temperature
sysctl -a | fgrep dev.cpu.0.temperature
configctl system sysctl values dev.cpu


Similar on the Deciso device here in Karlsruhe - 48.5, 48.6 - rises to 49.0 with the `sysctl -a` method.

We probably need someone with an N100 to actually just run these different commands and post the output.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: meyergru on December 04, 2024, 06:51:15 PM
To put it into "user-understandable" words:

The reading of the temperature, when done via "sysctl dev.cpu | grep temperature" puts less stress on "some" CPUs than doing "sysctl -a | fgrep temperature" (which is currently done by the dashboard).

That this causes a higher temperature readout is somewhat new and still does not occur with some CPUs. Why? Because if the heat-dissipation solution is good (tm), then the heat will quickly be taken off the die and into the case or air or whatever. With an air-gap or bad thermal paste, heat builds up on the die and can only slowly dissipate.

However, since modern CPUs tend to use more of the available range of potential die temperatures and with the advent of badly designed (and sometimes built) china boxes, the dissipation may be much worse, such that even short bursts of CPU load (like those induced by "sysctl -a | fgrep temperature") heat up the CPU in "no time".
The situation is not improved when manufacturers turn up the power limits of these 6 Watt CPUs to 25 Watts or more per default.

This gets mostly unnoticed, because thermal throttling will keep the CPU from melting anyway. Also note, that the temperature that is being read is indeed "correct" - there is no denying that. It is merely shown somewhat higher than without reading it from ~15000 oids. Call it a heisenbug.

So, this explains why the problem is "new", despite the age-old code doing this. It also explains why Patrick and I do not see much difference between the readout with "sysctl dev.cpu | grep temperature" and "sysctl -a | fgrep temperature".

The remaining question is: How much of a difference is there for those of you who complain about "wrong readings"? BTW: Do not put the commands in a loop and leave at least 1 second between calls - for  reasons that now should be obvious.

Let's start: For me on an N100 with decent cooling and new thermal paste, the difference is ~3°C:


root@OPNsense:~ # sysctl dev.cpu.0.temperature
dev.cpu.0.temperature: 50.0C
root@OPNsense:~ # sysctl dev.cpu | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 50.0C
root@OPNsense:~ # sysctl -a | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 52.0C
root@OPNsense:~ # configctl system sysctl values dev.cpu
{"dev.cpu.3.temperature":"53.0C","dev.cpu.3.coretemp.throttle_log":"0","dev.cpu.3.coretemp.tjmax":"105.0C","dev.cpu.3.coretemp.resolution":"1","dev.cpu.3.coretemp.delta":"52","dev.cpu.3.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.3.cx_usage_counters":"1279770 3676798 125303","dev.cpu.3.cx_usage":"25.18% 72.35% 2.46% last 102us","dev.cpu.3.cx_lowest":"C3","dev.cpu.3.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.3.freq_levels":"806\/-1","dev.cpu.3.freq":"2025","dev.cpu.3.%parent":"acpi0","dev.cpu.3.%pnpinfo":"_HID=ACPI0007 _UID=3 _CID=none","dev.cpu.3.%location":"handle=\\_SB_.PR03","dev.cpu.3.%driver":"cpu","dev.cpu.3.%desc":"ACPI CPU","dev.cpu.2.temperature":"51.0C","dev.cpu.2.coretemp.throttle_log":"0","dev.cpu.2.coretemp.tjmax":"105.0C","dev.cpu.2.coretemp.resolution":"1","dev.cpu.2.coretemp.delta":"54","dev.cpu.2.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.2.cx_usage_counters":"1710877 4995190 134535","dev.cpu.2.cx_usage":"25.01% 73.02% 1.96% last 148us","dev.cpu.2.cx_lowest":"C3","dev.cpu.2.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.2.freq_levels":"806\/-1","dev.cpu.2.freq":"2100","dev.cpu.2.%parent":"acpi0","dev.cpu.2.%pnpinfo":"_HID=ACPI0007 _UID=2 _CID=none","dev.cpu.2.%location":"handle=\\_SB_.PR02","dev.cpu.2.%driver":"cpu","dev.cpu.2.%desc":"ACPI CPU","dev.cpu.1.temperature":"52.0C","dev.cpu.1.coretemp.throttle_log":"0","dev.cpu.1.coretemp.tjmax":"105.0C","dev.cpu.1.coretemp.resolution":"1","dev.cpu.1.coretemp.delta":"53","dev.cpu.1.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.1.cx_usage_counters":"1170285 4249538 131505","dev.cpu.1.cx_usage":"21.08% 76.54% 2.36% last 255us","dev.cpu.1.cx_lowest":"C3","dev.cpu.1.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.1.freq_levels":"806\/-1","dev.cpu.1.freq":"2117","dev.cpu.1.%parent":"acpi0","dev.cpu.1.%pnpinfo":"_HID=ACPI0007 _UID=1 
_CID=none","dev.cpu.1.%location":"handle=\\_SB_.PR01","dev.cpu.1.%driver":"cpu","dev.cpu.1.%desc":"ACPI CPU","dev.cpu.0.temperature":"52.0C","dev.cpu.0.coretemp.throttle_log":"0","dev.cpu.0.coretemp.tjmax":"105.0C","dev.cpu.0.coretemp.resolution":"1","dev.cpu.0.coretemp.delta":"53","dev.cpu.0.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.0.cx_usage_counters":"3142633 4968723 737","dev.cpu.0.cx_usage":"38.74% 61.25% 0.00% last 186us","dev.cpu.0.cx_lowest":"C3","dev.cpu.0.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.0.freq_levels":"806\/-1","dev.cpu.0.freq":"2001","dev.cpu.0.%parent":"acpi0","dev.cpu.0.%pnpinfo":"_HID=ACPI0007 _UID=0 _CID=none","dev.cpu.0.%location":"handle=\\_SB_.PR00","dev.cpu.0.%driver":"cpu","dev.cpu.0.%desc":"ACPI CPU","dev.cpu.%parent":""}


The delta is about the same on an N5105 and a J4125. If your delta is much higher, you should probably fix your thermal paste (https://www.congenio.de/infos/opnsense-hardware.html) or lower your BIOS power limits.

P.S.: I have no idea what stress the configctl method would put on the CPU...

P.P.S: On another note: After an update has been done, there sometimes are house-keeping or even runaway tasks that raise temps, maybe that is another factor why people keep telling that the new dashboard shows "wrong" (i.e. higher) temperatures than before the update.
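
Since the configctl output above is one long JSON object, comparing it with the plain sysctl readings is easier once the temperature keys are pulled out. A sketch, assuming the output keeps the `"dev.cpu.N.temperature":"52.0C"` shape shown above; `json_temps` is a made-up name and a proper JSON parser would be the more robust choice.

```sh
#!/bin/sh
# Pull "dev.cpu.N.temperature":"VALUE" pairs out of a JSON object on stdin
# and print them as "dev.cpu.N.temperature: VALUE", one per line.
json_temps() {
    grep -o '"dev\.cpu\.[0-9]*\.temperature":"[^"]*"' |
        sed 's/^"\(.*\)":"\(.*\)"$/\1: \2/' |
        sort
}
```

On the firewall this would be used as `configctl system sysctl values dev.cpu | json_temps`, giving output in the same format as `sysctl dev.cpu | grep temperature`.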
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: staticznld on December 04, 2024, 06:52:10 PM
Hunsn N100 4x i226-v



root@router:~ # sysctl dev.cpu.0.temperature
dev.cpu.0.temperature: 26.0C
root@router:~ # sysctl dev.cpu | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 27.0C
root@router:~ # sysctl -a | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 30.0C
root@router:~ # configctl system sysctl values dev.cpu
{"dev.cpu.3.temperature":"32.0C","dev.cpu.3.coretemp.throttle_log":"0","dev.cpu.3.coretemp.tjmax":"105.0C","dev.cpu.3.coretemp.resolution":"1","dev.cpu.3.coretemp.delta":"73","dev.cpu.3.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.3.cx_usage_counters":"9038 206303 33787","dev.cpu.3.cx_usage":"3.62% 82.81% 13.56% last 1671us","dev.cpu.3.cx_lowest":"C3","dev.cpu.3.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.3.freq_levels":"806\/-1","dev.cpu.3.freq":"3407","dev.cpu.3.%parent":"acpi0","dev.cpu.3.%pnpinfo":"_HID=ACPI0007 _UID=3 _CID=none","dev.cpu.3.%location":"handle=\\_SB_.PR03","dev.cpu.3.%driver":"cpu","dev.cpu.3.%desc":"ACPI CPU","dev.cpu.2.temperature":"31.0C","dev.cpu.2.coretemp.throttle_log":"0","dev.cpu.2.coretemp.tjmax":"105.0C","dev.cpu.2.coretemp.resolution":"1","dev.cpu.2.coretemp.delta":"74","dev.cpu.2.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.2.cx_usage_counters":"8177 209330 33176","dev.cpu.2.cx_usage":"3.26% 83.50% 13.23% last 804us","dev.cpu.2.cx_lowest":"C3","dev.cpu.2.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.2.freq_levels":"806\/-1","dev.cpu.2.freq":"3407","dev.cpu.2.%parent":"acpi0","dev.cpu.2.%pnpinfo":"_HID=ACPI0007 _UID=2 _CID=none","dev.cpu.2.%location":"handle=\\_SB_.PR02","dev.cpu.2.%driver":"cpu","dev.cpu.2.%desc":"ACPI CPU","dev.cpu.1.temperature":"31.0C","dev.cpu.1.coretemp.throttle_log":"0","dev.cpu.1.coretemp.tjmax":"105.0C","dev.cpu.1.coretemp.resolution":"1","dev.cpu.1.coretemp.delta":"74","dev.cpu.1.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.1.cx_usage_counters":"95908 1201303 634","dev.cpu.1.cx_usage":"7.38% 92.56% 0.04% last 76us","dev.cpu.1.cx_lowest":"C3","dev.cpu.1.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.1.freq_levels":"806\/-1","dev.cpu.1.freq":"3426","dev.cpu.1.%parent":"acpi0","dev.cpu.1.%pnpinfo":"_HID=ACPI0007 _UID=1 
_CID=none","dev.cpu.1.%location":"handle=\\_SB_.PR01","dev.cpu.1.%driver":"cpu","dev.cpu.1.%desc":"ACPI CPU","dev.cpu.0.temperature":"30.0C","dev.cpu.0.coretemp.throttle_log":"0","dev.cpu.0.coretemp.tjmax":"105.0C","dev.cpu.0.coretemp.resolution":"1","dev.cpu.0.coretemp.delta":"75","dev.cpu.0.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.0.cx_usage_counters":"15920 135827 46052","dev.cpu.0.cx_usage":"8.04% 68.66% 23.28% last 399us","dev.cpu.0.cx_lowest":"C3","dev.cpu.0.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.0.freq_levels":"806\/-1","dev.cpu.0.freq":"3218","dev.cpu.0.%parent":"acpi0","dev.cpu.0.%pnpinfo":"_HID=ACPI0007 _UID=0 _CID=none","dev.cpu.0.%location":"handle=\\_SB_.PR00","dev.cpu.0.%driver":"cpu","dev.cpu.0.%desc":"ACPI CPU","dev.cpu.%parent":""}


I am not having any trouble with the temperature.
Only posting this because Patrick asked.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: OPNenthu on December 04, 2024, 07:38:53 PM
I have the issue.

Several days ago I had run the test script as mentioned here: https://github.com/opnsense/core/pull/7758#issuecomment-2289846002

I ran it 4 times and the largest difference noted was +15C above the average of the measurements without 'sysctl -a' . 

Average +7.75C across 4 consecutive runs.

Attaching screenshot.

Protectli V1410 (Intel N5105)
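
The kind of comparison described above (one reading against the average of the baseline readings) can be sketched as follows. This is not the test script from the linked GitHub comment; `delta_vs_baseline` is a hypothetical helper.

```sh
#!/bin/sh
# delta_vs_baseline READING BASELINE...
# Print how far READING lies above (+) or below (-) the mean of the BASELINE values.
delta_vs_baseline() {
    [ "$#" -ge 2 ] || return 1          # need one reading and at least one baseline value
    reading=$1
    shift
    printf '%s\n' "$@" | awk -v r="$reading" '
        { sum += $1; n++ }
        END { printf "%+.2f\n", r - sum / n }
    '
}
```

The first argument would be the value read via `sysctl -a`, the remaining ones the values read via `sysctl dev.cpu` or a single OID.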

Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: franco on December 04, 2024, 07:51:27 PM
> root@router:~ # sysctl dev.cpu.0.temperature
> dev.cpu.0.temperature: 26.0C
> root@OPNsense:~ # configctl system sysctl values dev.cpu
> [...]dev.cpu.0.temperature":"52.0C"[...]

The funny thing is: if we cannot use configd to fetch these values, we basically cannot fetch them at all. Or is reading the dev.cpu subtree really that CPU-intensive? Only one way to find out:

# configctl system sysctl values dev.cpu.0.temperature


Cheers,
Franco
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: OPNenthu on December 04, 2024, 07:55:35 PM
Quote from: meyergru on December 04, 2024, 06:51:15 PM
BTW: Do not put the commands in a loop and leave at least 1 second between calls - for  reasons that now should be obvious.

Ok...


root@firewall:~ # sysctl dev.cpu.0.temperature
dev.cpu.0.temperature: 50.0C
root@firewall:~ # sysctl dev.cpu | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 45.0C
root@firewall:~ # sysctl -a | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 53.0C
root@firewall:~ # configctl system sysctl values dev.cpu
{"dev.cpu.3.temperature":"49.0C","dev.cpu.3.coretemp.throttle_log":"0","dev.cpu.3.coretemp.tjmax":"105.0C","dev.cpu.3.coretemp.resolution":"1","dev.cpu.3.coretemp.delta":"55","dev.cpu.3.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.3.cx_usage_counters":"19756783 0 0","dev.cpu.3.cx_usage":"100.00% 0.00% 0.00% last 1346us","dev.cpu.3.cx_lowest":"C1","dev.cpu.3.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/253","dev.cpu.3.freq_levels":"2001\/10000 2000\/10000 1800\/8793 1600\/7632 1400\/6524 1200\/5466 1000\/4445 800\/3472","dev.cpu.3.freq":"2001","dev.cpu.3.%parent":"acpi0","dev.cpu.3.%pnpinfo":"_HID=ACPI0007 _UID=3 _CID=none","dev.cpu.3.%location":"handle=\\_SB_.CP03","dev.cpu.3.%driver":"cpu","dev.cpu.3.%desc":"ACPI CPU","dev.cpu.2.temperature":"50.0C","dev.cpu.2.coretemp.throttle_log":"0","dev.cpu.2.coretemp.tjmax":"105.0C","dev.cpu.2.coretemp.resolution":"1","dev.cpu.2.coretemp.delta":"55","dev.cpu.2.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.2.cx_usage_counters":"20624333 0 0","dev.cpu.2.cx_usage":"100.00% 0.00% 0.00% last 210us","dev.cpu.2.cx_lowest":"C1","dev.cpu.2.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/253","dev.cpu.2.freq_levels":"2001\/10000 2000\/10000 1800\/8793 1600\/7632 1400\/6524 1200\/5466 1000\/4445 800\/3472","dev.cpu.2.freq":"2001","dev.cpu.2.%parent":"acpi0","dev.cpu.2.%pnpinfo":"_HID=ACPI0007 _UID=2 _CID=none","dev.cpu.2.%location":"handle=\\_SB_.CP02","dev.cpu.2.%driver":"cpu","dev.cpu.2.%desc":"ACPI CPU","dev.cpu.1.temperature":"50.0C","dev.cpu.1.coretemp.throttle_log":"0","dev.cpu.1.coretemp.tjmax":"105.0C","dev.cpu.1.coretemp.resolution":"1","dev.cpu.1.coretemp.delta":"55","dev.cpu.1.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.1.cx_usage_counters":"19203038 0 0","dev.cpu.1.cx_usage":"100.00% 0.00% 0.00% last 13948us","dev.cpu.1.cx_lowest":"C1","dev.cpu.1.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/253","dev.cpu.1.freq_levels":"2001\/10000 2000\/10000 1800\/8793 1600\/7632 
1400\/6524 1200\/5466 1000\/4445 800\/3472","dev.cpu.1.freq":"2001","dev.cpu.1.%parent":"acpi0","dev.cpu.1.%pnpinfo":"_HID=ACPI0007 _UID=1 _CID=none","dev.cpu.1.%location":"handle=\\_SB_.CP01","dev.cpu.1.%driver":"cpu","dev.cpu.1.%desc":"ACPI CPU","dev.cpu.0.temperature":"50.0C","dev.cpu.0.coretemp.throttle_log":"0","dev.cpu.0.coretemp.tjmax":"105.0C","dev.cpu.0.coretemp.resolution":"1","dev.cpu.0.coretemp.delta":"54","dev.cpu.0.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.0.cx_usage_counters":"169953345 0 0","dev.cpu.0.cx_usage":"100.00% 0.00% 0.00% last 76us","dev.cpu.0.cx_lowest":"C1","dev.cpu.0.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/253","dev.cpu.0.freq_levels":"2001\/10000 2000\/10000 1800\/8793 1600\/7632 1400\/6524 1200\/5466 1000\/4445 800\/3472","dev.cpu.0.freq":"2001","dev.cpu.0.%parent":"acpi0","dev.cpu.0.%pnpinfo":"_HID=ACPI0007 _UID=0 _CID=none","dev.cpu.0.%location":"handle=\\_SB_.CP00","dev.cpu.0.%driver":"cpu","dev.cpu.0.%desc":"ACPI CPU","dev.cpu.%parent":""}
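For anyone who wants to pull just the temperatures out of a JSON blob like the one above, here is a minimal Python sketch. It assumes the output of `configctl system sysctl values dev.cpu` is a flat JSON object with string values such as "49.0C", as in the paste above. The function name `cpu_temperatures` is made up for illustration and is not part of OPNsense.

```python
import json
import re

# Matches keys such as "dev.cpu.3.temperature" and captures the CPU index.
TEMP_KEY = re.compile(r"^dev\.cpu\.(\d+)\.temperature$")


def cpu_temperatures(raw_json):
    """Return {cpu_index: degrees_celsius} from a flat sysctl JSON object.

    Assumption: temperature values are strings like "49.0C".
    """
    values = json.loads(raw_json)
    temps = {}
    for key, value in values.items():
        match = TEMP_KEY.match(key)
        if match and isinstance(value, str) and value.endswith("C"):
            temps[int(match.group(1))] = float(value[:-1])
    # Sort by CPU index so the result reads like the dashboard list.
    return dict(sorted(temps.items()))
```

Feeding it the captured output of the command should give one reading per core, which makes it easier to compare against the plain `sysctl` calls.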
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: franco on December 04, 2024, 08:14:55 PM
@OPNenthu in contrast, your numbers look like they have higher standard deviation which makes them less trustworthy in the first place? We were discussing averaging the temperatures before which is a lot of effort for ironing out specific hardware design issues. I simply hope you understand why we circle back and forth here.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: meyergru on December 04, 2024, 08:21:39 PM
Right. To quote Yoda: "Temperature resistance is strong with OPNenthu."

If temperature jumps of 15°C can occur within what seems to be just milliseconds, I would argue that the thermal transfer from die to heatsink must be sub-optimal - and that's a euphemism.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: franco on December 04, 2024, 08:27:39 PM
Besides the jump the numbers are odd:

> root@firewall:~ # sysctl dev.cpu.0.temperature
> dev.cpu.0.temperature: 50.0C

fastest call, second largest temp

> root@firewall:~ # sysctl -a | fgrep dev.cpu.0.temperature
> dev.cpu.0.temperature: 53.0C

slowest call, largest temp, but only 3 degrees more than the fastest call?

> root@firewall:~ # sysctl dev.cpu | fgrep dev.cpu.0.temperature
> dev.cpu.0.temperature: 45.0C

Somewhat fast, lowest temp

> # configctl system sysctl values dev.cpu
> [...]"dev.cpu.0.temperature":"50.0C"[...]

Somewhat slower, second highest temp

Maybe the test is flawed anyway because you cannot easily see which CPU served the temps and asked the kernel for the counters...
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: meyergru on December 04, 2024, 08:32:19 PM
When the temperature gradients are that steep, you cannot discriminate between the load induced by the test itself and anything that happens in the background... there is no inertia.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: OPNenthu on December 04, 2024, 09:09:08 PM
Got it.  Seems I have an outlier.

I don't discount the possibility that my ambient is not ideal (this room fluctuates a lot and can get warm), but what you guys are saying is that it doesn't matter.  The transfer from die to sink is the problem here, not sink to air.

Protectli suggested I buy a USB fan but I'm not happy with that because the fanless operation was the main selling point (and also coreboot UEFI).  Before I go asking for a replacement, I have 2 questions:

1) Are others with the same/similar device seeing the same issue?

2) If Protectli insists that they are seeing the issue across all their devices with OPNsense only, does this imply a problem with their manufacturing process overall?

I may need a separate thread to not pollute the OP's topic.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: franco on December 04, 2024, 09:19:19 PM
> The transfer from die to sink is the problem here, not sink to air.

I thought that was clear from the fact we measure die temperatures after all?


Cheers,
Franco
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: OPNenthu on December 04, 2024, 09:58:48 PM
Quote from: franco on December 04, 2024, 09:19:19 PM
> The transfer from die to sink is the problem here, not sink to air.

I thought that was clear from the fact we measure die temperatures after all?

Yes, but I also thought it was clear that those are influenced by both measurement (this topic) and heat dissipation (my particular issue).

I'm only asking if there is enough information here to be certain that my particular issue is not due to ambient conditions but to a manufacturing defect.  But never mind, I think it's opening a can of worms that distracts from the main topic.  If anyone can help me sort this out please PM me, so that I can confidently raise a support request to Protectli.  I'm not yet convinced that I need to. Thank you.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: OPNenthu on December 05, 2024, 08:07:51 AM
I might have found the culprit, or at least a contributor.

The Protectli V1410 comes with eMMC onboard but it also has an NVME storage option.  I installed a Kingston NV2 256GB into it as this drive is sold/recommended by Protectli also, and they provided thermal pads for me to install the SSD.

I don't think I can measure the eMMC temperature but the NVME is visible and it's rather warm in this system.  Maybe it's contributing to the deviations observed by @franco and @meyergru from my earlier numbers? 

I'm thinking that Protectli quality control would be catching issues of poor thermal interface/paste, and they do claim to bench test each device before shipping.  It's part of what you get purchasing from them rather than AliExpress... and I'm sure the same goes for Deciso in Europe.

Maybe the SSD is taking some of the heatsink capacity that would otherwise go to the CPU.

Linux 'lm_sensors' output (NVME top, CPU bottom):


[root@localhost-live ~]# sensors
nvme-pci-0300
Adapter: PCI adapter
Composite:    +44.9°C  (low  =  -0.1°C, high = +76.8°C)
                       (crit = +78.8°C)
ERROR: Can't get value of subfeature temp3_min: I/O error
ERROR: Can't get value of subfeature temp3_max: I/O error
Sensor 2:     +51.9°C  (low  =  +0.0°C, high =  +0.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +48.0°C  (high = +105.0°C, crit = +105.0°C)
Core 0:        +40.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:        +40.0°C  (high = +105.0°C, crit = +105.0°C)
Core 2:        +40.0°C  (high = +105.0°C, crit = +105.0°C)
Core 3:        +40.0°C  (high = +105.0°C, crit = +105.0°C)


FreeBSD 'nvmecontrol' (temperature sensor 2 near bottom):


root@firewall:/dev # nvmecontrol logpage -p 2 nvme0
SMART/Health Information Log
============================
Critical Warning State:         0x00
Available spare:               0
Temperature:                   0
Device reliability:            0
Read only:                     0
Volatile memory backup:        0
Temperature:                    318 K, 44.85 C, 112.73 F
Available spare:                100
Available spare threshold:      10
Percentage used:                0
Data units (512,000 byte) read: 94993
Data units written:             1627689
Host read commands:             3113795
Host write commands:            24545023
Controller busy time (minutes): 218
Power cycles:                   41
Power on hours:                 1361
Unsafe shutdowns:               10
Media errors:                   0
No. error info log entries:     20
Warning Temp Composite Time:    0
Error Temp Composite Time:      0
Temperature Sensor 2:           325 K, 51.85 C, 125.33 F
Temperature 1 Transition Count: 0
Temperature 2 Transition Count: 0
Total Time For Temperature 1:   0
Total Time For Temperature 2:   0


Comparing the 'lm_sensors' and 'sysctl dev.cpu' outputs from earlier, I feel somewhat confident that my real idle core temperature is in the 40-45C range.  It may become even less if I were to remove the NVME drive, but I'm not sure there is a real need.  I think these temperatures are still within spec-- but I'd appreciate some confirmation from others.

Coming back to the topic, I vote yes for 'sysctl dev.cpu' IF there is a way to implement it without sacrificing sensors that others depend on.  This method seems to make a difference even for me.

I vote no on the proposal for temperature averaging.  That would just give me a skewed average I guess.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: meyergru on December 05, 2024, 10:01:51 AM
"It does not work like that":

The connection between the CPU die and the NVME die is through two thermal bridges connecting both to the case at best, while I have seen cases where the NVME did not even touch it.

So, this connection is even more remote than the connection between the CPU die and most of the thermal "mass" of the case, so to speak. The fluctuations that you see suggest a bad transition between the CPU die and the case, so, if at all, the NVME heat only contributes an offset, not a fast fluctuation. Mind you, the temperature is measured directly on the CPU die.

So, sure, additional wattage by the NVME makes up for a higher general temperature level, but not for fast changes in CPU temps.

I will keep my mouth shut about Protectli, but I think they are not even comparable to Deciso.

Quote from: OPNenthu on December 05, 2024, 08:07:51 AM
Coming back to the topic, I vote yes for 'sysctl dev.cpu' IF there is a way to implement it without sacrificing sensors that others depend on.  This method seems to make a difference even for me.

Reducing output to "dev.cpu" in order to keep processing cost (and heat buildup) low during measurement automatically limits it to that sysctl subtree, excluding outliers. So your IF can only be satisfied if there were a way to specify a list of other prefixes in order to include exotics.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: OPNenthu on December 05, 2024, 10:31:57 AM
Quote from: meyergru on December 05, 2024, 10:01:51 AM
"It does not work like that":

The connection between the CPU die and the NVME die is through two thermal bridges connecting both to the case at best, while I have seen cases where the NVME did not even touch it.

So, this connection is even more remote than the connection between the CPU die and most of the thermal "mass" of the case, so to speak. The fluctuations that you see suggest a bad transition between the CPU die and the case, so, if at all, the NVME heat only contributes an offset, not a fast fluctuation. Mind you, the temperature is measured directly on the CPU die.

So, sure, additional wattage by the NVME makes up for a higher general temperature level, but not for fast changes in CPU temps.

Good point here. 

I'm noticing though that I'm not the only person with an N5105 that is complaining about high temps.  There's another one (https://forum.opnsense.org/index.php?topic=44442.0) in the 'Hardware & Performance' section, and another on Amazon reviews. 

How can I be sure that I am not an outlier relative to all the other N5105 users?  If I can show that my system is behaving erratically relative to all the others in its category then I think I could make a case for a replacement.

Maybe we need a 5105 owners thread to see if anyone else is having similar quick temperature transitions.


Quote
I will keep my mouth shut about Protectli, but I think they are not even comparable to Deciso.
Of course, not even in the same market or price segment.  One doesn't even sell rackmount gear.

Both companies have quality controls and stand behind with warranties and support, though.  That's my point.  You take a gamble with the AliExpress specials. :)

I'll say no more.  I'm a fan of both and would buy a Deciso if my budget allowed.

Quote
Reducing output to "dev.cpu" in order to keep processing cost (and heat buildup) low during measurement automatically limits it to that sysctl subtree, excluding outliers. So your IF can only be satisfied if there were a way to specify a list of other prefixes in order to include exotics.

There was a proposal on GitHub to save the list of sensors from 'sysctl -a' at startup, as a one-time call.  From then on it would be possible to call 'sysctl' on them periodically.  It was not well received.
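The idea could be sketched roughly like this in Python. This is only an illustration of the proposal as described above, not the code from the GitHub discussion and not anything OPNsense ships. `discover_sensors`, `poll` and the injectable `run` parameter are hypothetical names. It relies only on `sysctl -a` printing `name: value` lines and on `sysctl` accepting several OID names in one call.

```python
import subprocess


def run_command(argv):
    """Run a command and return its stdout as text."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def parse_temperatures(text):
    """Parse 'name: 49.0C' lines into {name: degrees_celsius}."""
    temps = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        value = value.strip()
        if sep and name.endswith(".temperature") and value.endswith("C"):
            try:
                temps[name] = float(value[:-1])
            except ValueError:
                continue  # not a plain numeric reading, skip it
    return temps


def discover_sensors(run=run_command):
    """Expensive one-time walk of the whole sysctl tree, e.g. at startup."""
    return sorted(parse_temperatures(run(["sysctl", "-a"])))


def poll(sensor_names, run=run_command):
    """Cheap periodic read of only the OIDs found at startup."""
    if not sensor_names:
        return {}
    return parse_temperatures(run(["sysctl", *sensor_names]))
```

One trade-off of this approach: a sensor that only shows up after the one-time discovery would be missed until the list is rebuilt.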
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: fastboot on December 05, 2024, 10:39:30 AM
@OPNenthu

I have a Protectli, but as mentioned it's a different model. The VP6000 Series is almost brand new and ships with two fans.
My 6630 behaves completely differently with Linux installed and using lm-sensors. In my case I can say the difference is like 30-40°C compared with the output I get from the dashboard.
Even the output of "sysctl dev.cpu | grep temperature" is far away from these peaks.

https://protectli.com/wp-content/uploads/2024/07/VP6630-Datasheet-20240628.pdf
On page 8 you can see the mainboard. #28 is the place for the NVME (I'm using an INTENSO SSD with SLC). There is an additional heatsink with a thermal pad mounted in my case.

On top of that I got a replacement part from Protectli. For the first 36-48 hours the Dashboard showed lower values in comparison to the other machine. After that it also reached 80-82°C on the dashboard. So to summarize: both devices show the same behavior after ~2+ days.

My environmental temperature is monitored by different Sensors. Just to name some: BME680, BME280 and some others.

The NVME is monitored as well:
E.g
Temperature:                        36 Celsius

Room_Temperature right now:  21,52 °C (increasing)
Also this temperature is far away from having an impact on the temperature of the FW. In a high computing power PC build, the NVME temperatures are similar. And there the heatsink is "MASSIVE" (Gigabyte X670 Aorus Master).

Let's see how it goes in the summer :D
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: Patrick M. Hausen on December 05, 2024, 10:45:38 AM
Ha! Found a data point: Protectli FW4B

root@opnsense:~ # sysctl dev.cpu.0.temperature
dev.cpu.0.temperature: 50.0C
root@opnsense:~ # sysctl -a | grep dev.cpu.0.temperature
dev.cpu.0.temperature: 54.0C
root@opnsense:~ # configctl system sysctl values dev.cpu
[...] 51.0C

Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: meyergru on December 05, 2024, 11:01:45 AM
My N5105 china box (by Topton, the darker one from this test (https://www.servethehome.com/two-fanless-intel-celeron-n5105-4x-2-5gbe-options-reviewed/), dmidecode shows CW-N6000, so Changwang/CWWK is the manufacturer):


[root@OPNsense.jmg]# sysctl dev.cpu.0.temperature
dev.cpu.0.temperature: 43.0C
[root@OPNsense.jmg]# sysctl dev.cpu | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 44.0C
[root@OPNsense.jmg]# sysctl -a | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 46.0C
[root@OPNsense.jmg]# configctl system sysctl values dev.cpu
...
"dev.cpu.0.temperature":"46.0C"
...


I always check the thermal paste on those boxes if I see fluctuating temps and fix it (https://www.congenio.de/infos/opnsense-hardware.html).

Quote from: OPNenthu on December 05, 2024, 10:31:57 AM
There was a proposal on GitHub to save the list of sensors from 'sysctl -a' at startup, as a one-time call.  From then on it would be possible to call 'sysctl' on them periodically.  It was not well received.

I know, I made that proposal. What I wanted to stress is that the "easy (reduction) method" Franco prefers cannot find all potential sensors.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: franco on December 05, 2024, 11:12:53 AM
All I tried to achieve back then is to not lose track of the idea of the temperature widget to show relevant temperature sensors that might be active in the system. We can dial back the scope of the widget to dev.cpu, but first we need to make sure the lookup is sensible in what it tries to achieve: "show more accurate heat readings for the average case without spinning the CPU too much so that it skews the reading".


Cheers,
Franco
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: meyergru on December 05, 2024, 11:27:54 AM
And for now my impression is that the usual case is a 2-3°C delta and up to 15°C for cases where heat transfer is problematic.

I wonder if it is better to keep the old way of doing it and explain to users that if they observe a big difference, they should inspect their cooling  ;)
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: fastboot on December 05, 2024, 12:35:38 PM
Quote from: meyergru on December 05, 2024, 11:27:54 AM
And for now my impression is that the usual case is a 2-3°C delta and up to 15°C for cases where heat transfer is problematic.

I wonder if it is better to keep the old way of doing it and explain to users that if they observe a big difference, they should inspect their cooling  ;)

Maybe it's just me, maybe I am in the wrong mood at this moment. But sometimes I have the impression you think that other users are stupid.

In this regard I can only speak for myself for sure. Actually I precisely know what I am doing. I know my hardware, and I know my tools. If not, I put time and effort in it to get a deep knowledge of the things I work with.

But to make it very short: There is no issue with the cooling in my devices. If there were, it would have been fixed already.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: _tribal_ on December 05, 2024, 01:15:14 PM
Quote from: OPNenthu on December 05, 2024, 10:31:57 AM

Maybe we need a 5105 owners thread to see if anyone else is having similar quick temperature transitions.
I have exactly the same behavior on my N5105 since upgrading to 24.x. >:( All my questions were answered with assurances that I was looking at the temperature in the wrong way and everything is correct now. :'( I have given up trying to explain anything to the developers and just subtract 10-15 degrees in my mind when I look at the temperature graphs ::)
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: _tribal_ on December 05, 2024, 01:25:02 PM
Quote from: meyergru on December 05, 2024, 11:27:54 AM
And for now my impression is that the usual case is a 2-3°C delta and up to 15°C for cases where heat transfer is problematic.
If only. Unfortunately the difference fluctuates, but more often it is around +10 degrees. No problems with heat transfer, the system passed a stress test lasting more than a week.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: meyergru on December 05, 2024, 04:02:25 PM
Maybe it is my inability to explain this more clearly, so for the last time:

When there is bad heat transfer because of bad thermal paste or too small a contact patch, you will experience short spikes of CPU die temperatures, because the heat cannot be soaked up by the mass of the case or heatsink immediately. After a while, the heat WILL eventually be transferred anyway, because there is still no vacuum; it is just a bad transfer medium causing the delay.

Thus, short bursts of CPU activity will heat up the die with bad heat transfer much faster. That is exactly what is happening with the current measurement method.

It says nothing about long term stability under stress. If you put a continuous load on the CPU, the resulting maximum temperature will not even be higher with bad transfer (with the same power limit and thermal capacity of the case/heatsink, of course), so you cannot compare those.

I have 4 of those china boxes, of which 2 had this problem and 2 did not. After fixing it, I see spikes of 2-3 degrees on all of these systems during measurements, as does Patrick.
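The burst behavior described above can be illustrated with a toy two-node thermal model in Python. This is a sketch under loud assumptions: the die and the case/heatsink are each treated as a single thermal mass, and every constant (thermal masses, resistances, power levels) is invented for illustration and not measured on any of the devices in this thread. The function name `peak_die_rise` is hypothetical. It only looks at a short burst and says nothing about sustained load.

```python
def peak_die_rise(r_die_sink, burst_seconds=5.0, dt=0.01):
    """Peak die temperature rise (K) during a load burst, relative to idle.

    Toy model: die -> (r_die_sink) -> case/heatsink -> (r_sink_air) -> ambient.
    All constants below are made-up illustration values.
    """
    c_die = 2.0                   # J/K, small thermal mass of the die
    c_sink = 500.0                # J/K, large thermal mass of case/heatsink
    r_sink_air = 2.0              # K/W, heatsink to ambient
    p_idle, p_burst = 3.0, 10.0   # W
    ambient = 25.0                # degrees C

    # Start from the idle steady state.
    t_sink = ambient + p_idle * r_sink_air
    t_die = t_sink + p_idle * r_die_sink
    t_idle = t_die
    peak = t_die

    # Explicit Euler integration over the burst.
    for _ in range(int(burst_seconds / dt)):
        flow_die_sink = (t_die - t_sink) / r_die_sink    # W leaving the die
        flow_sink_air = (t_sink - ambient) / r_sink_air  # W leaving the sink
        t_die += (p_burst - flow_die_sink) * dt / c_die
        t_sink += (flow_die_sink - flow_sink_air) * dt / c_sink
        peak = max(peak, t_die)
    return peak - t_idle


if __name__ == "__main__":
    for label, resistance in (("good contact", 0.5), ("bad contact", 3.0)):
        print(label, round(peak_die_rise(resistance), 1), "K above idle")
```

With these made-up numbers the same short burst produces a noticeably larger spike on the die when the die-to-sink resistance is higher, while the case barely warms up during the burst.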
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: _tribal_ on December 05, 2024, 06:31:04 PM
I know exactly what you mean. But the "box" was serviced upon reception and the thermal paste was replaced with Arctic Cooling MX4, which is more than enough in this case. And if everything was as you write, there would be a difference on the old version too, i.e. the average temperature would be higher, but the mentioned increase happened exclusively after switching to the new way of temperature reading. Actually, for me it is not so critical, but I am not the only one who noticed such behavior. I perfectly understand that I have no right to demand anything from a free product and the developers in any case will do as they want, it's just not very convenient - to keep a correction of 7-10 degrees in your head when you look at the temperature graph. That's all.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: OPNenthu on December 06, 2024, 09:08:11 AM
Quote from: fastboot on December 05, 2024, 10:39:30 AM
The first 36-48hours the Dashboard showed lower values in comparison to the other machine. After that it reached as well the 80-82°C on the dashboard. So to summarize. Both devices have the same behavior after ~2+ days
Yours is a different CPU architecture (Core i3) and includes fans, so we are comparing apples and oranges.  However I wanted to mention something I learned about my own Vault (1410) from the support chat.  He said that given enough time, and if the Vault is generating more heat than can be dissipated into the environment, it will regulate itself to 60C.

I don't see that happening for me (my idle temps are in the 40-45C range, as confirmed by both Linux and FreeBSD utilities) but in your case maybe there is some kind of regulation like this happening which explains why your temps are settling at higher values after 2 days.  The differences you mentioned in your post seem high to me but I'm not familiar enough with your particular model to say if those are typical.  Maybe it's worth opening a ticket to see what they say.

Quote from: meyergru on December 05, 2024, 11:27:54 AM
I wonder if it is better to keep the old way of doing it and explain to users that if they observe a big difference, they should inspect their cooling  ;)
I don't know if this is good advice for those of us with active warranties.  Doing surgery to check and re-paste sounds like a potential way to void it.

Quote from: _tribal_ on December 05, 2024, 06:31:04 PM
[...] but the mentioned increase happened exclusively after switching to the new way of temperature reading.

I wonder if the change is in FreeBSD, because there is one report of high temperature using the other firewall (https://www.amazon.com/gp/customer-reviews/RVENTGS65JVNT?ASIN=B0D8CR8LHJ). 

The casual observer might look at this and come away with the impression that this is a characteristic of these devices, but Protectli disagree.  To quote the support tech I spoke to:

Quote
Last year we got a bunch of complaints all fairly close together so we set up LM Sensors in another VM and saw a 10C skew from what OPNsense reports. [...] we saw reports of it happening on many other brands and it happens on all of ours so we chalked it up to "the way OPNsense does things".

So the issue is not vendor specific, and the reports are clustered in time (if this is to be believed).

I only started using OPNsense with 24.7 myself, so I didn't get to experience the "before" and "after" effect.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: MenschAergereDichNicht on December 06, 2024, 02:03:45 PM
System: Protectli VP2420 (Celeron J6412)

Widget Temperature: 56°C
Reporting: ~50°C
sysctl dev.cpu.0.temperature: dev.cpu.0.temperature: 42.0C
sysctl -a | grep dev.cpu.0.temperature: dev.cpu.0.temperature: 52.0C

I am using the hwp_state driver and not powerd. Don't know if this is relevant.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: passeri on December 06, 2024, 10:15:17 PM
While we are having fun posting temperatures, on a passively-cooled Yanling 6-port i7 box I see right now:
sysctl dev.cpu.0.temperature = 44°C
sysctl -a | grep dev.cpu.0.temperature = 55°C
GUI : 57-61°C for CPU 7 to 0, Zones A & B at 65°C

All is within specification, the ambient is maintained at a ceiling of 27-28°C (26 at measurement), so I have no particular interest in which one is "right" although I trust the first one. The differences in measurement are certainly there, from the method and from the GUI.
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: OPNenthu on December 07, 2024, 03:32:57 AM
Slightly longer test with a Linux live USB.  'lm_sensors' output collected every 5 seconds:


$ dnf install lm_sensors
$ watch -n 5 sensors | tee --append data.txt


By default Fedora 41 has 'firewalld' service and Gnome running.  Not much else going on for the first 10 min. so that I could get a baseline.

At the ~10 min. mark, I did some light workload activities... launching and configuring Firefox, running online speedtest, launching LibreOffice.

The results (attached) show that both things are true:

1. The baseline temperature is substantially less than reported by 'sysctl -a'
2. The temperature rises sharply on any kind of burst activity

A subjective observation I made:  the box feels a lot cooler to the touch running under Linux.  The baseline is even lower than reported by 'sysctl dev.cpu.0.temperature' in FreeBSD (although this could be due to all the services running in OPNsense.)

If anyone wants to try, feel free to modify the python script attached for your particular 'lm_sensors' output.  It will be useful for me to compare notes, especially with similar devices.
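For anyone without access to the attachment, a parser for this kind of log could look roughly like the sketch below. It is not the attached script. It is a minimal illustration that assumes plain-text `sensors` dumps appended one after another, with `coretemp` lines of the form `Core 0:        +40.0°C  (high = ...)` as in the output posted earlier in the thread. The function name `core_temps` is hypothetical.

```python
import re

# Matches e.g. "Core 0:        +40.0°C  (high = +105.0°C, crit = +105.0°C)"
CORE_LINE = re.compile(r"^Core (\d+):\s+([+-]?\d+(?:\.\d+)?)°C")


def core_temps(sensors_text):
    """Return a list of {core_index: degrees_celsius} dicts, one per sample.

    A new sample starts whenever a core index repeats, which is how
    consecutive `sensors` dumps appear when appended to one file.
    """
    samples, current = [], {}
    for line in sensors_text.splitlines():
        match = CORE_LINE.match(line.strip())
        if not match:
            continue  # skip package, NVMe and adapter lines
        core, temp = int(match.group(1)), float(match.group(2))
        if core in current:
            samples.append(current)
            current = {}
        current[core] = temp
    if current:
        samples.append(current)
    return samples
```

Each returned dict is one snapshot, so the baseline and the burst spikes can be compared by looking at the per-core maximum across samples. If the log contains terminal control codes, those would need to be stripped before parsing.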
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: switchman on December 10, 2024, 10:38:30 AM
Tried these two commands on my system shell command line and the dev.cpu one does not return any results.  Just throwing it out there, please don't break all installs.

root@OPNsense:~ # sysctl dev.cpu | grep temperature | sort
root@OPNsense:~ # sysctl -a | grep temperature | sort
hw.acpi.thermal.tz0.temperature: 27.9C
hw.acpi.thermal.tz1.temperature: 29.9C
root@OPNsense:~ #
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: meyergru on December 10, 2024, 11:53:14 AM
Did you even activate any thermal sensor under "System: Settings: Miscellaneous"? Probably not.

Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: Patrick M. Hausen on December 10, 2024, 11:53:36 AM
@switchman did you configure the correct thermal sensor for your hardware at System > Settings > Miscellaneous > Thermal Sensors?
Title: Re: Temperature: Dashboard Temps differ massively from CLI
Post by: switchman on December 11, 2024, 07:15:09 AM
No.  I have now and it works fine.  Thank You.