Temperature: Dashboard Temps differ massively from CLI

Started by fastboot, December 01, 2024, 02:15:53 PM

December 04, 2024, 06:51:15 PM #30 Last Edit: December 04, 2024, 08:24:40 PM by meyergru
To put it into "user-understandable" words:

Reading the temperature via "sysctl dev.cpu | grep temperature" puts less stress on "some" CPUs than "sysctl -a | fgrep temperature" (which is what the dashboard currently does).

That this causes a higher temperature readout is somewhat new and still does not occur with some CPUs. Why? Because if the heat-dissipation solution is good (tm), then the heat will quickly be taken off the die and into the case or air or whatever. With an air-gap or bad thermal paste, heat builds up on the die and can only slowly dissipate.

However, since modern CPUs tend to use more of the available range of potential die temperatures and with the advent of badly designed (and sometimes built) china boxes, the dissipation may be much worse, such that even short bursts of CPU load (like those induced by "sysctl -a | fgrep temperature") heat up the CPU in "no time".
The situation is not improved when manufacturers turn up the power limits of these 6 Watt CPUs to 25 Watts or more by default.

This goes mostly unnoticed, because thermal throttling will keep the CPU from melting anyway. Also note that the temperature being read is indeed "correct" - there is no denying that. It is merely somewhat higher than it would be without reading it from ~15000 oids. Call it a heisenbug.

So, this explains why the problem is "new", despite the age-old code doing this. It also explains why Patrick and I do not see much difference between the readout with "sysctl dev.cpu | grep temperature" and "sysctl -a | fgrep temperature".

The remaining question is: How much of a difference is there for those of you who complain about "wrong readings"? BTW: Do not put the commands in a loop, and leave at least 1 second between calls - for reasons that should now be obvious.
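For anyone who prefers a script over typing the commands by hand, here is a minimal sketch of such a spaced-out measurement in Python. The names `METHODS`, `measure` and the `run` parameter are assumptions made up for illustration, not anything from OPNsense. Also keep in mind that starting a Python interpreter is CPU load in itself, so the absolute numbers may not match the bare shell commands.

```python
import subprocess
import time

# The three read methods compared in this thread, as argument lists.
METHODS = {
    "single oid": ["sysctl", "dev.cpu.0.temperature"],
    "dev.cpu subtree": ["sysctl", "dev.cpu"],
    "full tree": ["sysctl", "-a"],
}
OID = "dev.cpu.0.temperature"


def run_command(argv):
    """Run a command and return its stdout as text (real system call)."""
    result = subprocess.run(argv, capture_output=True, text=True, errors="replace")
    return result.stdout


def extract(output, oid=OID):
    """Return the reading for `oid` in degrees C, or None if it is absent."""
    for line in output.splitlines():
        name, _, value = line.partition(": ")
        if name == oid:
            return float(value.rstrip("C"))
    return None


def measure(methods=METHODS, run=run_command, pause=1.0):
    """Take one reading per method, sleeping `pause` seconds before each one."""
    readings = {}
    for label, argv in methods.items():
        time.sleep(pause)  # give the die time to cool down after the previous call
        readings[label] = extract(run(argv))
    return readings
```

Calling `measure()` on the firewall returns a dict with one reading per method. The filtering is done in Python instead of through fgrep, which is a small deviation from the shell pipelines above.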

Let's start: For me on an N100 with decent cooling and new thermal paste, the difference is ~3°C:


root@OPNsense:~ # sysctl dev.cpu.0.temperature
dev.cpu.0.temperature: 50.0C
root@OPNsense:~ # sysctl dev.cpu | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 50.0C
root@OPNsense:~ # sysctl -a | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 52.0C
root@OPNsense:~ # configctl system sysctl values dev.cpu
{"dev.cpu.3.temperature":"53.0C","dev.cpu.3.coretemp.throttle_log":"0","dev.cpu.3.coretemp.tjmax":"105.0C","dev.cpu.3.coretemp.resolution":"1","dev.cpu.3.coretemp.delta":"52","dev.cpu.3.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.3.cx_usage_counters":"1279770 3676798 125303","dev.cpu.3.cx_usage":"25.18% 72.35% 2.46% last 102us","dev.cpu.3.cx_lowest":"C3","dev.cpu.3.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.3.freq_levels":"806\/-1","dev.cpu.3.freq":"2025","dev.cpu.3.%parent":"acpi0","dev.cpu.3.%pnpinfo":"_HID=ACPI0007 _UID=3 _CID=none","dev.cpu.3.%location":"handle=\\_SB_.PR03","dev.cpu.3.%driver":"cpu","dev.cpu.3.%desc":"ACPI CPU","dev.cpu.2.temperature":"51.0C","dev.cpu.2.coretemp.throttle_log":"0","dev.cpu.2.coretemp.tjmax":"105.0C","dev.cpu.2.coretemp.resolution":"1","dev.cpu.2.coretemp.delta":"54","dev.cpu.2.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.2.cx_usage_counters":"1710877 4995190 134535","dev.cpu.2.cx_usage":"25.01% 73.02% 1.96% last 148us","dev.cpu.2.cx_lowest":"C3","dev.cpu.2.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.2.freq_levels":"806\/-1","dev.cpu.2.freq":"2100","dev.cpu.2.%parent":"acpi0","dev.cpu.2.%pnpinfo":"_HID=ACPI0007 _UID=2 _CID=none","dev.cpu.2.%location":"handle=\\_SB_.PR02","dev.cpu.2.%driver":"cpu","dev.cpu.2.%desc":"ACPI CPU","dev.cpu.1.temperature":"52.0C","dev.cpu.1.coretemp.throttle_log":"0","dev.cpu.1.coretemp.tjmax":"105.0C","dev.cpu.1.coretemp.resolution":"1","dev.cpu.1.coretemp.delta":"53","dev.cpu.1.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.1.cx_usage_counters":"1170285 4249538 131505","dev.cpu.1.cx_usage":"21.08% 76.54% 2.36% last 255us","dev.cpu.1.cx_lowest":"C3","dev.cpu.1.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.1.freq_levels":"806\/-1","dev.cpu.1.freq":"2117","dev.cpu.1.%parent":"acpi0","dev.cpu.1.%pnpinfo":"_HID=ACPI0007 _UID=1 
_CID=none","dev.cpu.1.%location":"handle=\\_SB_.PR01","dev.cpu.1.%driver":"cpu","dev.cpu.1.%desc":"ACPI CPU","dev.cpu.0.temperature":"52.0C","dev.cpu.0.coretemp.throttle_log":"0","dev.cpu.0.coretemp.tjmax":"105.0C","dev.cpu.0.coretemp.resolution":"1","dev.cpu.0.coretemp.delta":"53","dev.cpu.0.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.0.cx_usage_counters":"3142633 4968723 737","dev.cpu.0.cx_usage":"38.74% 61.25% 0.00% last 186us","dev.cpu.0.cx_lowest":"C3","dev.cpu.0.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.0.freq_levels":"806\/-1","dev.cpu.0.freq":"2001","dev.cpu.0.%parent":"acpi0","dev.cpu.0.%pnpinfo":"_HID=ACPI0007 _UID=0 _CID=none","dev.cpu.0.%location":"handle=\\_SB_.PR00","dev.cpu.0.%driver":"cpu","dev.cpu.0.%desc":"ACPI CPU","dev.cpu.%parent":""}


The delta is about the same on an N5105 and a J4125. If your delta is much higher, you should probably fix your thermal paste or lower your BIOS power limits.

P.S.: I have no idea what stress the configctl method would put on the CPU...

P.P.S: On another note: After an update, there are sometimes house-keeping or even runaway tasks that raise temps. Maybe that is another reason why people keep reporting that the new dashboard shows "wrong" (i.e. higher) temperatures than before the update.
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

December 04, 2024, 06:52:10 PM #31 Last Edit: December 04, 2024, 06:56:22 PM by staticznld
Hunsn N100 4x i226-v



root@router:~ # sysctl dev.cpu.0.temperature
dev.cpu.0.temperature: 26.0C
root@router:~ # sysctl dev.cpu | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 27.0C
root@router:~ # sysctl -a | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 30.0C
root@router:~ # configctl system sysctl values dev.cpu
{"dev.cpu.3.temperature":"32.0C","dev.cpu.3.coretemp.throttle_log":"0","dev.cpu.3.coretemp.tjmax":"105.0C","dev.cpu.3.coretemp.resolution":"1","dev.cpu.3.coretemp.delta":"73","dev.cpu.3.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.3.cx_usage_counters":"9038 206303 33787","dev.cpu.3.cx_usage":"3.62% 82.81% 13.56% last 1671us","dev.cpu.3.cx_lowest":"C3","dev.cpu.3.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.3.freq_levels":"806\/-1","dev.cpu.3.freq":"3407","dev.cpu.3.%parent":"acpi0","dev.cpu.3.%pnpinfo":"_HID=ACPI0007 _UID=3 _CID=none","dev.cpu.3.%location":"handle=\\_SB_.PR03","dev.cpu.3.%driver":"cpu","dev.cpu.3.%desc":"ACPI CPU","dev.cpu.2.temperature":"31.0C","dev.cpu.2.coretemp.throttle_log":"0","dev.cpu.2.coretemp.tjmax":"105.0C","dev.cpu.2.coretemp.resolution":"1","dev.cpu.2.coretemp.delta":"74","dev.cpu.2.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.2.cx_usage_counters":"8177 209330 33176","dev.cpu.2.cx_usage":"3.26% 83.50% 13.23% last 804us","dev.cpu.2.cx_lowest":"C3","dev.cpu.2.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.2.freq_levels":"806\/-1","dev.cpu.2.freq":"3407","dev.cpu.2.%parent":"acpi0","dev.cpu.2.%pnpinfo":"_HID=ACPI0007 _UID=2 _CID=none","dev.cpu.2.%location":"handle=\\_SB_.PR02","dev.cpu.2.%driver":"cpu","dev.cpu.2.%desc":"ACPI CPU","dev.cpu.1.temperature":"31.0C","dev.cpu.1.coretemp.throttle_log":"0","dev.cpu.1.coretemp.tjmax":"105.0C","dev.cpu.1.coretemp.resolution":"1","dev.cpu.1.coretemp.delta":"74","dev.cpu.1.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.1.cx_usage_counters":"95908 1201303 634","dev.cpu.1.cx_usage":"7.38% 92.56% 0.04% last 76us","dev.cpu.1.cx_lowest":"C3","dev.cpu.1.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.1.freq_levels":"806\/-1","dev.cpu.1.freq":"3426","dev.cpu.1.%parent":"acpi0","dev.cpu.1.%pnpinfo":"_HID=ACPI0007 _UID=1 
_CID=none","dev.cpu.1.%location":"handle=\\_SB_.PR01","dev.cpu.1.%driver":"cpu","dev.cpu.1.%desc":"ACPI CPU","dev.cpu.0.temperature":"30.0C","dev.cpu.0.coretemp.throttle_log":"0","dev.cpu.0.coretemp.tjmax":"105.0C","dev.cpu.0.coretemp.resolution":"1","dev.cpu.0.coretemp.delta":"75","dev.cpu.0.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.0.cx_usage_counters":"15920 135827 46052","dev.cpu.0.cx_usage":"8.04% 68.66% 23.28% last 399us","dev.cpu.0.cx_lowest":"C3","dev.cpu.0.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/1048","dev.cpu.0.freq_levels":"806\/-1","dev.cpu.0.freq":"3218","dev.cpu.0.%parent":"acpi0","dev.cpu.0.%pnpinfo":"_HID=ACPI0007 _UID=0 _CID=none","dev.cpu.0.%location":"handle=\\_SB_.PR00","dev.cpu.0.%driver":"cpu","dev.cpu.0.%desc":"ACPI CPU","dev.cpu.%parent":""}


I am not having any trouble with the temperature.
Only posting this because Patrick asked.

December 04, 2024, 07:38:53 PM #32 Last Edit: December 04, 2024, 07:41:26 PM by OPNenthu
I have the issue.

Several days ago I had run the test script as mentioned here: https://github.com/opnsense/core/pull/7758#issuecomment-2289846002

I ran it 4 times and the largest difference noted was +15C above the average of the measurements without 'sysctl -a'.

Average +7.75C across 4 consecutive runs.

Attaching screenshot.

Protectli V1410 (Intel N5105)


> root@router:~ # sysctl dev.cpu.0.temperature
> dev.cpu.0.temperature: 26.0C
> root@OPNsense:~ # configctl system sysctl values dev.cpu
> [...]dev.cpu.0.temperature":"52.0C"[...]

The funny thing is: if we cannot use configd to fetch these values, we basically cannot fetch them at all. Or is reading the dev.cpu subtree really that CPU-intensive? Only one way to find out:

# configctl system sysctl values dev.cpu.0.temperature
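
To compare the configctl result with the plain sysctl readings, the per-core temperatures can be pulled out of the JSON with a few lines of Python. This is only a sketch: `SAMPLE` is a shortened copy of the `dev.cpu` output posted above, the function names are made up, and I am assuming the single-oid call returns JSON of the same shape. In the outputs posted so far, the temperature equals coretemp.tjmax minus coretemp.delta, which the second function checks.

```python
import json

# Shortened sample in the shape printed by
# `configctl system sysctl values dev.cpu` earlier in this thread.
SAMPLE = (
    '{"dev.cpu.3.temperature":"53.0C","dev.cpu.3.coretemp.tjmax":"105.0C",'
    '"dev.cpu.3.coretemp.delta":"52","dev.cpu.2.temperature":"51.0C",'
    '"dev.cpu.2.coretemp.tjmax":"105.0C","dev.cpu.2.coretemp.delta":"54",'
    '"dev.cpu.%parent":""}'
)


def core_temperatures(payload):
    """Map core number -> temperature in degrees C from the JSON payload."""
    temps = {}
    for key, value in json.loads(payload).items():
        parts = key.split(".")
        # Only keys of the form dev.cpu.<n>.temperature are of interest.
        if len(parts) == 4 and parts[:2] == ["dev", "cpu"] and parts[3] == "temperature":
            temps[int(parts[2])] = float(value.rstrip("C"))
    return temps


def matches_tjmax_delta(payload, core):
    """Check that temperature == tjmax - delta for one core."""
    values = json.loads(payload)
    prefix = f"dev.cpu.{core}."
    temp = float(values[prefix + "temperature"].rstrip("C"))
    tjmax = float(values[prefix + "coretemp.tjmax"].rstrip("C"))
    delta = float(values[prefix + "coretemp.delta"])
    return temp == tjmax - delta
```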


Cheers,
Franco

Quote from: meyergru on December 04, 2024, 06:51:15 PM
BTW: Do not put the commands in a loop and leave at least 1 second between calls - for reasons that now should be obvious.

Ok...


root@firewall:~ # sysctl dev.cpu.0.temperature
dev.cpu.0.temperature: 50.0C
root@firewall:~ # sysctl dev.cpu | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 45.0C
root@firewall:~ # sysctl -a | fgrep dev.cpu.0.temperature
dev.cpu.0.temperature: 53.0C
root@firewall:~ # configctl system sysctl values dev.cpu
{"dev.cpu.3.temperature":"49.0C","dev.cpu.3.coretemp.throttle_log":"0","dev.cpu.3.coretemp.tjmax":"105.0C","dev.cpu.3.coretemp.resolution":"1","dev.cpu.3.coretemp.delta":"55","dev.cpu.3.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.3.cx_usage_counters":"19756783 0 0","dev.cpu.3.cx_usage":"100.00% 0.00% 0.00% last 1346us","dev.cpu.3.cx_lowest":"C1","dev.cpu.3.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/253","dev.cpu.3.freq_levels":"2001\/10000 2000\/10000 1800\/8793 1600\/7632 1400\/6524 1200\/5466 1000\/4445 800\/3472","dev.cpu.3.freq":"2001","dev.cpu.3.%parent":"acpi0","dev.cpu.3.%pnpinfo":"_HID=ACPI0007 _UID=3 _CID=none","dev.cpu.3.%location":"handle=\\_SB_.CP03","dev.cpu.3.%driver":"cpu","dev.cpu.3.%desc":"ACPI CPU","dev.cpu.2.temperature":"50.0C","dev.cpu.2.coretemp.throttle_log":"0","dev.cpu.2.coretemp.tjmax":"105.0C","dev.cpu.2.coretemp.resolution":"1","dev.cpu.2.coretemp.delta":"55","dev.cpu.2.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.2.cx_usage_counters":"20624333 0 0","dev.cpu.2.cx_usage":"100.00% 0.00% 0.00% last 210us","dev.cpu.2.cx_lowest":"C1","dev.cpu.2.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/253","dev.cpu.2.freq_levels":"2001\/10000 2000\/10000 1800\/8793 1600\/7632 1400\/6524 1200\/5466 1000\/4445 800\/3472","dev.cpu.2.freq":"2001","dev.cpu.2.%parent":"acpi0","dev.cpu.2.%pnpinfo":"_HID=ACPI0007 _UID=2 _CID=none","dev.cpu.2.%location":"handle=\\_SB_.CP02","dev.cpu.2.%driver":"cpu","dev.cpu.2.%desc":"ACPI CPU","dev.cpu.1.temperature":"50.0C","dev.cpu.1.coretemp.throttle_log":"0","dev.cpu.1.coretemp.tjmax":"105.0C","dev.cpu.1.coretemp.resolution":"1","dev.cpu.1.coretemp.delta":"55","dev.cpu.1.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.1.cx_usage_counters":"19203038 0 0","dev.cpu.1.cx_usage":"100.00% 0.00% 0.00% last 13948us","dev.cpu.1.cx_lowest":"C1","dev.cpu.1.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/253","dev.cpu.1.freq_levels":"2001\/10000 2000\/10000 1800\/8793 1600\/7632 
1400\/6524 1200\/5466 1000\/4445 800\/3472","dev.cpu.1.freq":"2001","dev.cpu.1.%parent":"acpi0","dev.cpu.1.%pnpinfo":"_HID=ACPI0007 _UID=1 _CID=none","dev.cpu.1.%location":"handle=\\_SB_.CP01","dev.cpu.1.%driver":"cpu","dev.cpu.1.%desc":"ACPI CPU","dev.cpu.0.temperature":"50.0C","dev.cpu.0.coretemp.throttle_log":"0","dev.cpu.0.coretemp.tjmax":"105.0C","dev.cpu.0.coretemp.resolution":"1","dev.cpu.0.coretemp.delta":"54","dev.cpu.0.cx_method":"C1\/mwait\/hwc C2\/mwait\/hwc C3\/mwait\/hwc","dev.cpu.0.cx_usage_counters":"169953345 0 0","dev.cpu.0.cx_usage":"100.00% 0.00% 0.00% last 76us","dev.cpu.0.cx_lowest":"C1","dev.cpu.0.cx_supported":"C1\/1\/1 C2\/2\/127 C3\/3\/253","dev.cpu.0.freq_levels":"2001\/10000 2000\/10000 1800\/8793 1600\/7632 1400\/6524 1200\/5466 1000\/4445 800\/3472","dev.cpu.0.freq":"2001","dev.cpu.0.%parent":"acpi0","dev.cpu.0.%pnpinfo":"_HID=ACPI0007 _UID=0 _CID=none","dev.cpu.0.%location":"handle=\\_SB_.CP00","dev.cpu.0.%driver":"cpu","dev.cpu.0.%desc":"ACPI CPU","dev.cpu.%parent":""}

@OPNenthu in contrast, your numbers look like they have a higher standard deviation, which makes them less trustworthy in the first place? We discussed averaging the temperatures before, which is a lot of effort just to iron out specific hardware design issues. I simply hope you understand why we keep going back and forth here.
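
To put a number on that spread, here is a small sketch using the dev.cpu.0.temperature readings posted in this thread. Four readings per box, each taken with a different method, are far too few for a meaningful standard deviation, so treat the result as a rough comparison only. The function name `summarize` is made up for this example.

```python
from statistics import mean, stdev

# dev.cpu.0.temperature readings posted in this thread, in the order
# single oid, dev.cpu subtree, sysctl -a, configctl.
READINGS = {
    "meyergru": [50.0, 50.0, 52.0, 52.0],
    "staticznld": [26.0, 27.0, 30.0, 30.0],
    "OPNenthu": [50.0, 45.0, 53.0, 50.0],
}


def summarize(readings):
    """Return {name: (mean, sample standard deviation, max - min)}."""
    return {
        name: (mean(vals), stdev(vals), max(vals) - min(vals))
        for name, vals in readings.items()
    }


for name, (avg, sd, spread) in summarize(READINGS).items():
    print(f"{name}: mean {avg:.2f} C, stdev {sd:.2f} C, spread {spread:.1f} C")
```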

Right. To quote Yoda: "Temperature resistance is strong with OPNenthu."

If temperature jumps of 15°C can occur within what seems to be just milliseconds, I would argue that the thermal transfer from die to heatsink must be sub-optimal - and that's a euphemism.

Besides the jump the numbers are odd:

> root@firewall:~ # sysctl dev.cpu.0.temperature
> dev.cpu.0.temperature: 50.0C

fastest call, second largest temp

> root@firewall:~ # sysctl -a | fgrep dev.cpu.0.temperature
> dev.cpu.0.temperature: 53.0C

slowest call, largest temp, but only 3 degrees more than the fastest call?

> root@firewall:~ # sysctl dev.cpu | fgrep dev.cpu.0.temperature
> dev.cpu.0.temperature: 45.0C

Somewhat fast, lowest temp

> # configctl system sysctl values dev.cpu
> [...]"dev.cpu.0.temperature":"50.0C"[...]

Somewhat slower, second highest temp

Maybe the test is flawed anyway because you cannot easily see which CPU served the temps and asked the kernel for the counters...

When the temperature gradients are that steep, you cannot discriminate between the load induced by the test itself and anything that happens in the background... there is no inertia.

Got it.  Seems I have an outlier.

I don't discount the possibility that my ambient is not ideal (this room fluctuates a lot and can get warm), but what you guys are saying is that it doesn't matter.  The transfer from die to sink is the problem here, not sink to air.

Protectli suggested that I buy a USB fan but I'm not happy with that because the fanless operation was the main selling point (and also coreboot UEFI).  Before I go asking for a replacement, I have 2 questions:

1) Are others with the same/similar device seeing the same issue?

2) If Protectli insists that they are seeing the issue across all their devices with OPNsense only, does this imply a problem with their manufacturing process overall?

I may need a separate thread to not pollute the OP's topic.

> The transfer from die to sink is the problem here, not sink to air.

I thought that was clear from the fact we measure die temperatures after all?


Cheers,
Franco

Quote from: franco on December 04, 2024, 09:19:19 PM
> The transfer from die to sink is the problem here, not sink to air.

I thought that was clear from the fact we measure die temperatures after all?

Yes, but I also thought it was clear that those are influenced by both measurement (this topic) and heat dissipation (my particular issue).

I'm only asking if there is enough information here to be certain that my particular issue is not due to ambient conditions but to a manufacturing defect.  But never mind, I think it's opening a can of worms that distracts from the main topic.  If anyone can help me sort this out please PM me, so that I can confidently raise a support request to Protectli.  I'm not yet convinced that I need to. Thank you.

December 05, 2024, 08:07:51 AM #42 Last Edit: December 05, 2024, 08:17:25 AM by OPNenthu
I might have found the culprit, or at least a contributor.

The Protectli V1410 comes with eMMC onboard but it also has an NVME storage option.  I installed a Kingston NV2 256GB into it as this drive is sold/recommended by Protectli also, and they provided thermal pads for me to install the SSD.

I don't think I can measure the eMMC temperature but the NVME is visible and it's rather warm in this system.  Maybe it's contributing to the deviations observed by @franco and @meyergru from my earlier numbers? 

I'm thinking that Protectli quality control would be catching issues of poor thermal interface/paste, and they do claim to bench test each device before shipping.  It's part of what you get purchasing from them rather than AliExpress... and I'm sure the same goes for Deciso in Europe.

Maybe the SSD is taking some of the heatsink capacity that would otherwise go to the CPU.

Linux 'lm_sensors' output (NVME top, CPU bottom):


[root@localhost-live ~]# sensors
nvme-pci-0300
Adapter: PCI adapter
Composite:    +44.9°C  (low  =  -0.1°C, high = +76.8°C)
                       (crit = +78.8°C)
ERROR: Can't get value of subfeature temp3_min: I/O error
ERROR: Can't get value of subfeature temp3_max: I/O error
Sensor 2:     +51.9°C  (low  =  +0.0°C, high =  +0.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +48.0°C  (high = +105.0°C, crit = +105.0°C)
Core 0:        +40.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:        +40.0°C  (high = +105.0°C, crit = +105.0°C)
Core 2:        +40.0°C  (high = +105.0°C, crit = +105.0°C)
Core 3:        +40.0°C  (high = +105.0°C, crit = +105.0°C)


FreeBSD 'nvmecontrol' (temperature sensor 2 near bottom):


root@firewall:/dev # nvmecontrol logpage -p 2 nvme0
SMART/Health Information Log
============================
Critical Warning State:         0x00
Available spare:               0
Temperature:                   0
Device reliability:            0
Read only:                     0
Volatile memory backup:        0
Temperature:                    318 K, 44.85 C, 112.73 F
Available spare:                100
Available spare threshold:      10
Percentage used:                0
Data units (512,000 byte) read: 94993
Data units written:             1627689
Host read commands:             3113795
Host write commands:            24545023
Controller busy time (minutes): 218
Power cycles:                   41
Power on hours:                 1361
Unsafe shutdowns:               10
Media errors:                   0
No. error info log entries:     20
Warning Temp Composite Time:    0
Error Temp Composite Time:      0
Temperature Sensor 2:           325 K, 51.85 C, 125.33 F
Temperature 1 Transition Count: 0
Temperature 2 Transition Count: 0
Total Time For Temperature 1:   0
Total Time For Temperature 2:   0


Comparing the 'lm_sensors' and 'sysctl dev.cpu' outputs from earlier, I feel somewhat confident that my real idle core temperature is in the 40-45C range.  It might drop further if I were to remove the NVME drive, but I'm not sure there is a real need.  I think these temperatures are still within spec, but I'd appreciate some confirmation from others.
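
As a sanity check that the two tools report the same NVME sensor values: 'nvmecontrol' prints whole Kelvin plus converted values, and the conversion can be reproduced with the sketch below (the function name is made up). The results, 44.85 C and 51.85 C, agree with the 44.9°C and 51.9°C from 'lm_sensors' to within rounding.

```python
def kelvin_to_c_f(kelvin):
    """Convert a whole-Kelvin NVMe reading to (Celsius, Fahrenheit)."""
    celsius = kelvin - 273.15
    fahrenheit = celsius * 9 / 5 + 32
    # Round to two decimals, matching the nvmecontrol output format.
    return round(celsius, 2), round(fahrenheit, 2)


for label, kelvin in (("Composite", 318), ("Sensor 2", 325)):
    c, f = kelvin_to_c_f(kelvin)
    print(f"{label}: {kelvin} K, {c:.2f} C, {f:.2f} F")
```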

Coming back to the topic, I vote yes for 'sysctl dev.cpu' IF there is a way to implement it without sacrificing sensors that others depend on.  This method seems to make a difference even for me.

I vote no on the proposal for temperature averaging.  That would just give me a skewed average I guess.

"It does not work like that":

The connection between the CPU die and the NVME die is through two thermal bridges connecting both to the case at best, while I have seen cases where the NVME did not even touch it.

So, this connection is even more remote than the connection between the CPU die and most of the thermal "mass" of the case, so to speak. The fluctuations that you see suggest a bad transition between the CPU die and the case, so, if at all, the NVME heat only contributes an offset, not a fast fluctuation. Mind you, the temperature is measured directly on the CPU die.

So, sure, additional wattage from the NVME accounts for a higher general temperature level, but not for fast changes in CPU temps.

I will keep my mouth shut about Protectli, but I think they are not even comparable to Deciso.

Quote from: OPNenthu on December 05, 2024, 08:07:51 AM
Coming back to the topic, I vote yes for 'sysctl dev.cpu' IF there is a way to implement it without sacrificing sensors that others depend on.  This method seems to make a difference even for me.

Reducing output to "dev.cpu" in order to keep processing cost (and heat buildup) low during measurement automatically limits it to that sysctl subtree, excluding outliers. So your IF can only be satisfied if there were a way to specify a list of other prefixes in order to include exotics.
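
A rough sketch of what such a prefix list could look like, in Python. Everything here is hypothetical: `PREFIXES` is not an existing OPNsense setting, "hw.acpi.thermal" is only an example of an extra subtree, and the function names are made up. It relies on sysctl accepting several names in one call.

```python
# Hypothetical prefix list: "dev.cpu" comes from this thread,
# "hw.acpi.thermal" is only an example of an extra subtree.
PREFIXES = ["dev.cpu", "hw.acpi.thermal"]


def sysctl_argv(prefixes):
    """Build one sysctl call that reads only the listed subtrees."""
    return ["sysctl", *prefixes]


def temperatures(output):
    """Return {oid: degrees C} for every '<...>temperature: <n>C' line."""
    temps = {}
    for line in output.splitlines():
        name, sep, value = line.partition(": ")
        if not (sep and name.endswith("temperature") and value.endswith("C")):
            continue
        try:
            temps[name] = float(value[:-1])
        except ValueError:
            pass  # skip anything that is not a plain number
    return temps
```

The cost of the call then grows with the number of listed subtrees instead of with the size of the whole tree.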

Quote from: meyergru on December 05, 2024, 10:01:51 AM
"It does not work like that":

The connection between the CPU die and the NVME die is through two thermal bridges connecting both to the case at best, while I have seen cases where the NVME did not even touch it.

So, this connection is even more remote than the connection between the CPU die and most of the thermal "mass" of the case, so to speak. The fluctuations that you see suggest a bad transition between the CPU die and the case, so, if at all, the NVME heat only contributes an offset, not a fast fluctuation. Mind you, the temperature is measured directly on the CPU die.

So, sure, additional wattage from the NVME accounts for a higher general temperature level, but not for fast changes in CPU temps.

Good point here. 

I'm noticing though that I'm not the only person with an N5105 that is complaining about high temps.  There's another one in the 'Hardware & Performance' section, and another on Amazon reviews. 

How can I be sure that I am not an outlier relative to all the other N5105 users?  If I can show that my system is behaving erratically relative to all the others in its category then I think I could make a case for a replacement.

Maybe we need an N5105 owners thread to see if anyone else is having similar quick temperature transitions.


Quote
I will keep my mouth shut about Protectli, but I think they are not even comparable to Deciso.

Of course, not even in the same market or price segment.  One doesn't even sell rackmount gear.

Both companies have quality controls and stand behind their products with warranties and support, though.  That's my point.  You take a gamble with the AliExpress specials. :)

I'll say no more.  I'm a fan of both and would buy a Deciso if my budget allowed.

Quote
Reducing output to "dev.cpu" in order to keep processing cost (and heat buildup) low during measurement automatically limits it to that sysctl subtree, excluding outliers. So your IF can only be satisfied if there were a way to specify a list of other prefixes in order to include exotics.

There was a proposal on GitHub to save the list of sensors from 'sysctl -a' at startup, as a one-time call.  From then on it would be possible to call 'sysctl' on them periodically.  It was not well received.
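
For reference, the idea can be sketched roughly like this in Python. This is not the code from the GitHub proposal, just my own illustration with made-up names (`discover_temperature_oids`, `SensorList`); the `run` callable stands in for whatever actually executes the command.

```python
def discover_temperature_oids(sysctl_a_output):
    """Return the names of all oids whose name contains 'temperature'."""
    oids = []
    for line in sysctl_a_output.splitlines():
        name, sep, _ = line.partition(": ")
        if sep and "temperature" in name:
            oids.append(name)
    return oids


class SensorList:
    """Caches the oid names after one expensive full-tree walk."""

    def __init__(self, run):
        self._run = run    # callable: argv list -> stdout text
        self._oids = None  # filled on first use

    def poll_argv(self):
        """Return the cheap sysctl call that reads only the cached oids."""
        if self._oids is None:
            # The one-time 'sysctl -a' call, e.g. at startup.
            self._oids = discover_temperature_oids(self._run(["sysctl", "-a"]))
        return ["sysctl", *self._oids]
```

Every later poll then reads only the handful of cached oids. A real implementation would also have to handle the case where no sensors are found, and sensors that appear after startup.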