CPU temp reporting

Started by Dimi3, October 01, 2023, 09:39:00 AM

Hello,

I'm testing a new Intel N200 box and came across an interesting issue with CPU temperature reporting. My fanless box is practically cool to the touch, but OPNsense reports a CPU temp of 45 degrees, which seemed too high to me. So I did some digging.

If I log in to the console, I get a more realistic CPU temp with this command:

sysctl dev.cpu | grep temperature
dev.cpu.3.temperature: 35.0C
dev.cpu.2.temperature: 35.0C
dev.cpu.1.temperature: 36.0C
dev.cpu.0.temperature: 35.0C

I checked which command OPNsense uses for the temp sensors, and it's:

sysctl -a | grep temperature
hw.acpi.thermal.tz0.temperature: 27.9C
dev.cpu.3.temperature: 45.0C
dev.cpu.2.temperature: 45.0C
dev.cpu.1.temperature: 44.0C
dev.cpu.0.temperature: 43.0C

If I stress test the machine, the two readings converge at 55 degrees after 30 min, which is OK.

I installed htop and checked the temp there. It reports the lower values, with an occasional spike above 40 but an average of 36 - 37 degrees.

I live-booted Linux on the same machine and checked the Linux temp sensors. They also report an average of 36 - 38 degrees.

Out of curiosity I also spun up pfSense, and the temp was OK there too, with no spikes above 40 degrees. But pfSense uses a different command to read the CPU temp than OPNsense.

Overall this is not a life-threatening situation 😊 😊, just an observation I decided to post here. According to my testing both temps are correct: the CPU really does reach the higher temp when a short load hits it, but that lasts less than a second and then it falls back to 36 degrees. It looks to me like the command sysctl -a | grep temperature is displaying the max values over a certain period, which results in a higher temp on the dashboard than it really is.
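
One way to check the spike idea would be to sample a single sensor repeatedly and look at the spread. Below is a minimal sketch, assuming the "oid: 35.0C" output format shown above. The function name summarize_temps is made up for illustration and is not part of OPNsense.

```shell
# Summarize "oid: 35.0C" lines read from stdin as min/avg/max over all samples.
summarize_temps() {
    awk -F': ' '/temperature/ {
        t = $2 + 0                      # "35.0C" converts to 35 in numeric context
        if (n == 0 || t < min) min = t
        if (n == 0 || t > max) max = t
        sum += t; n++
    }
    END { if (n) printf "min=%.1f avg=%.1f max=%.1f n=%d\n", min, sum / n, max, n }'
}

# Possible usage on the box itself (FreeBSD sysctl assumed), 30 samples, one per second:
# for i in $(seq 30); do sysctl dev.cpu.0.temperature; sleep 1; done | summarize_temps
```

A large gap between avg and max would fit short spikes rather than a constant offset.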


I also tested this on my other firewall running an Intel Core i5, and there the difference is at most 2-3 degrees Celsius, which is not really noticeable.

I also searched the forum, and a few similar posts already exist:

https://forum.opnsense.org/index.php?topic=34395.msg166556#msg166556

I think it would be good to revise the temp monitoring and maybe use sysctl dev.cpu | grep temperature, as it reports a more "real" CPU temp.

Regards,

Bumping this old thread because I see the same behavior!

You're querying the same values, but one method parses MUCH less data than the other...


Here is the dev.cpu method...
```
root@router-02:~ # sysctl dev.cpu | wc -l
     273
```

Looking for the temperature under dev.cpu alone means sysctl only needs to export, and grep only needs to search, 273 lines.



Here is the sysctl -a method...
```
root@router-02:~ # sysctl -a | wc -l
   16497
```

sysctl -a has to export 16,000+ more lines than the command that only looks at the CPU values, and grep then has to go through all ~16k lines to find the ones that match.

The second command still finishes quickly, but it causes enough additional CPU load that the temperature has already risen by the time grep filters out the temperature lines.
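
A rough way to compare the cost of the two methods on your own box is to run each one many times and compare the elapsed time. This is only a sketch: elapsed_for is a hypothetical helper, it measures wall-clock time in whole seconds (not temperature), so use enough repetitions to get a visible difference.

```shell
# Run a command N times, discard its output, and print the elapsed whole seconds.
elapsed_for() {
    n=$1; shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" > /dev/null 2>&1 || true   # ignore failures, we only care about cost
        i=$((i + 1))
    done
    echo $(( $(date +%s) - start ))
}

# Possible usage (FreeBSD sysctl assumed):
# elapsed_for 50 sh -c 'sysctl dev.cpu | grep temperature'
# elapsed_for 50 sh -c 'sysctl -a | grep temperature'
```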




Are you saying that a simple grep over only 16k records creates enough CPU load to raise the temperature of a not-so-weak CPU by 10 degrees? Really?

Yes, it does. By reducing the output to a few lines, the CPU footprint can be reduced.

See: https://github.com/opnsense/core/pull/8090
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 770 up, Bufferbloat A

But the patch wasn't accepted. Your pull request was closed without being merged into the main branch. ???

I know. That is because you cannot limit the sysctl output without risking skipping temperature sensors that lie outside of the dev.cpu and hw.acpi subtrees.

It is what it is, I tried... :-(

Can you make an installable patch with these changes? My platform has no such sensors, and I am willing to sacrifice the unneeded code for a correct temperature display instead of the current +10..+12 degrees.

It is just one changed line in /usr/local/opnsense/scripts/system/temperature.sh like so:

https://github.com/opnsense/core/pull/8090/commits/de7e48ab1e1c339fdae78eaf95b54b09d6e1ac44

Works on my box; I prefer @Meyergru's way.

I have an i5-10500 and had two CPU0s displayed; judging by the temps, the second CPU0 was the Zone. A minor gripe, but with dev.cpu I just have CPU0 - CPU11 and I like it better that way.
AhnHEL (Angel)

I wonder what second CPU0 you had? Could you show the output of:


sysctl -e `sysctl -aN | fgrep temperature` | sort

vs.

sysctl -e `sysctl -N dev.cpu hw.acpi.thermal | fgrep temperature` | sort


Substantially, there should be no difference in the number of lines, unless there is a temperature sensor outside of dev.cpu/hw.acpi.thermal on your machine. I have never seen that...
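
A quicker check for such sensors is to filter the OID names directly. Here is a minimal sketch; outside_sensors is a made-up helper name, and it assumes one OID name per line on stdin (as printed by sysctl -aN).

```shell
# Read OID names (one per line) on stdin and print the temperature OIDs that
# live outside the dev.cpu and hw.acpi.thermal subtrees.
outside_sensors() {
    grep -F temperature | grep -v -E '^(dev\.cpu|hw\.acpi\.thermal)\.'
}

# Possible usage on FreeBSD:
# sysctl -aN | outside_sensors
```

No output means the restricted query would not miss anything on that machine.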

One of my Supermicro systems has got a temperature sensor in/next to the 10G network interfaces. It's visible in SNMP and IPMI. Unfortunately that system runs Proxmox at the moment so I cannot check if and where that sensor will end up as a FreeBSD sysctl OID.

Anyone running an X10SDV-4C-TLN4F with OPNsense or FreeBSD?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

November 27, 2024, 05:06:35 PM #12 Last Edit: November 27, 2024, 05:24:48 PM by AhnHEL
Quote from: meyergru on November 27, 2024, 10:22:52 AM
I wonder what second CPU0 you had? Could you show the output of:


sysctl -e `sysctl -aN | fgrep temperature` | sort


root@angel:~ # sysctl -e `sysctl -aN | fgrep temperature` | sort
dev.cpu.0.temperature=32.0C
dev.cpu.1.temperature=32.0C
dev.cpu.10.temperature=33.0C
dev.cpu.11.temperature=33.0C
dev.cpu.2.temperature=40.0C
dev.cpu.3.temperature=40.0C
dev.cpu.4.temperature=34.0C
dev.cpu.5.temperature=34.0C
dev.cpu.6.temperature=32.0C
dev.cpu.7.temperature=32.0C
dev.cpu.8.temperature=39.0C
dev.cpu.9.temperature=39.0C
dev.pchtherm.0.temperature=35.0C

vs.

sysctl -e `sysctl -N dev.cpu hw.acpi.thermal | fgrep temperature` | sort

sysctl: unknown oid 'hw.acpi.thermal'
dev.cpu.0.temperature=32.0C
dev.cpu.1.temperature=32.0C
dev.cpu.10.temperature=32.0C
dev.cpu.11.temperature=32.0C
dev.cpu.2.temperature=30.0C
dev.cpu.3.temperature=30.0C
dev.cpu.4.temperature=33.0C
dev.cpu.5.temperature=33.0C
dev.cpu.6.temperature=31.0C
dev.cpu.7.temperature=31.0C
dev.cpu.8.temperature=32.0C
dev.cpu.9.temperature=32.0C

AhnHEL (Angel)

Here's my current approach to list all sysctls that register as temperatures ("IK") in the kernel:

# sysctl -aF | awk -F ": " '$2 ~ "^IK" { print $1 }' | grep -v "\._" | sort

This would include fringe hardware and NIC sensors if the drivers support it, as well as threshold temperatures that can be set for monitoring.


Cheers,
Franco

I see two quick wins:

1. Separating out the step to find the correct lines from actually querying the sensors. This would need a persisted file of the first sysctl output. It could be done by a startup job (like @reboot in cron) or by guarding the sysctl call with a file timestamp test to only look for sensors once in a while and then sending the sysctl output to that file.

Both ways, on each readout call, only a few sysctl values are queried from the sensor list file.

2. Using fgrep instead of grep should be faster because it is a simple string compare instead of a regex match.
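
The two points above could be sketched roughly like this. It is not the actual temperature.sh; the cache path, the refresh interval, and both function names are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical cache location and refresh interval; neither comes from OPNsense.
SENSOR_LIST="${SENSOR_LIST:-/tmp/temperature_sensors.list}"
MAX_AGE_MIN="${MAX_AGE_MIN:-60}"

# Expensive step: walk the whole sysctl tree (names only), but only when the
# list file is missing, empty, or older than MAX_AGE_MIN minutes.
refresh_sensor_list() {
    if [ ! -s "$SENSOR_LIST" ] ||
       [ -n "$(find "$SENSOR_LIST" -mmin +"$MAX_AGE_MIN" 2>/dev/null)" ]; then
        # grep -F is the fixed-string match that fgrep performs
        sysctl -aN | grep -F temperature > "$SENSOR_LIST.tmp" &&
            mv "$SENSOR_LIST.tmp" "$SENSOR_LIST"
    fi
}

# Cheap step: query only the OIDs recorded in the list file.
read_temperatures() {
    refresh_sensor_list
    # Unquoted on purpose: each OID name must become its own argument.
    [ -s "$SENSOR_LIST" ] && sysctl -e $(cat "$SENSOR_LIST") | sort
}
```

With this split, the full tree walk happens at most once per refresh interval, and every other readout only touches the handful of OIDs in the list.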