Title: CPU temp reporting
Post by: Dimi3 on October 01, 2023, 09:39:00 AM

Hello,

I'm testing a new Intel N200 box and came across an interesting issue with CPU temperature reporting. My fanless box is practically cool to the touch, but OPNsense reports a CPU temperature of 45 degrees, which seemed too high to me. So I did some digging.

If I log in to the console, I get a more realistic CPU temperature with this command:

sysctl dev.cpu | grep temperature
dev.cpu.3.temperature: 35.0C
dev.cpu.2.temperature: 35.0C
dev.cpu.1.temperature: 36.0C
dev.cpu.0.temperature: 35.0C

I checked which command OPNsense uses for the temperature sensors, and it is:

sysctl -a | grep temperature
hw.acpi.thermal.tz0.temperature: 27.9C
dev.cpu.3.temperature: 45.0C
dev.cpu.2.temperature: 45.0C
dev.cpu.1.temperature: 44.0C
dev.cpu.0.temperature: 43.0C

If I stress test the machine, these two readings converge at 55 degrees after 30 minutes, which is OK.

I installed htop and checked the temperature there. It reports the lower values, with occasional spikes above 40 but an average of 36-37 degrees.

I live-booted Linux on the same machine and checked the Linux temperature sensors. They also report an average of 36-38 degrees.

Out of curiosity I also spun up pfSense, and the temperature was OK there too, with no spikes above 40 degrees. But pfSense uses a different command than OPNsense to read the CPU temperature.

Overall this is not a life-threatening situation 😊 😊, just an observation I decided to post here. According to my testing both temperatures are correct: the higher temperature is actually reached by the CPU when a short load hits it, but it lasts less than a second and then falls back to 36 degrees. It looks to me as if the command sysctl -a | grep temperature displays the maximum values over a certain period, so the dashboard averages to a higher temperature than the real one.

I also tested this on my other firewall running an Intel Core i5, and there the difference is at most 2-3 degrees Celsius, which is not really noticeable.

I also searched the forum, and a few similar posts have already been made:

https://forum.opnsense.org/index.php?topic=34395.msg166556#msg166556

I think it would be good to revise the temperature monitoring and maybe use sysctl dev.cpu | grep temperature, as it reports a more "real" CPU temperature.

Regards,

Title: Re: CPU temp reporting
Post by: vicking on September 19, 2024, 11:53:40 AM

Bumping this old thread because I see the same behavior!

Title: Re: CPU temp reporting
Post by: rickyricky on November 24, 2024, 10:48:46 PM

you're querying the same values, but one method parses MUCH less data than the other...

Here is the dev.cpu method...
```
root@router-02:~ # sysctl dev.cpu | wc -l
273
```

Looking up the temperature this way only needs to export and grep through 273 lines.

Here is the sysctl -a method...
```
root@router-02:~ # sysctl -a | wc -l
16497
```

Here sysctl has to export 16,000+ more lines than the command that only looks at the CPU values, and grep then has to search through all of those 16k lines to find the ones that match.

The second command finishes quickly, but it still causes enough additional CPU load that the temperature has already risen by the time grep filters out the temperature lines.
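
One way to check this explanation on a given box is to read a single core directly, run the full export, and read the same core again. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the dev.cpu.N.temperature OIDs shown earlier in the thread exist on the machine.

```sh
#!/bin/sh
# Read one core's temperature with a direct OID lookup (a single value, cheap).
read_core_temp() {
    sysctl -n "dev.cpu.${1:-0}.temperature"
}

# Take a reading, run the full sysctl export, then take a second reading.
compare_readings() {
    before="$(read_core_temp "$1")"
    sysctl -a > /dev/null    # the expensive full tree walk under test
    after="$(read_core_temp "$1")"
    echo "before=$before after=$after"
}
```

Calling `compare_readings 0` a few times on an otherwise idle system should show whether the second reading is consistently higher than the first.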

Title: Re: CPU temp reporting
Post by: _tribal_ on November 26, 2024, 11:10:49 AM

Are you saying that a simple grep over only 16k records produces enough CPU load to raise the temperature of a not-so-weak CPU by 10 degrees? Really?

Title: Re: CPU temp reporting
Post by: meyergru on November 26, 2024, 12:05:18 PM

Yes, it does. By reducing the output to a few lines, the CPU footprint can be reduced.

See: https://github.com/opnsense/core/pull/8090

Title: Re: CPU temp reporting
Post by: _tribal_ on November 26, 2024, 07:28:36 PM

But the patch wasn't accepted. Your pull request was closed without being merged into the main branch. ???

Title: Re: CPU temp reporting
Post by: meyergru on November 26, 2024, 07:32:08 PM

I know. That is because you cannot limit the sysctl output without risking skipping temperature sensors that lie outside of the dev.cpu and hw.acpi subtrees.

It is what it is, I tried... :-(

Title: Re: CPU temp reporting
Post by: _tribal_ on November 26, 2024, 10:02:50 PM

Can you make an installable patch with these changes? On my platform there are no such sensors, and I am happy to sacrifice the unneeded code for a correct temperature display instead of the current readings, which are 10 to 12 degrees too high.

Title: Re: CPU temp reporting
Post by: meyergru on November 26, 2024, 10:08:47 PM

It is just one changed line in /usr/local/opnsense/scripts/system/temperature.sh like so:

https://github.com/opnsense/core/pull/8090/commits/de7e48ab1e1c339fdae78eaf95b54b09d6e1ac44

Title: Re: CPU temp reporting
Post by: AhnHEL on November 27, 2024, 06:47:28 AM

Works on my box; I prefer it @meyergru's way.

I have an i5-10500 and had two CPU0 entries displayed; judging by the temps, the second CPU0 was the Zone. A minor gripe, but with dev.cpu I just have CPU0 - CPU11, and I like it better that way.

Title: Re: CPU temp reporting
Post by: meyergru on November 27, 2024, 10:22:52 AM

I wonder what second CPU0 you had? Could you show the output of:

sysctl -e `sysctl -aN | fgrep temperature` | sort

vs.

sysctl -e `sysctl -N dev.cpu hw.acpi.thermal | fgrep temperature` | sort

Essentially, there should be no difference in the number of lines unless there is a temperature sensor outside of dev.cpu/hw.acpi.thermal on your machine. I have never seen that...
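
The comparison asked for above can also be reduced to a single filter that lists only the temperature OIDs the restricted query would miss. This is a hedged sketch, not part of the proposed patch; the function name is hypothetical, and it expects OID names on standard input, one per line, as printed by sysctl -aN.

```sh
#!/bin/sh
# Print temperature OIDs that live outside the dev.cpu and hw.acpi.thermal subtrees.
# Input: OID names, one per line. Prints nothing (exit status 1) when there are none.
extra_sensor_oids() {
    grep -F temperature | grep -v -E '^(dev\.cpu|hw\.acpi\.thermal)\.'
}
```

On a live system this would be run as `sysctl -aN | extra_sensor_oids`; empty output means the restricted query loses nothing on that machine.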

Title: Re: CPU temp reporting
Post by: Patrick M. Hausen on November 27, 2024, 11:15:27 AM

One of my Supermicro systems has got a temperature sensor in/next to the 10G network interfaces. It's visible in SNMP and IPMI. Unfortunately that system runs Proxmox at the moment so I cannot check if and where that sensor will end up as a FreeBSD sysctl OID.

Anyone running an X10SDV-4C-TLN4F with OPNsense or FreeBSD?

Title: Re: CPU temp reporting
Post by: AhnHEL on November 27, 2024, 05:06:35 PM

Quote from: meyergru on November 27, 2024, 10:22:52 AM
I wonder what second CPU0 you had? Could you show the output of:

sysctl -e `sysctl -aN | fgrep temperature` | sort


root@angel:~ # sysctl -e `sysctl -aN | fgrep temperature` | sort
dev.cpu.0.temperature=32.0C
dev.cpu.1.temperature=32.0C
dev.cpu.10.temperature=33.0C
dev.cpu.11.temperature=33.0C
dev.cpu.2.temperature=40.0C
dev.cpu.3.temperature=40.0C
dev.cpu.4.temperature=34.0C
dev.cpu.5.temperature=34.0C
dev.cpu.6.temperature=32.0C
dev.cpu.7.temperature=32.0C
dev.cpu.8.temperature=39.0C
dev.cpu.9.temperature=39.0C
dev.pchtherm.0.temperature=35.0C

vs.

sysctl -e `sysctl -N dev.cpu hw.acpi.thermal | fgrep temperature` | sort

sysctl: unknown oid 'hw.acpi.thermal'
dev.cpu.0.temperature=32.0C
dev.cpu.1.temperature=32.0C
dev.cpu.10.temperature=32.0C
dev.cpu.11.temperature=32.0C
dev.cpu.2.temperature=30.0C
dev.cpu.3.temperature=30.0C
dev.cpu.4.temperature=33.0C
dev.cpu.5.temperature=33.0C
dev.cpu.6.temperature=31.0C
dev.cpu.7.temperature=31.0C
dev.cpu.8.temperature=32.0C
dev.cpu.9.temperature=32.0C

Title: Re: CPU temp reporting
Post by: franco on November 27, 2024, 05:14:39 PM

Here's my current approach to list all sysctls that register as temperatures ("IK") in the kernel:

# sysctl -aF | awk -F ": " '$2 ~ "^IK" { print $1 }' | grep -v "\._" | sort

This would include fringe hardware and NIC sensors if the drivers support them. It also picks up threshold temperatures that can be set for monitoring.

Cheers,
Franco
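
To make the pipeline above easier to follow, here is the filtering part wrapped in a function that reads "name: format" lines, which is the shape the awk field separator in the command implies for sysctl -aF output. The function name is made up for this sketch, and the comments are one reading of what each stage does.

```sh
#!/bin/sh
# Input: lines of the form "oid.name: FORMAT" (as produced by sysctl -aF).
# Output: sorted names of OIDs whose format starts with "IK" (temperature),
# minus any OID that has a component starting with an underscore.
temperature_oids() {
    awk -F ": " '$2 ~ "^IK" { print $1 }' |  # keep OIDs with a temperature format
        grep -v "\._" |                      # drop names containing "._"
        sort
}
```

On a live system the equivalent call would be `sysctl -aF | temperature_oids`.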

Title: Re: CPU temp reporting
Post by: meyergru on November 27, 2024, 05:59:21 PM

I see two quick wins:

1. Separating out the step to find the correct lines from actually querying the sensors. This would need a persisted file of the first sysctl output. It could be done by a startup job (like @reboot in cron) or by guarding the sysctl call with a file timestamp test to only look for sensors once in a while and then sending the sysctl output to that file.

Either way, on each readout call, only the few sysctl values named in the sensor list file are queried.

2. Using fgrep instead of grep should be faster because it is a simple string compare instead of a regex match.
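
The first point (the timestamp-guarded variant) could look roughly like the sketch below. It is an illustration only, not code from OPNsense: the cache path, the refresh interval, and all function names are assumptions. `grep -F` is used as the equivalent of fgrep.

```sh
#!/bin/sh
# Hypothetical cache location and refresh interval (minutes).
CACHE="${CACHE:-/tmp/temperature_oids.cache}"
MAX_AGE_MIN="${MAX_AGE_MIN:-60}"

# Slow path: walk the whole tree once, keeping only the matching OID names.
discover_oids() {
    sysctl -aN | grep -F temperature
}

# Rebuild the cache when it is missing, empty, or older than MAX_AGE_MIN.
refresh_cache() {
    if [ ! -s "$CACHE" ] || [ -n "$(find "$CACHE" -mmin +"$MAX_AGE_MIN")" ]; then
        discover_oids > "$CACHE.tmp" && mv "$CACHE.tmp" "$CACHE"
    fi
}

# Fast path: query only the cached OIDs instead of exporting everything.
read_sensors() {
    refresh_cache
    [ -s "$CACHE" ] || return 1
    # Word splitting is intended: one argument per cached OID name.
    sysctl -e $(cat "$CACHE") | sort
}
```

With this shape, only the occasional refresh pays for the full export; every other call to `read_sensors` touches just the handful of OIDs in the cache file.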

Title: Re: CPU temp reporting
Post by: AhnHEL on November 27, 2024, 06:25:51 PM

@meyergru It looks like that second CPU0 belongs to dev.pchtherm.0, but then why does the widget label it as CPU0?

Title: Re: CPU temp reporting
Post by: franco on November 27, 2024, 06:27:08 PM

I merged this now since I haven't seen a better approach yet:

https://github.com/opnsense/core/commit/eded37411f

It would be nice if configd had a "prefetch" and "serve expired" type of metric but in the average case this is more than enough. A suboptimal workaround can be discussed, but it will always look a bit strange in my opinion (running random actions at boot to fix an edge case on certain hardware).

Cheers,
Franco

Title: Re: CPU temp reporting
Post by: AhnHEL on November 27, 2024, 07:12:04 PM

@franco I applied the patch. Can it omit the tjmax graph outputs? They're not easy on the eyes. I'm still having an issue with the pchtherm.0 section showing up as another CPU0.

Title: Re: CPU temp reporting
Post by: franco on November 27, 2024, 07:41:24 PM

Can you dump your

# configctl system sensors

here?

The 100 degree ones look like thresholds from the Intel CPU, maybe. I don't have the hardware to verify.

The widget is another issue, as the patch only deals with the backend. It would be nicer if the widget did not do too much magic here guesstimating where a reading comes from, and maybe made similar readings collapsible.

But we can get there step by step. Thanks for testing!

Cheers,
Franco
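
As an illustration of labelling without guesswork, a label can be derived from the full OID prefix so that dev.pchtherm.0 never ends up as a second CPU0. This sketch is hypothetical: the function name and the label strings are invented here, and it does not reflect how the OPNsense widget is implemented.

```sh
#!/bin/sh
# Map a sensor OID to a display label based on its subtree.
# Unknown OIDs fall through unchanged rather than being guessed at.
sensor_label() {
    case "$1" in
        dev.cpu.*.temperature)
            n="${1#dev.cpu.}"
            echo "CPU ${n%%.*}" ;;
        hw.acpi.thermal.tz*.temperature)
            n="${1#hw.acpi.thermal.tz}"
            echo "Zone ${n%%.*}" ;;
        dev.pchtherm.*.temperature)
            n="${1#dev.pchtherm.}"
            echo "PCH ${n%%.*}" ;;
        *)
            echo "$1" ;;
    esac
}
```

For example, `sensor_label dev.pchtherm.0.temperature` yields a PCH label, while an unrecognised OID is shown under its raw name.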

Title: Re: CPU temp reporting
Post by: AhnHEL on November 27, 2024, 07:50:05 PM

Thank you, yes.

# configctl system sensors
dev.cpu.0.coretemp.tjmax
dev.cpu.0.temperature
dev.cpu.1.coretemp.tjmax
dev.cpu.1.temperature
dev.cpu.10.coretemp.tjmax
dev.cpu.10.temperature
dev.cpu.11.coretemp.tjmax
dev.cpu.11.temperature
dev.cpu.2.coretemp.tjmax
dev.cpu.2.temperature
dev.cpu.3.coretemp.tjmax
dev.cpu.3.temperature
dev.cpu.4.coretemp.tjmax
dev.cpu.4.temperature
dev.cpu.5.coretemp.tjmax
dev.cpu.5.temperature
dev.cpu.6.coretemp.tjmax
dev.cpu.6.temperature
dev.cpu.7.coretemp.tjmax
dev.cpu.7.temperature
dev.cpu.8.coretemp.tjmax
dev.cpu.8.temperature
dev.cpu.9.coretemp.tjmax
dev.cpu.9.temperature
dev.pchtherm.0.ctt
dev.pchtherm.0.pmtemp
dev.pchtherm.0.t0temp
dev.pchtherm.0.t1temp
dev.pchtherm.0.t2temp
dev.pchtherm.0.temperature

Title: Re: CPU temp reporting
Post by: franco on November 28, 2024, 02:08:46 PM

Thanks, I added this one to ignore the coretemp threshold:

https://github.com/opnsense/core/commit/d314680

What I really dislike is the FreeBSD kernel faking per-core CPU temperatures for amdtemp, which isn't even per core but was made this way to match the coretemp behaviour (which is also quite bulky with lots of cores).

pchtherm(4) is also annoying with only one reading and the rest being thresholds which are hard to separate from each other.

Cheers,
Franco

Title: Re: CPU temp reporting
Post by: AhnHEL on November 29, 2024, 07:16:24 PM

The 'dev.pchtherm.0.temperature' reading is the only relevant one, so the others should just be omitted/ignored by the widget. Then again, if the dev.cpu readings are available, do we even need pchtherm in the widget at all?

Title: Re: CPU temp reporting
Post by: franco on November 30, 2024, 11:27:24 AM

Yes, the whole point was to be inclusive of unknown sensors that may be useful, and as you can see it's a wild west of interesting and not-so-interesting metrics in incoherent form.

I think we should make a default filter in the widget and go from there. Users can then adjust if needed. Adding too much glue in the backend is probably not a good approach (and it already filters a bit much for technical reasons, oh well).

Cheers,
Franco
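
The idea of a default filter that users can adjust can be sketched as follows. This is an assumption-laden illustration in shell rather than widget code: the variable and function names are hypothetical, and the default pattern simply keeps OIDs ending in .temperature, which drops the tjmax and pchtherm threshold entries from the sensor list posted earlier.

```sh
#!/bin/sh
# Default pattern: keep only the primary reading of each device.
DEFAULT_FILTER='\.temperature$'

# Input: OID names, one per line. SENSOR_FILTER (an extended regex)
# overrides the default when the user sets it.
filter_sensors() {
    grep -E -- "${SENSOR_FILTER:-$DEFAULT_FILTER}"
}
```

A user who only wants CPU readings could then set `SENSOR_FILTER='^dev\.cpu\..*\.temperature$'` without any change to the backend.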

OPNsense Forum

Archive => 23.7 Legacy Series => Topic started by: Dimi3 on October 01, 2023, 09:39:00 AM