Hello,
I'm testing a new Intel N200 box and came across an interesting issue with CPU temperature reporting. My fanless box is practically cool to the touch, but the CPU temperature reported in OPNsense is 45 degrees, which seemed too high to me. So I did some digging.
If I log in on the console, I get a more realistic CPU temperature with this command:
sysctl dev.cpu | grep temperature
dev.cpu.3.temperature: 35.0C
dev.cpu.2.temperature: 35.0C
dev.cpu.1.temperature: 36.0C
dev.cpu.0.temperature: 35.0C
I checked which command OPNsense uses for the temperature sensors, and it is:
sysctl -a | grep temperature
hw.acpi.thermal.tz0.temperature: 27.9C
dev.cpu.3.temperature: 45.0C
dev.cpu.2.temperature: 45.0C
dev.cpu.1.temperature: 44.0C
dev.cpu.0.temperature: 43.0C
If I stress test the machine, the two readings converge at 55 degrees after 30 minutes, which is OK.
I installed htop and checked the temperature there: the lower values are reported, with an occasional spike above 40 but an average of 36 - 37 degrees.
I live-booted Linux on the same machine and checked the Linux temperature sensors. They also report an average of 36 - 38 degrees.
Out of curiosity I also spun up pfSense, and the temperature was OK there too, with no spikes above 40 degrees. But pfSense uses a different command to read the CPU temperature than OPNsense does.
Overall this is not a life-threatening situation 😊 😊, just an observation that I decided to post here. According to my testing both temperatures are correct: the higher one is actually reached by the CPU when a short load hits it, but it lasts less than a second and then falls back to 36 degrees. It looks to me as if the command sysctl -a | grep temperature catches the maximum values over a certain period, so the dashboard ends up showing a higher temperature than the CPU really has on average.
I also tested this on my other firewall running an Intel Core i5, and there the difference is 2-3 degrees Celsius at most, which is not really noticeable.
I also searched the forum, and a few similar posts have already been made:
https://forum.opnsense.org/index.php?topic=34395.msg166556#msg166556
I think it would be good to revise the temperature monitoring and maybe use sysctl dev.cpu | grep temperature, as it reports a more "real" CPU temperature.
Regards,
Bumping this old thread because I see the same behavior!
You're querying the same values, but one method parses MUCH less data than the other...
Here is the dev.cpu method...
```
root@router-02:~ # sysctl dev.cpu | wc -l
273
```
Looking for the temperature this way only needs to export and grep through 273 lines.
Here is the sysctl -a method...
```
root@router-02:~ # sysctl -a | wc -l
16497
```
The sysctl -a call has to export 16,000+ more lines than the one that only looks at the CPU values, and grep then has to go through all of those 16k lines to find the ones that match.
The second command still finishes quickly, but it causes enough additional CPU load that the temperature has already risen by the time grep filters out the temperature lines.
Are you saying that a simple grep over only 16k records produces a CPU load high enough to raise the temperature of a not-so-weak CPU by 10 degrees? Really?
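One way to check this on your own box is to read a single core temperature with a cheap call right before and right after running the expensive pipeline. This is only a minimal sketch: it assumes dev.cpu.0.temperature exists on the machine, and TEMP_CMD, LOAD_CMD and probe_effect are names made up for this example, not anything OPNsense provides.

```shell
#!/bin/sh
# Hypothetical helper, not part of OPNsense.
# TEMP_CMD: cheap single-OID read (assumes core 0 exposes a temperature).
# LOAD_CMD: the pipeline suspected of heating the CPU.
TEMP_CMD=${TEMP_CMD:-"sysctl -n dev.cpu.0.temperature"}
LOAD_CMD=${LOAD_CMD:-"sysctl -a | grep temperature"}

probe_effect() {
    before=$(sh -c "$TEMP_CMD")
    # Run the suspected load and throw its output away.
    sh -c "$LOAD_CMD" >/dev/null 2>&1
    # Read again immediately, before the core has time to cool down.
    after=$(sh -c "$TEMP_CMD")
    echo "before=$before after=$after"
}

# Usage on the firewall console (not executed here):
#   probe_effect
```

Core 0 is only an example. The load may land on another core, so it can be worth setting TEMP_CMD to each dev.cpu.N in turn and repeating the call a few times.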
Yes, it does. By reducing the output to a few lines, the CPU footprint can be reduced.
See: https://github.com/opnsense/core/pull/8090
But the patch wasn't accepted. Your pull request was closed without being merged into the main branch. ???
I know. That is because you cannot limit the sysctl output without risking skipping temperature sensors that lie outside of the dev.cpu and hw.acpi subtrees.
It is what it is, I tried... :-(
Can you make an installable patch with these changes? On my platform there are no such sensors, and I am willing to sacrifice unnecessary code for a proper temperature display instead of the current one, which reads 10 to 12 degrees too high.
It is just one changed line in /usr/local/opnsense/scripts/system/temperature.sh like so:
https://github.com/opnsense/core/pull/8090/commits/de7e48ab1e1c339fdae78eaf95b54b09d6e1ac44
Works for my box, I prefer it @Meyergru's way.
I have an i5-10500 and had two CPU0 entries displayed; judging by the temps, the second CPU0 was the Zone. A minor gripe, but with dev.cpu I just have CPU0 - CPU11 and I like it better that way.
I wonder what second CPU0 you had? Could you show the output of:
sysctl -e `sysctl -aN | fgrep temperature` | sort
vs.
sysctl -e `sysctl -N dev.cpu hw.acpi.thermal | fgrep temperature` | sort
Essentially, there should be no difference in the number of lines unless there is a temperature sensor outside of dev.cpu/hw.acpi.thermal on your machine. I have never seen that...
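If comparing the two outputs by eye is tedious, the OID names alone can be diffed. A minimal sketch, assuming both lists have been saved to files with one OID per line; extra_sensors and the /tmp file names are invented for this example:

```shell
#!/bin/sh
# Hypothetical helper: print OIDs that appear in the full list ($1)
# but not in the restricted list ($2). Both files hold one OID per line.
extra_sensors() {
    full_sorted=$(mktemp) && part_sorted=$(mktemp) || return 1
    sort -u "$1" > "$full_sorted"
    sort -u "$2" > "$part_sorted"
    # comm -23 keeps only the lines that are unique to the first file.
    comm -23 "$full_sorted" "$part_sorted"
    rm -f "$full_sorted" "$part_sorted"
}

# Usage on FreeBSD/OPNsense (not executed here):
#   sysctl -aN | fgrep temperature > /tmp/all.txt
#   sysctl -N dev.cpu hw.acpi.thermal 2>/dev/null | fgrep temperature > /tmp/sub.txt
#   extra_sensors /tmp/all.txt /tmp/sub.txt
```

An empty result means every temperature OID on that machine lives inside the two subtrees; anything printed is a sensor the restricted query would miss.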
One of my Supermicro systems has got a temperature sensor in/next to the 10G network interfaces. It's visible in SNMP and IPMI. Unfortunately that system runs Proxmox at the moment so I cannot check if and where that sensor will end up as a FreeBSD sysctl OID.
Anyone running an X10SDV-4C-TLN4F with OPNsense or FreeBSD?
Quote from: meyergru on November 27, 2024, 10:22:52 AM
I wonder what second CPU0 you had? Could you show the output of:
sysctl -e `sysctl -aN | fgrep temperature` | sort
root@angel:~ # sysctl -e `sysctl -aN | fgrep temperature` | sort
dev.cpu.0.temperature=32.0C
dev.cpu.1.temperature=32.0C
dev.cpu.10.temperature=33.0C
dev.cpu.11.temperature=33.0C
dev.cpu.2.temperature=40.0C
dev.cpu.3.temperature=40.0C
dev.cpu.4.temperature=34.0C
dev.cpu.5.temperature=34.0C
dev.cpu.6.temperature=32.0C
dev.cpu.7.temperature=32.0C
dev.cpu.8.temperature=39.0C
dev.cpu.9.temperature=39.0C
dev.pchtherm.0.temperature=35.0C
vs.
sysctl -e `sysctl -N dev.cpu hw.acpi.thermal | fgrep temperature` | sort
sysctl: unknown oid 'hw.acpi.thermal'
dev.cpu.0.temperature=32.0C
dev.cpu.1.temperature=32.0C
dev.cpu.10.temperature=32.0C
dev.cpu.11.temperature=32.0C
dev.cpu.2.temperature=30.0C
dev.cpu.3.temperature=30.0C
dev.cpu.4.temperature=33.0C
dev.cpu.5.temperature=33.0C
dev.cpu.6.temperature=31.0C
dev.cpu.7.temperature=31.0C
dev.cpu.8.temperature=32.0C
dev.cpu.9.temperature=32.0C
Here's my current approach to list all sysctls that register as temperatures ("IK") in the kernel:
# sysctl -aF | awk -F ": " '$2 ~ "^IK" { print $1 }' | grep -v "\._" | sort
This would include fringe hardware and NIC sensors if the drivers support it. Also threshold temperatures that can be set for monitoring.
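To make the pipeline above easier to try against saved output, here is the same filter wrapped in a function that reads from stdin. This is only a sketch: temperature_oids is a made-up name, and it assumes the input has the "oid: format" shape that sysctl -aF prints.

```shell
#!/bin/sh
# Reads "oid: format" lines on stdin and prints the OIDs whose format
# starts with IK, skipping any name that contains "._".
temperature_oids() {
    awk -F ': ' '$2 ~ /^IK/ { print $1 }' | grep -v '\._' | sort
}

# Usage on FreeBSD/OPNsense (not executed here):
#   sysctl -aF | temperature_oids
```

The awk step selects by the kernel's format string rather than by name, so a sensor does not need "temperature" in its OID to be found. The grep step drops names with a "._" component, for example ACPI-style entries such as hw.acpi.thermal.tz0._PSV if your machine exposes them.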
Cheers,
Franco
I see two quick wins:
1. Separating out the step to find the correct lines from actually querying the sensors. This would need a persisted file of the first sysctl output. It could be done by a startup job (like @reboot in cron) or by guarding the sysctl call with a file timestamp test to only look for sensors once in a while and then sending the sysctl output to that file.
Either way, on each readout call, only the few sysctl values named in the sensor list file are queried.
2. Using fgrep instead of grep should be faster because it is a simple string compare instead of a regex match.
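The first point could look roughly like this. A minimal sketch, assuming a cache file under /tmp and a 60-minute refresh window; the path, the variable names and all three function names are invented for this example and are not what OPNsense ships.

```shell
#!/bin/sh
# Hypothetical path and refresh window; adjust for your system.
SENSOR_CACHE=${SENSOR_CACHE:-/tmp/temperature_sensors.list}
SENSOR_CACHE_MINUTES=${SENSOR_CACHE_MINUTES:-60}

# Expensive discovery: walk the whole tree once and keep only OID names.
# grep -F is the same fixed-string match as fgrep.
discover_sensors() {
    sysctl -aN | grep -F temperature
}

# Print the sensor list, rebuilding the cache when it is missing or too old.
sensor_list() {
    # find prints the path only if the file was modified within the window.
    if [ -z "$(find "$SENSOR_CACHE" -mmin "-$SENSOR_CACHE_MINUTES" 2>/dev/null)" ]; then
        discover_sensors > "$SENSOR_CACHE.tmp" && mv "$SENSOR_CACHE.tmp" "$SENSOR_CACHE"
    fi
    cat "$SENSOR_CACHE"
}

# Cheap readout: query only the OIDs named in the cache.
# The unquoted substitution is intentional so each OID becomes one argument.
read_sensors() {
    sysctl -e $(sensor_list)
}
```

With this split, the full tree walk happens at most once per refresh window, and every other readout touches only the handful of OIDs in the list. The price is that a sensor appearing after the cache was written stays invisible until the next refresh.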
@meyergru It looks like that second CPU0 belongs to dev.pchtherm.0 but then why does the formatting in the widget express it as CPU0?
I merged this now since I haven't seen a better approach yet:
https://github.com/opnsense/core/commit/eded37411f
It would be nice if configd had a "prefetch" and "serve expired" type of metric but in the average case this is more than enough. A suboptimal workaround can be discussed, but it will always look a bit strange in my opinion (running random actions at boot to fix an edge case on certain hardware).
Cheers,
Franco
@franco I applied the patch. Can it omit the tjmax graph outputs? They are not easy on the eyes. I'm also still having an issue with the pchtherm.0 section, which shows up as another CPU0.
Can you dump your
# configctl system sensors
here?
The 100 degree ones look like thresholds from the Intel CPU, maybe. I don't have the hardware to verify.
The widget is another issue, as the patch only deals with the backend. It would be nicer if the widget did not do so much magic guesstimating where a reading comes from, and maybe made similar readings collapsible.
But we can get there step by step. Thanks for testing!
Cheers,
Franco
Thank you, yes.
# configctl system sensors
dev.cpu.0.coretemp.tjmax
dev.cpu.0.temperature
dev.cpu.1.coretemp.tjmax
dev.cpu.1.temperature
dev.cpu.10.coretemp.tjmax
dev.cpu.10.temperature
dev.cpu.11.coretemp.tjmax
dev.cpu.11.temperature
dev.cpu.2.coretemp.tjmax
dev.cpu.2.temperature
dev.cpu.3.coretemp.tjmax
dev.cpu.3.temperature
dev.cpu.4.coretemp.tjmax
dev.cpu.4.temperature
dev.cpu.5.coretemp.tjmax
dev.cpu.5.temperature
dev.cpu.6.coretemp.tjmax
dev.cpu.6.temperature
dev.cpu.7.coretemp.tjmax
dev.cpu.7.temperature
dev.cpu.8.coretemp.tjmax
dev.cpu.8.temperature
dev.cpu.9.coretemp.tjmax
dev.cpu.9.temperature
dev.pchtherm.0.ctt
dev.pchtherm.0.pmtemp
dev.pchtherm.0.t0temp
dev.pchtherm.0.t1temp
dev.pchtherm.0.t2temp
dev.pchtherm.0.temperature
Thanks, I added this one to ignore the coretemp threshold:
https://github.com/opnsense/core/commit/d314680
What I really dislike is the FreeBSD kernel faking the CPU temperatures for amdtemp, which isn't even per core but was made this way to match the coretemp behaviour (which is also quite bulky with lots of cores).
pchtherm(4) is also annoying with only one reading and the rest being thresholds which are hard to separate from each other.
Cheers,
Franco
The dev.pchtherm.0.temperature reading is the only relevant one, so the others should just be omitted/ignored for the widget. Then again, if the dev.cpu readings are available, do we even need pchtherm at all for the widget?
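As a stopgap on the command line, the list can be narrowed by name. A minimal sketch, assuming the sensor names arrive one per line as in the dump above; current_readings_only is a made-up name, and this is not how the widget itself filters:

```shell
#!/bin/sh
# Hypothetical filter: keep only OIDs whose last component is "temperature".
current_readings_only() {
    grep '\.temperature$'
}

# Usage on OPNsense (not executed here):
#   configctl system sensors | current_readings_only
```

This drops the tjmax entries and the extra pchtherm values in one go, but it would also hide any useful sensor that happens to be named differently.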
Yes, the whole point was being inclusive of unknown sensors that may be useful and as you can see it's a wild west of interesting and not so interesting metrics in incoherent form.
I think we should make a default filter in the widget and go from there. Users can then adjust if needed. Adding too much glue in the backend is probably not a good approach (and it already filters a bit much for technical reasons, oh well).
Cheers,
Franco