[SOLVED] N100 Box - update from 24.7.1 to .2 results in much higher power draw

Started by imk82, August 26, 2024, 03:50:37 PM

Hi all,

has anyone else experienced higher power consumption on an Intel N100 platform (or similar) after updating to 24.7.2?

I changed literally nothing else, and my power consumption rose from 10-12W up to 30-33W(!).

Best regards
Robert

Have you checked whether there is any process that is using a lot of CPU?

An N100 can pull around 30W if its cores are running fully loaded. The OS update itself cannot/should not produce a higher power draw. Something has to be using the CPU, or the system itself, to cause this.

What is your workload?
What is the CPU utilization on each core?
What is the total load of the system?
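One quick way to answer those questions is to sum the non-idle %CPU from `ps aux` output. A minimal sketch (the helper and sample lines below are illustrative, not an OPNsense tool; on the box you would feed in the real `ps aux` output, e.g. via subprocess):

```python
# Sketch: summarize non-idle CPU usage from `ps aux` output lines.
# The sample lines are illustrative stand-ins for real output.

def non_idle_cpu(ps_lines):
    """Sum %CPU over all processes except the kernel [idle] thread."""
    total = 0.0
    hogs = []
    for line in ps_lines[1:]:          # skip the header line
        fields = line.split(None, 10)  # COMMAND may contain spaces
        if len(fields) < 11:
            continue
        cpu, command = float(fields[2]), fields[10]
        if command == "[idle]":
            continue
        total += cpu
        hogs.append((cpu, command))
    hogs.sort(reverse=True)
    return total, hogs[:3]             # total plus the top 3 consumers

sample = [
    "USER   PID  %CPU %MEM   VSZ   RSS TT  STAT STARTED      TIME COMMAND",
    "root    11 400.0  0.0     0    64  -  RNL  15:53   641:15.61 [idle]",
    "root 64829   0.1  0.1 19720  9064  -  S    15:56     0:02.09 sshd-session: root@pts/0",
    "root     1   0.0  0.0 11304  1088  -  ILs  15:53     0:00.10 /sbin/init",
]
total, top3 = non_idle_cpu(sample)
print(f"non-idle CPU: {total:.1f}%")   # here: 0.1%
```

If that total stays near zero while the wall meter reads 30W, the extra draw is unlikely to be a runaway process.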

Regards,
S.
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
APU2D2 - deceased
N5105 - i226-V | Patriot 2x8G 3200 DDR4 | L 790 512G - VM HA(SOON)
N100   - i226-V | Crucial 16G  4800 DDR5 | S 980 500G - PROD

I saw a lot more power being consumed with the IPv6 issues and the health reporting hogging the CPU. It was fixed with the -nd kernel.

Hi all,

thanks for your input. From what I see, there is nothing obvious consuming a lot of CPU. My first bet was the updated microcode package shipped with 24.7.2, but after reverting it, I see the same high power consumption.

What seems odd, though, is that the C-state usage looks weird compared to the CPU load.

Best regards
Robert


root@jupiter:~ # ps aux
USER   PID  %CPU %MEM   VSZ   RSS TT  STAT STARTED      TIME COMMAND
root    11 400.0  0.0     0    64  -  RNL  15:53   641:15.61 [idle]
root 64829   0.1  0.1 19720  9064  -  S    15:56     0:02.09 sshd-session: root@pts/0 (sshd-session)
root     0   0.0  0.0     0  1472  -  DLs  15:53     6:05.58 [kernel]
root     1   0.0  0.0 11304  1088  -  ILs  15:53     0:00.10 /sbin/init
root     2   0.0  0.0     0    64  -  WL   15:53     0:03.99 [clock]
root     3   0.0  0.0     0    80  -  DL   15:53     0:00.00 [crypto]
root     4   0.0  0.0     0    48  -  DL   15:53     0:00.00 [cam]
root     5   0.0  0.0     0    16  -  DL   15:53     0:00.00 [busdma]
root     6   0.0  0.0     0   896  -  DL   15:53     0:01.50 [zfskern]
root     7   0.0  0.0     0    16  -  DL   15:53     0:07.09 [pf purge]
root     8   0.0  0.0     0    16  -  DL   15:53     0:06.42 [rand_harvestq]
root     9   0.0  0.0     0    48  -  DL   15:53     0:02.84 [pagedaemon]
root    10   0.0  0.0     0    16  -  DL   15:53     0:00.00 [audit]
root    12   0.0  0.0     0   240  -  WL   15:53     0:01.66 [intr]
root    13   0.0  0.0     0    48  -  DL   15:53     0:00.00 [geom]
root    14   0.0  0.0     0    16  -  DL   15:53     0:00.00 [sequencer 00]
root    15   0.0  0.0     0   160  -  DL   15:53     0:00.33 [usb]
root    16   0.0  0.0     0    16  -  DL   15:53     0:00.16 [acpi_thermal]
root    17   0.0  0.0     0    16  -  DL   15:53     0:00.00 [vmdaemon]
root    18   0.0  0.0     0   128  -  DL   15:53     0:00.92 [bufdaemon]
root    19   0.0  0.0     0    16  -  DL   15:53     0:00.12 [vnlru]
root    20   0.0  0.0     0    16  -  DL   15:53     0:00.19 [syncer]
root    32   0.0  0.0     0    16  -  DL   15:53     0:00.00 [aiod1]
root    33   0.0  0.0     0    16  -  DL   15:53     0:00.00 [aiod2]
root    34   0.0  0.0     0    16  -  DL   15:53     0:00.00 [aiod3]
root    35   0.0  0.0     0    16  -  DL   15:53     0:00.00 [aiod4]
...



last pid: 81376;  load averages:  0.35,  0.35,  0.34                                                                                up 0+03:14:02  19:07:14
70 processes:  1 running, 69 sleeping
CPU:  0.6% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.4% idle
Mem: 41M Active, 230M Inact, 492M Wired, 56K Buf, 15G Free
ARC: 165M Total, 51M MFU, 106M MRU, 1252K Header, 6776K Other
     123M Compressed, 279M Uncompressed, 2.27:1 Ratio
Swap: 2048M Total, 2048M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
81376 root          1  20    0    14M  3704K CPU2     2   0:00   1.20% top
64829 root          1  20    0    19M  9064K select   3   0:02   0.16% sshd-session
72899 root          1  20    0    26M    14M select   2   0:04   0.04% python3.11
75552 root          1  20    0    27M    15M select   2   0:06   0.04% python3.11
  247 root          1  21    0    77M    40M accept   3   1:16   0.00% python3.11
30080 root          3  20    0    48M    14M kqread   3   0:08   0.00% syslog-ng
49989 root          1  20    0    22M  9436K kqread   3   0:07   0.00% lighttpd
  241 root          1  68    0    26M    15M wait     1   0:05   0.00% python3.11
58039 root          1  23    0    56M    32M accept   0   0:04   0.00% php-cgi
...



root@jupiter:~ # sysctl dev.cpu
dev.cpu.3.temperature: 33.0C
dev.cpu.3.coretemp.throttle_log: 0
dev.cpu.3.coretemp.tjmax: 105.0C
dev.cpu.3.coretemp.resolution: 1
dev.cpu.3.coretemp.delta: 72
dev.cpu.3.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
dev.cpu.3.cx_usage_counters: 83088 196091 328803
dev.cpu.3.cx_usage: 13.66% 32.25% 54.08% last 2407us
dev.cpu.3.cx_lowest: C3
dev.cpu.3.cx_supported: C1/1/1 C2/2/127 C3/3/1048
dev.cpu.3.freq_levels: 806/-1
dev.cpu.3.freq: 402
dev.cpu.3.%parent: acpi0
dev.cpu.3.%pnpinfo: _HID=ACPI0007 _UID=3 _CID=none
dev.cpu.3.%location: handle=\_SB_.PR03
dev.cpu.3.%driver: cpu
dev.cpu.3.%desc: ACPI CPU
dev.cpu.2.temperature: 32.0C
dev.cpu.2.coretemp.throttle_log: 0
dev.cpu.2.coretemp.tjmax: 105.0C
dev.cpu.2.coretemp.resolution: 1
dev.cpu.2.coretemp.delta: 73
dev.cpu.2.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
dev.cpu.2.cx_usage_counters: 198145 477048 365372
dev.cpu.2.cx_usage: 19.04% 45.84% 35.11% last 9389us
dev.cpu.2.cx_lowest: C3
dev.cpu.2.cx_supported: C1/1/1 C2/2/127 C3/3/1048
dev.cpu.2.freq_levels: 806/-1
dev.cpu.2.freq: 402
dev.cpu.2.%parent: acpi0
dev.cpu.2.%pnpinfo: _HID=ACPI0007 _UID=2 _CID=none
dev.cpu.2.%location: handle=\_SB_.PR02
dev.cpu.2.%driver: cpu
dev.cpu.2.%desc: ACPI CPU
dev.cpu.1.temperature: 32.0C
dev.cpu.1.coretemp.throttle_log: 0
dev.cpu.1.coretemp.tjmax: 105.0C
dev.cpu.1.coretemp.resolution: 1
dev.cpu.1.coretemp.delta: 74
dev.cpu.1.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
dev.cpu.1.cx_usage_counters: 31342 66100 323061
dev.cpu.1.cx_usage: 7.45% 15.71% 76.82% last 3038us
dev.cpu.1.cx_lowest: C3
dev.cpu.1.cx_supported: C1/1/1 C2/2/127 C3/3/1048
dev.cpu.1.freq_levels: 806/-1
dev.cpu.1.freq: 402
dev.cpu.1.%parent: acpi0
dev.cpu.1.%pnpinfo: _HID=ACPI0007 _UID=1 _CID=none
dev.cpu.1.%location: handle=\_SB_.PR01
dev.cpu.1.%driver: cpu
dev.cpu.1.%desc: ACPI CPU
dev.cpu.0.temperature: 32.0C
dev.cpu.0.coretemp.throttle_log: 0
dev.cpu.0.coretemp.tjmax: 105.0C
dev.cpu.0.coretemp.resolution: 1
dev.cpu.0.coretemp.delta: 73
dev.cpu.0.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
dev.cpu.0.cx_usage_counters: 558802 371293 343781
dev.cpu.0.cx_usage: 43.86% 29.14% 26.98% last 152us
dev.cpu.0.cx_lowest: C3
dev.cpu.0.cx_supported: C1/1/1 C2/2/127 C3/3/1048
dev.cpu.0.freq_levels: 806/-1
dev.cpu.0.freq: 402
dev.cpu.0.%parent: acpi0
dev.cpu.0.%pnpinfo: _HID=ACPI0007 _UID=0 _CID=none
dev.cpu.0.%location: handle=\_SB_.PR00
dev.cpu.0.%driver: cpu
dev.cpu.0.%desc: ACPI CPU
dev.cpu.%parent:
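The cx_usage lines above are easier to compare as a table. A small parsing sketch (not an OPNsense tool; the sample is copied from the output above, and in practice you would feed in `sysctl dev.cpu` output). On a mostly idle box you would generally expect the deepest state (C3 here) to dominate; cores lingering in C1 tend to burn more power:

```python
import re

# Sketch: condense the per-core cx_usage sysctl lines into one table.
sample = """\
dev.cpu.3.cx_usage: 13.66% 32.25% 54.08% last 2407us
dev.cpu.2.cx_usage: 19.04% 45.84% 35.11% last 9389us
dev.cpu.1.cx_usage: 7.45% 15.71% 76.82% last 3038us
dev.cpu.0.cx_usage: 43.86% 29.14% 26.98% last 152us
"""

pattern = re.compile(r"dev\.cpu\.(\d+)\.cx_usage: ([\d.]+)% ([\d.]+)% ([\d.]+)%")
residency = {}
for m in pattern.finditer(sample):
    core = int(m.group(1))
    # (C1, C2, C3) residency percentages for this core
    residency[core] = tuple(float(m.group(i)) for i in (2, 3, 4))

print("core   C1%    C2%    C3%")
for core in sorted(residency):
    c1, c2, c3 = residency[core]
    print(f"{core:4d} {c1:6.2f} {c2:6.2f} {c3:6.2f}")
```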

What I do not understand is that the CPU temperatures are at 32°C, which seems very low, especially if the system draws 33 watts in total. With the CPU loaded, that would be an expectable power draw, but then it would sit at ~60°C with passive cooling.

Sure, the DDR5 RAM and the NVMe disk use power too, and not that little compared to this 6-watt-TDP CPU.

Did you check the disk temperature as well, with smartctl? After a reboot, you could have processes accessing the disk, like RRD/Netflow. Also, there are SSDs that do self-refreshes in order to fight flash decay - Samsung is known to do this. In the latter case, you would need no active process and still have high power draw from the SSD.
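For scripting that check, smartmontools (7.0+) can emit JSON via `smartctl -a -j /dev/nvme0`, which is easier to parse than the text report. A minimal sketch, with a stand-in sample document instead of a real subprocess call:

```python
import json

# Sketch: pull the current drive temperature out of smartctl's JSON output.
# In practice the dict would come from: smartctl -a -j /dev/nvme0

def drive_temperature(report: dict):
    """Return the current drive temperature in °C, or None if absent."""
    return report.get("temperature", {}).get("current")

sample = json.loads('{"device": {"name": "/dev/nvme0"}, "temperature": {"current": 44}}')
temp = drive_temperature(sample)
print(f"SSD temperature: {temp} °C")   # here: 44 °C
```

Logging this value every few minutes would also reveal SSD self-refresh activity that no process listing shows.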
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

Quote: What I do not understand is that the CPU temperatures are at 32°C, which seems very low, especially if the system draws 33 watts in total. With the CPU loaded, that would be an expectable power draw, but then it would sit at ~60°C with passive cooling.
There is a slow 120mm fan on the chassis. Even at ~10W it was too hot for my taste in the long term (40-50°C). So a 0.3W fan was a good deal for long-lived hardware. :-)

Quote: Did you check the disk temperature as well, with smartctl? After a reboot, you could have processes accessing the disk, like RRD/Netflow. Also, there are SSDs that do self-refreshes in order to fight flash decay - Samsung is known to do this. In the latter case, you would need no active process and still have high power draw from the SSD.
Netflow/RRD are disabled. The SSD is currently at 44°C, so that is quite OK.

But after some more investigation of the CPU load, I noticed this happening regularly:
USER   PID  %CPU %MEM   VSZ   RSS TT  STAT STARTED       TIME COMMAND
root    11 340.4  0.0     0    64  -  RNL  15:53   1348:45.27 [idle]
root 72038  64.7  0.0 26912  4460  -  S    22:14      0:00.07 /usr/local/sbin/iftop -nNb -i igc2 -s 2 -t
root 65733  30.2  0.3 77436 53292  -  S    22:14      0:02.97 /usr/local/bin/python3 /usr/local/opnsense/script/interfaces/traffic_top.py --interfaces igc2,vlan01 (python3.11)

USER   PID  %CPU %MEM   VSZ   RSS TT  STAT STARTED       TIME COMMAND
root    11 355.3  0.0     0    64  -  RNL  15:53   1368:28.36 [idle]
root 39289  82.8  0.0 22816  4372  -  S    22:20      0:00.07 /usr/local/sbin/iftop -nNb -i vlan01 -s 2 -t
root 39757  82.8  0.0 26912  4468  -  S    22:20      0:00.07 /usr/local/sbin/iftop -nNb -i igc2 -s 2 -t
root 36817  58.9  0.3 77452 53312  -  S    22:20      0:02.89 /usr/local/bin/python3 /usr/local/opnsense/script/interfaces/traffic_top.py --interfaces igc2,vlan01 (python3.11)


I never checked this in that way before, but it does not look like normal behavior to me, right?

What can cause this?

Best regards
Robert

That happens if you open the "Reporting: Traffic" page, especially when you monitor a lot of interfaces.
However, I doubt that this is much different switching from 24.7.1 to 24.7.2.

Quote: That happens if you open the "Reporting: Traffic" page, especially when you monitor a lot of interfaces.
However, I doubt that this is much different switching from 24.7.1 to 24.7.2.

You're right. The CPU usage in my last post was my own mistake: I had a browser tab open on my workstation while analysing CPU usage from my laptop.

So it is all back to the start; I am still looking for the cause of the high power consumption, with no visible reason. :-(

Since the CPU can only draw about 25 watts max, you could also load the CPU with stress-ng and see if the power consumption rises even further. If it does, you can be sure that the extra power draw is not from the CPU, but from the SSD or something else. I already suggested looking at the SSD temps.
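If stress-ng is not installed, a rough stand-in is to pin every core with a busy loop while watching the wall meter. A minimal Python sketch under that assumption (it is not equivalent to stress-ng's targeted stressors, just a full-core load):

```python
import multiprocessing as mp
import time

# Sketch: busy-loop on every core for a couple of seconds, as a rough
# stand-in for `stress-ng --cpu N --timeout 2s`. If the wall meter barely
# rises while this runs, the extra watts are not coming from the CPU.

def spin(seconds: float) -> None:
    """Burn CPU cycles until the deadline passes."""
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        pass

if __name__ == "__main__":
    n = mp.cpu_count()
    procs = [mp.Process(target=spin, args=(2.0,)) for _ in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(f"loaded {n} cores for 2 s")
```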

Quote: Since the CPU can only draw about 25 watts max, you could also load the CPU with stress-ng and see if the power consumption rises even further. If it does, you can be sure that the extra power draw is not from the CPU, but from the SSD or something else. I already suggested looking at the SSD temps.
Quote: The SSD is currently at 44°C, so that is quite OK.

I already checked the SSD temperatures, see one of my earlier posts. Unfortunately, they are OK as well.

Nevertheless, I decided to restore a backup from before the update, did the update again, and switched to the new microcode plugin instead of applying the microcode manually. Now all is fine. So I think there must have been some one-time problem during the last update, or with the CPU microcode updates. While I don't like explanations that are not proven, I think this is the most likely one.

Thanks all for your help!