[SOLVED] APU - Is your CPU clock stuck at low frequency after some time?

Started by thowe, January 30, 2021, 12:09:42 PM

The APU2/3/4 devices are well suited for use as routers under OPNsense: inexpensive, reliable hardware with an open source BIOS. Accordingly, many users here in the forum run them.

The performance of the AMD GX-412TC SoC they use is certainly much lower than that of i3 or i5 systems. However, I can say from experience that it is more than sufficient in many cases and more performance is simply not necessary. That is why I prefer such a cost-effective and power-saving device whenever possible.

However, questions have been piling up here and in other forums over the past few months as to why APU-based firewall performance dips and doesn't seem to be enough for some, while it's fast enough for others in a similar context.

Of course, this always depends on the individual case: What services are enabled on the firewall? Does WAN have to be connected via PPPoE? How large is the MTU? Are the correct tunables set? Etc.

Meanwhile, I suspect there is one more aspect. I have two APU2 boards in use (APU2C4 and APU2E4). I noticed that they are really responsive and powerful after a reboot. After some time (several hours), they become noticeably sluggish and the CPU utilization suddenly seems higher.

I then noticed that after a reboot, the APU can scale its frequency up to the nominal maximum of 1 GHz. After a while, however, the maximum achievable frequency seems to drop to 600 MHz and stays there until the next reboot. After the reboot, 1 GHz can be reached again for a certain time, until the 600 MHz limit suddenly applies again.

I would be interested to know what the situation looks like on your APU. If you want to participate, you can post your observations here. Below I describe how you can determine the maximum busy clock.


You need console or SSH access to your OPNsense:

All the measurements are done on the console or in an SSH shell on the OPNsense. If you do not have access to the console, you can set up SSH access as follows:

  • System: Settings: Administration
  • Secure Shell Server: Enable Secure Shell
  • Root Login: Permit root user login
  • Authentication Method: Permit password login
  • Now you should be able to access with ssh YOUR_USER@YOUR_FIREWALL_IP
  • If you are not logged in as root, you may have to become superuser: su
Note: After taking the measurements, the access can be deactivated again for security reasons.


You need to install and use the tool turbostat:

The measurements are done with the tool turbostat, which can be installed as follows:

pkg add http://pkg0.isc.freebsd.org/FreeBSD:12:amd64/latest/All/turbostat-4.17_2.txz
rehash


Before using turbostat, you have to load the kernel module cpuctl once:

kldload cpuctl


A measurement series is started as follows:
turbostat --interval 3

After that the tool prints the CPU statistics every 3 seconds.
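
The two steps above (load cpuctl if needed, then start the measurement) could be wrapped in a small helper. This is only a sketch: the function name measure_clock and the variable CPUCTL_DEV are made up for this example, and the device path is taken from the turbostat error message shown further down.

```shell
#!/bin/sh
# Sketch only: measure_clock and CPUCTL_DEV are names invented for this example.
# The device node is the one turbostat complains about when cpuctl is not loaded.
CPUCTL_DEV="${CPUCTL_DEV:-/dev/cpuctl0}"

measure_clock() {
    # First argument: seconds between samples (defaults to 3, as in the post).
    interval="${1:-3}"

    # Load the kernel module only if its device node does not exist yet.
    if [ ! -e "$CPUCTL_DEV" ]; then
        kldload cpuctl || return 1
    fi

    # Prints one statistics block per interval until interrupted with Ctrl+C.
    turbostat --interval "$interval"
}
```

The root shell here appears to be csh (rehash is a csh builtin), so the function would have to live in a file that is run or sourced with sh, followed by a call such as measure_clock 3.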

After a reboot everything runs normally and the output shows that the Bzy_MHz is near 1GHz:

root@router:~ # turbostat --interval 3
turbostat version 17.06.23 - Len Brown <lenb@kernel.org>
turbostat: /dev/cpuctl0 missing, kldload cpuctl: No such file or directory
root@router:~ # kldload cpuctl
root@router:~ # turbostat --interval 3
turbostat version 17.06.23 - Len Brown <lenb@kernel.org>
CPUID(0): AuthenticAMD 13 CPUID levels; family:model:stepping 0xf:30:1 (15:48:1)
CPUID(1): SSE3 MONITOR - - - TSC MSR - -
CPUID(6): APERF, No-TURBO, No-DTS, No-PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, No-EPB
CPUID(7): No-SGX
NSFOD /sys/devices/system/cpu/cpu1/cpufreq/scaling_driver
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       355     38.76   915     998     170
0       0       247     28.12   880     998     32
1       1       222     25.44   871     998     69
2       2       414     44.66   928     998     43
3       3       536     56.82   943     998     26
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       313     34.39   910     998     334
0       0       410     44.19   928     998     184
1       1       336     36.68   915     998     80
2       2       236     26.66   885     998     49
3       3       269     30.02   898     998     21
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       267     29.60   904     998     247
0       0       520     54.88   947     998     68
1       1       289     31.59   914     998     82
2       2       127     15.56   813     998     42
3       3       135     16.36   825     998     55
^C


When the OPNsense has been running for a few hours, the output shows that Bzy_MHz no longer exceeds 600 MHz (even under maximum load):

root@router:~ # turbostat --interval 3
turbostat version 17.06.23 - Len Brown <lenb@kernel.org>
CPUID(0): AuthenticAMD 13 CPUID levels; family:model:stepping 0xf:30:1 (15:48:1)
CPUID(1): SSE3 MONITOR - - - TSC MSR - -
CPUID(6): APERF, No-TURBO, No-DTS, No-PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, No-EPB
CPUID(7): No-SGX
NSFOD /sys/devices/system/cpu/cpu1/cpufreq/scaling_driver
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       52      8.61    599     998     162
0       0       78      13.01   599     998     37
1       1       37      6.21    599     998     69
2       2       44      7.40    599     998     35
3       3       47      7.82    599     998     21
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       57      9.56    598     997     208
0       0       87      14.48   598     996     51
1       1       45      7.53    598     996     79
2       2       55      9.15    599     998     62
3       3       42      7.09    599     998     16
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       59      9.81    599     999     297
0       0       94      15.67   600     1000    67
1       1       45      7.56    600     1000    104
2       2       60      9.97    599     998     68
3       3       36      6.01    599     998     58
^C
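
To compare reports more easily, the summary rows (the lines starting with "-  -") can be reduced to a single verdict. The following is a rough sketch, not part of turbostat: the function names bzy_max and clock_state and the 650 MHz cut-off are assumptions chosen for this example, picked only because the readings above cluster either around 600 or above 900.

```shell
#!/bin/sh
# Sketch only: bzy_max, clock_state and the 650 MHz cut-off are assumptions.

# Print the highest Bzy_MHz found in the summary rows of turbostat output on stdin.
bzy_max() {
    awk '
        # Header row: remember which column holds Bzy_MHz.
        $1 == "Core" { for (i = 1; i <= NF; i++) if ($i == "Bzy_MHz") col = i; next }
        # Summary row: Core and CPU are both "-".
        col && $1 == "-" && $2 == "-" {
            seen = 1
            if ($col + 0 > max) max = $col + 0
        }
        END { if (!seen) exit 1; print max }
    '
}

# Classify the measurement: capped near 600 MHz or scaling normally.
clock_state() {
    limit="${1:-650}"
    max="$(bzy_max)" || { echo "no data"; return 2; }
    if [ "$max" -le "$limit" ]; then
        echo "capped at ${max} MHz"
    else
        echo "ok, up to ${max} MHz"
    fi
}
```

Assuming the console output was pasted into a file called turbostat.log (a hypothetical name), the check would be clock_state < turbostat.log. A low reading only means something if the box was under load while measuring.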


When you report your observations, it would be interesting to know which BIOS version you have installed (can be conveniently viewed in the Hardware Information widget) and whether you have the Core Performance Boost feature set to enabled or disabled in the BIOS (in newer BIOSes, the default value is enabled).
System 1: PC Engines APU2C4
System 2: PC Engines APU2E4
System 3: Proxmox-VM on Intel NUC

I've got an APU2E4 running firmware v4.13.0.2 and the core performance boost feature is enabled - OPNsense 21.1-amd64

root@hush:/ # turbostat --interval 3
turbostat version 17.06.23 - Len Brown <lenb@kernel.org>
CPUID(0): AuthenticAMD 13 CPUID levels; family:model:stepping 0xf:30:1 (15:48:1)
CPUID(1): SSE3 MONITOR - - - TSC MSR - -
CPUID(6): APERF, No-TURBO, No-DTS, No-PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, No-EPB
CPUID(7): No-SGX
NSFOD /sys/devices/system/cpu/cpu2/cpufreq/scaling_driver
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       436     46.23   944     998     2631
0       0       397     42.67   929     998     153
1       1       432     45.78   944     998     2270
2       2       404     42.63   947     998     99
3       3       513     53.83   953     998     109
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       431     45.38   951     998     671
0       0       351     37.51   936     998     48
1       1       527     55.12   957     998     454
2       2       375     39.53   950     998     65
3       3       472     49.38   955     998     104
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       314     32.95   954     998     1267
0       0       204     22.01   925     998     171
1       1       358     37.30   959     998     701
2       2       264     27.91   947     998     195
3       3       431     44.56   967     998     200
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       301     31.59   953     998     1600
0       0       219     23.64   928     998     67
1       1       306     31.95   957     998     1278
2       2       170     18.10   937     998     132
3       3       510     52.68   967     998     123
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       269     27.81   966     998     783
0       0       279     29.08   960     998     145
1       1       350     36.08   969     998     374
2       2       235     24.30   966     998     135
3       3       211     21.78   969     998     129
^C

Looks like this on my apu4d4:

Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       87      14.47   599     998     7731
0       0       28      4.71    599     998     1023
1       1       42      6.95    599     998     2394
2       2       147     24.60   599     998     543
3       3       129     21.62   599     998     3771


BIOS is coreboot v4.13.0.2.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Thanks! Interesting!

@hushcoden: Your CPU seems to run normally, up to 1 GHz. How long had it been running when you ran turbostat?

@pmhausen: Your CPU seems to be stuck at 600 MHz, as are mine some time after a reboot.

The question is: What are the differences between the setups?

  • Are all your NIC ports connected (under some load) and thus producing IRQs?
  • What are your custom tunables?

What I can say: I have tried multiple BIOS versions without any difference. Even disabling Core Performance Boost in the BIOS didn't help. The CPU is still stuck at 600 MHz after one or two hours.

Maybe we will find the root cause.


root@opnsense:~ # uptime
9:14PM  up  5:34, 1 user, load averages: 0.34, 0.42, 0.36
root@opnsense:~ # turbostat --interval 2
turbostat version 17.06.23 - Len Brown <lenb@kernel.org>
CPUID(0): AuthenticAMD 13 CPUID levels; family:model:stepping 0xf:30:1 (15:48:1)
CPUID(1): SSE3 MONITOR - - - TSC MSR - -
CPUID(6): APERF, No-TURBO, No-DTS, No-PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, No-EPB
CPUID(7): No-SGX
NSFOD /sys/devices/system/cpu/cpu3/cpufreq/scaling_driver
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       409     43.61   938     998     7590
0       0       143     15.96   896     998     564
1       1       528     55.93   944     998     2895
2       2       494     52.45   942     998     822
3       3       472     50.11   941     998     3309
^C


What I changed was: configure powerd to use "Maximum" instead of "Hiadaptive".

I'll check again tomorrow in the morning.

That is interesting! Thanks.

I've attached my power savings settings and my tunables.

root@hush:/ # uptime
9:02PM  up 2 days,  2:11, 1 user, load averages: 1.01, 0.91, 0.85
root@hush:/ # turbostat --interval 3
turbostat version 17.06.23 - Len Brown <lenb@kernel.org>
CPUID(0): AuthenticAMD 13 CPUID levels; family:model:stepping 0xf:30:1 (15:48:1)
CPUID(1): SSE3 MONITOR - - - TSC MSR - -
CPUID(6): APERF, No-TURBO, No-DTS, No-PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, No-EPB
CPUID(7): No-SGX
NSFOD /sys/devices/system/cpu/cpu3/cpufreq/scaling_driver
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       323     34.65   932     998     4694
0       0       368     39.41   935     998     89
1       1       394     42.28   932     998     4448
2       2       331     35.47   934     998     89
3       3       197     21.45   921     998     68
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       178     19.21   926     998     1916
0       0       236     25.55   926     998     6
1       1       162     17.25   940     998     1742
2       2       190     20.53   924     998     154
3       3       124     13.52   914     998     14
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       219     23.63   927     998     3277
0       0       295     31.81   927     998     87
1       1       258     27.77   929     998     3176
2       2       120     12.70   941     998     13
3       3       204     22.25   916     998     1
Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       138     14.68   938     998     4244
0       0       146     15.74   930     998     100
1       1       115     12.15   944     998     3978
2       2       163     17.21   945     998     166
3       3       128     13.63   936     998     0
^C

@thowe: update as promised.

Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       418     44.10   947     998     7750
0       0       236     25.09   940     998     534
1       1       238     26.08   911     998     1829
2       2       554     58.06   954     998     2052
3       3       643     67.16   958     998     3335


I immediately suspected powerd when I read your post, but changing the settings did not have an immediate effect without a reboot, so it took me some time to have an opportunity for that.

@hushcoden is not running powerd at all ... which would fit the picture.

This would mean that the problem occurs in connection with powerd.

The short-term recommendation for maximum performance would be to switch off powerd or set it to the profile Maximum. The downside is that this consumes more energy than necessary.
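
Before comparing results, it may help to confirm whether powerd is running at all and with which mode flags. This is a minimal sketch that only inspects the process list. The function name powerd_cmdline is made up for this example. According to powerd(8), the -a, -b and -n flags select the mode for AC power, battery power and unknown power source.

```shell
#!/bin/sh
# Sketch only: powerd_cmdline is a name invented for this example.
# Prints the full command line of a running powerd and returns 0,
# or prints nothing and returns 1 if no powerd process is found.
powerd_cmdline() {
    ps -ax -o command | awk '
        # Match "powerd" or any path ending in "/powerd" as the program name.
        $1 ~ /(^|\/)powerd$/ { print; found = 1 }
        END { exit !found }
    '
}
```

A call like powerd_cmdline || echo "powerd is not running" then shows at a glance which of the cases discussed here applies to a given box.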

I have a ticket open with the BIOS developers and will provide the information there, because the goal should be that we can use the power dynamically without burning energy senselessly.

I have now also set my powerd to maximum and restarted. Can confirm that the problem does not occur.


Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ
-       -       523     55.64   940     998     536
0       0       431     46.42   927     999     194
1       1       486     52.01   935     998     123
2       2       619     65.20   950     998     92
3       3       557     58.92   945     998     127

This damn AMD Jaguar SoC still has a lot of problems even after so many years of maturity (this exact 600 MHz stuck problem seems to recur every single year, even though it was "solved" in coreboot multiple times).

I think it is not so clear what the cause is this time. Powerd? SoC? BIOS? Something else?

If it works as expected without powerd then my personal solution is "don't run it, then". I mean, the APU series of devices are such low power platforms anyway, there's probably not much to gain.

A ~12 watt device left running 24/7 is gonna cost you ~£10/year... so....
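
As a back-of-the-envelope check of that figure: 12 W around the clock is about 105 kWh per year, so ~£10/year corresponds to a price of roughly £0.095 per kWh. The sketch below just redoes that arithmetic. The function names are made up for this example, and actual tariffs will differ.

```shell
#!/bin/sh
# Sketch only: annual_kwh and implied_price are names invented for this example.

# Energy used per year (kWh) by a device drawing a constant $1 watts.
annual_kwh() {
    awk -v w="$1" 'BEGIN { printf "%.2f\n", w * 24 * 365 / 1000 }'
}

# Price per kWh implied by a yearly cost of $2 for a device drawing $1 watts.
implied_price() {
    awk -v w="$1" -v cost="$2" 'BEGIN { printf "%.3f\n", cost / (w * 24 * 365 / 1000) }'
}

# annual_kwh 12        prints 105.12
# implied_price 12 10  prints 0.095
```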

Thing I'd like to know is why, in spite of the newer BIOSes, the CPU still does boost to 1.4 GHz...?

Quote: Thing I'd like to know is why, in spite of the newer BIOSes, the CPU still does boost to 1.4 GHz...?

Yes, this should be the case if it is enabled in the BIOS. However, it is not easy to verify since the boost is CPU-internal and can hardly be observed from the outside. The easiest way to find out is with benchmarks, e.g. with stress-ng. There are typical bogo ops values for CPB on and off.
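
A benchmark run along those lines could look like the sketch below. It assumes stress-ng is installed on the box, which is not the case by default. The function name cpb_bench, the variable STRESS_NG, the matrixprod method and the defaults of one worker and 60 seconds are arbitrary choices for this example. The idea is to run it once with CPB enabled and once with it disabled and compare the bogo ops figures in the summary.

```shell
#!/bin/sh
# Sketch only: cpb_bench and STRESS_NG are names invented for this example.
# Assumes stress-ng is installed and on the PATH.
STRESS_NG="${STRESS_NG:-stress-ng}"

cpb_bench() {
    workers="${1:-1}"    # number of CPU stressor instances
    seconds="${2:-60}"   # run time in seconds

    # --metrics-brief prints a short summary including bogo ops per stressor.
    "$STRESS_NG" --cpu "$workers" --cpu-method matrixprod \
        --metrics-brief --timeout "${seconds}s"
}
```

Running cpb_bench 1 60 under both BIOS settings, with everything else unchanged, gives two numbers that can be compared directly.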