CPU Performance after 23.7 upgrade

Started by JustMeHere, August 05, 2023, 02:40:40 AM

Yep, I uploaded the new snapshot build today. You still need to restart the watcher afterwards:

# pluginctl -s dpinger restart :watcher:


Cheers,
Franco

In my case the patch did not reduce CPU usage at all. What did help was changing $poll from 1 to 5, which brought CPU usage back to the level it was at with 23.1.
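For anyone wanting to try the same workaround: the poll interval sits in the watcher script itself (/usr/local/opnsense/scripts/routes/gateway_watcher.php), but be aware this is an unsupported local edit and the next update will overwrite it. Assuming the default line still reads "$poll = 1;", something along these lines does it:

# sed -i '' 's/$poll = 1;/$poll = 5;/' /usr/local/opnsense/scripts/routes/gateway_watcher.php   # adjust the pattern if the line differs in your version
# pluginctl -s dpinger restart :watcher: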

It really doesn't do much, as I said initially. It also depends on the hardware... or on a link flapping constantly and forcing it to trigger. But nobody has posted a gateway log of such events. These logs are new in 23.7, by the way, so you can see exactly what happened to a gateway and when.

We can make the delays configurable some day but not before ironing out the real issues first.
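If someone does see a flap, something like this should pull the relevant entries out of the gateway log for posting here:

# grep -h ALERT /var/log/gateways/gateways_*.log | tail -n 50   # -h drops the file names, keep the newest lines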


Cheers,
Franco

Sorry, I somehow missed your request for entries from the log file. I am not sure if this helps; it is from /var/log/gateways/gateways_2023082{4,6}.log (the two newest files):

<165>1 2023-08-24T13:18:04+00:00 scrambler.wenks.ch dpinger 2426 - [meta sequenceId="1"] ALERT: WAN1IP4GW (Addr: 8.8.8.8 Alarm: down -> none RTT: 2.7 ms RTTd: 1.3 ms Loss: 0.0 %)
<165>1 2023-08-24T13:18:04+00:00 scrambler.wenks.ch dpinger 2426 - [meta sequenceId="2"] ALERT: WAN2IP4GW (Addr: 8.8.4.4 Alarm: down -> none RTT: 15.1 ms RTTd: 6.8 ms Loss: 0.0 %)
<165>1 2023-08-24T13:18:04+00:00 scrambler.wenks.ch dpinger 2426 - [meta sequenceId="3"] ALERT: WAN1IP6GW (Addr: 2606:4700:4700::1111 Alarm: down -> none RTT: 2.4 ms RTTd: 1.5 ms Loss: 0.0 %)
<165>1 2023-08-24T13:18:04+00:00 scrambler.wenks.ch dpinger 2426 - [meta sequenceId="4"] ALERT: WAN2IP6GW (Addr: 2620:fe::9 Alarm: down -> none RTT: 2.5 ms RTTd: 1.5 ms Loss: 0.0 %)
<165>1 2023-08-24T18:18:52+00:00 scrambler.wenks.ch dpinger 72524 - [meta sequenceId="1"] ALERT: WAN1IP4GW (Addr: 8.8.8.8 Alarm: down -> none RTT: 2.3 ms RTTd: 1.3 ms Loss: 0.0 %)
<165>1 2023-08-24T18:18:52+00:00 scrambler.wenks.ch dpinger 72524 - [meta sequenceId="2"] ALERT: WAN2IP4GW (Addr: 8.8.4.4 Alarm: down -> none RTT: 12.5 ms RTTd: 4.9 ms Loss: 0.0 %)
<165>1 2023-08-24T18:18:52+00:00 scrambler.wenks.ch dpinger 72524 - [meta sequenceId="3"] ALERT: WAN1IP6GW (Addr: 2606:4700:4700::1111 Alarm: down -> none RTT: 2.1 ms RTTd: 1.5 ms Loss: 0.0 %)
<165>1 2023-08-24T18:18:52+00:00 scrambler.wenks.ch dpinger 72524 - [meta sequenceId="4"] ALERT: WAN2IP6GW (Addr: 2620:fe::9 Alarm: down -> none RTT: 2.1 ms RTTd: 1.3 ms Loss: 0.0 %)
<12>1 2023-08-26T15:24:50+00:00 scrambler.wenks.ch dpinger 50861 - [meta sequenceId="1"] exiting on signal 15
<12>1 2023-08-26T15:24:50+00:00 scrambler.wenks.ch dpinger 60121 - [meta sequenceId="2"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.8.8  bind_addr 62.12.173.5  identifier "WAN1IP4GW "
<165>1 2023-08-26T15:24:51+00:00 scrambler.wenks.ch dpinger 72524 - [meta sequenceId="3"] Reloaded gateway watcher configuration on SIGHUP
<12>1 2023-08-26T15:25:05+00:00 scrambler.wenks.ch dpinger 57225 - [meta sequenceId="4"] exiting on signal 15
<12>1 2023-08-26T15:25:06+00:00 scrambler.wenks.ch dpinger 82049 - [meta sequenceId="5"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 8.8.4.4  bind_addr 62.2.85.182  identifier "WAN2IP4GW "
<165>1 2023-08-26T15:25:08+00:00 scrambler.wenks.ch dpinger 72524 - [meta sequenceId="6"] Reloaded gateway watcher configuration on SIGHUP
<12>1 2023-08-26T15:26:26+00:00 scrambler.wenks.ch dpinger 48560 - [meta sequenceId="1"] exiting on signal 15
<12>1 2023-08-26T15:26:26+00:00 scrambler.wenks.ch dpinger 68631 - [meta sequenceId="2"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 2606:4700:4700::1111  bind_addr 2001:8a8:1005:1::5  identifier "WAN1IP6GW "
<165>1 2023-08-26T15:26:26+00:00 scrambler.wenks.ch dpinger 72524 - [meta sequenceId="3"] Reloaded gateway watcher configuration on SIGHUP
<12>1 2023-08-26T15:26:39+00:00 scrambler.wenks.ch dpinger 52964 - [meta sequenceId="4"] exiting on signal 15
<12>1 2023-08-26T15:26:40+00:00 scrambler.wenks.ch dpinger 59088 - [meta sequenceId="5"] send_interval 1000ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  alarm_hold 10000ms  dest_addr 2620:fe::9  bind_addr 2001:8a8:1005:2::182  identifier "WAN2IP6GW "
<165>1 2023-08-26T15:26:40+00:00 scrambler.wenks.ch dpinger 72524 - [meta sequenceId="6"] Reloaded gateway watcher configuration on SIGHUP
<165>1 2023-08-26T16:16:36+00:00 scrambler.wenks.ch dpinger 56541 - [meta sequenceId="1"] ALERT: WAN1IP4GW (Addr: 8.8.8.8 Alarm: down -> none RTT: 2.3 ms RTTd: 2.0 ms Loss: 0.0 %)
<165>1 2023-08-26T16:16:36+00:00 scrambler.wenks.ch dpinger 56541 - [meta sequenceId="2"] ALERT: WAN2IP4GW (Addr: 8.8.4.4 Alarm: down -> none RTT: 13.4 ms RTTd: 6.5 ms Loss: 0.0 %)
<165>1 2023-08-26T16:16:36+00:00 scrambler.wenks.ch dpinger 56541 - [meta sequenceId="3"] ALERT: WAN1IP6GW (Addr: 2606:4700:4700::1111 Alarm: down -> none RTT: 1.9 ms RTTd: 2.1 ms Loss: 0.0 %)
<165>1 2023-08-26T16:16:36+00:00 scrambler.wenks.ch dpinger 56541 - [meta sequenceId="4"] ALERT: WAN2IP6GW (Addr: 2620:fe::9 Alarm: down -> none RTT: 2.1 ms RTTd: 2.1 ms Loss: 0.0 %)


That looks normal. Thanks so far.

I've slipped a small change into 23.7.3 to be able to optimise this further. If you are on 23.7.3 you can try the next patch:

https://github.com/opnsense/core/commit/bb951f2eb

# opnsense-patch bb951f2eb
# pluginctl -s dpinger restart :watcher:

Note that this breaks if applied to 23.7.2 or earlier...
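If it misbehaves, reapplying the same patch should reverse it again (opnsense-patch reverts a patch it finds already applied), followed by another watcher restart:

# opnsense-patch bb951f2eb
# pluginctl -s dpinger restart :watcher: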


Cheers,
Franco

It helped to reduce CPU usage. In my case it is down by about a third of the way from the 23.7.0 level back towards the 23.1.x level (both with the default $poll = 1).

That's roughly what I saw here as well... for each second of wall time I had 0.3 seconds of overall processing time, and that patch should take 0.1 of that away. I still have to look at the class - that seems to be where the other 0.2 seconds are spent. Maybe we can get it down by another 0.1. Otherwise we really have to change the default poll interval.
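If anyone wants to compare numbers, the accumulated CPU time of the watcher relative to how long it has been running gives a rough per-second figure, for example:

# ps -ax -o time,etime,command | grep '[g]ateway_watcher'   # TIME divided by ELAPSED is the per-second cost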


Cheers,
Franco

Hello Franco

On my systems, since the update to OPNsense 23.7.4 last night, CPU usage is back down to the level it was at with 23.1.x.

Thank you very much for these improvements!


Best regards,
Fabian

I am seeing high CPU use and temperatures after upgrading to 23.7.9. Netflow seems to be the culprit. Resetting Netflow data and rebuilding the database have not helped, nor has rebooting the box.
  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
15527 root          1 100    0    30M    20M CPU3     3  18:40  93.46% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.9)
88671 root         31  20    0   865M   203M uwait    2   1:13   3.42% /usr/local/AdGuardHome/AdGuardHome -s run
26850 root          1  52    0    58M    28M accept   2   0:02   1.76% /usr/local/bin/php-cgi
56115 root          1  20    0    13M  2548K bpf      3   0:09   1.27% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
32301 root          5  20    0    45M    12M kqread   0   0:19   0.88% /usr/local/sbin/syslog-ng -f /usr/local/etc/syslog-ng.conf -p /var/run/syslog-ng.pid
18061 root          1  52    0    59M    28M accept   0   0:06   0.59% /usr/local/bin/php-cgi
23558 root          1  20    0    59M    28M accept   2   0:04   0.49% /usr/local/bin/php-cgi
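
For reference, this is how I am checking how much flow data the aggregator has to work through; both paths are my guess at the default locations, so adjust as needed:

# ls -lh /var/log/flowd.log*    # raw flowd capture files (path assumed)
# ls -lh /var/netflow/*.sqlite  # aggregated netflow databases (path assumed)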

Has anyone else seen this?
OPNsense 24.7.7-amd64 on APU2E4 using ZFS

After a couple of days' running, it still seems netflow is causing significantly higher CPU use than before the upgrade. I have turned off IPv6 completely (following these instructions) to see if that would help; it made no difference.
  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
15527 root          1  80    0    54M    36M CPU2     2  22.8H  27.78% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.9)
51856 root          1  21    0    58M    28M accept   3   0:05   1.27% /usr/local/bin/php-cgi
66423 root          1  52    0    58M    28M accept   3   0:02   1.17% /usr/local/bin/php-cgi
83323 root          1  20    0    58M    28M accept   2   0:05   0.29% /usr/local/bin/php-cgi
88671 root         31  20    0  1130M   477M uwait    0  38:39   0.10% /usr/local/AdGuardHome/AdGuardHome -s run
18068 root          1  52    0    58M    29M accept   1   0:06   0.10% /usr/local/bin/php-cgi
14297 root          1  25    0    13M  3792K pause    3   0:00   0.10% /bin/csh
  241 root          7  52    0   104M    42M accept   2  19:32   0.00% /usr/local/bin/python3 /usr/local/opnsense/service/configd.py console (python3.9)
20283 root          1  20    0    88M    60M nanslp   1  11:59   0.00% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_watcher.php interface routes alarm
32301 root          4  20    0    49M    12M kqread   3   8:51   0.00% /usr/local/sbin/syslog-ng -f /usr/local/etc/syslog-ng.conf -p /var/run/syslog-ng.pid
73875 root          1  20    0    13M  2644K bpf      3   4:38   0.00% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
79273 root          1  25    0    13M  3016K wait     3   3:14   0.00% /bin/sh /var/db/rrd/updaterrd.sh
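
One thing I may try next is attaching to the aggregator to see whether the time goes into database I/O or into Python itself (PID taken from the listing above, substitute your own):

# truss -p 15527      # sample its system calls for a few seconds, then Ctrl+C
# procstat -kk 15527  # kernel-side view of where its threads are waiting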

Does anyone have suggestions as to what I could attempt, other than turning netflow off?
OPNsense 24.7.7-amd64 on APU2E4 using ZFS

Same issue here. Since the upgrade to 23.7.10_1-amd64, CPU usage is much higher than before, and my Monit system monitor regularly complains about CPU usage over 75%.

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
46114 root          1  85    0    48M    31M CPU1     1  33.0H  97.91% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.9)
24479 root          1  28    0    69M    33M nanslp   0   0:01  11.18% /usr/local/bin/php-cgi
  257 root          1  52    0   135M    40M accept   0  39:54   4.87% /usr/local/bin/python3 /usr/local/opnsense/service/configd.py console (python3.9)
67545 root         13  20    0  2869M  1074M nanslp   1 790:15   3.19% /usr/local/bin/suricata -D --netmap --pidfile /var/run/suricata.pid -c /usr/local/etc/suricata/suricata.yaml
19874 root         12  20  -20   906M    43M uwait    2 124:11   2.45% ipdrstreamer
22851 mongodb      28  52    0  2876M  1043M uwait    2 242:15   2.34% /usr/local/bin/mongod --logappend --setParameter=disabledSecureAllocatorDomains=* --config /usr/local/etc/mongodb.conf --fork
19514 root         12  20  -20  2997M   332M nanslp   1 471:32   2.03% eastpect: Eastpect Instance 0 (eastpect)
55207 root          3  20    0    50M  9976K kqread   3  12:06   0.46% /usr/local/sbin/syslog-ng -f /usr/local/etc/syslog-ng.conf -p /var/run/syslog-ng.pid
15107 root          1  20    0    14M  3660K CPU3     3   0:02   0.43% top
44068 root          1  20    0    60M    31M accept   0   0:06   0.31% /usr/local/bin/php-cgi
37823 root          2  20  -20   946M    56M nanslp   2  18:10   0.28% /usr/local/zenarmor//bin/eastpect -D
80154 root          1  20    0    13M  1796K bpf      0   2:04   0.18% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
80838 root          1  28    0    13M  1784K nanslp   0   0:19   0.10% /usr/sbin/cron -s
16942 root          1  20    0    21M  7204K kqread   1   1:04   0.09% /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf
69045 root          1  20    0    25M    14M select   1   1:58   0.08% /usr/local/bin/python3 /usr/local/opnsense/scripts/dhcp/unbound_watcher.py --domain home (python3.9)
82347 root          1  20    0    23M    11M select   1   3:23   0.05% /usr/local/bin/python3 /usr/local/sbin/configctl -e -t 0.5 system event config_changed (python3.9)
98727 root          1  20    0    18M  7344K select   0   0:00   0.05% sshd: root@pts/0 (sshd)
82850 root          1  20    0    23M    11M select   1   3:04   0.04% /usr/local/bin/python3 /usr/local/opnsense/scripts/syslog/lockout_handler (python3.9)
  258 squid         1  20    0   174M    21M kqread   3   1:39   0.03% (squid-1) --kid squid-1 -f /usr/local/etc/squid/squid.conf (squid)
54088 root          1  21    0    13M  1932K wait     1   1:25   0.02% /bin/sh /var/db/rrd/updaterrd.sh
39938 root          1  20    0    21M  5868K select   2   0:28   0.01% /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf


Does anyone have any suggestions?

Greetings Manuel