phyton3 100% CPU OPNsense 22.7.11-amd64 [SOLVED]

Started by iorx, January 21, 2023, 03:07:16 PM

Previous topic - Next topic
January 21, 2023, 03:07:16 PM Last Edit: January 21, 2023, 07:58:24 PM by iorx
Hi,

Updatead my system. Just mine that went to 100% CPU on phyton3.0?

Versions    OPNsense 22.7.11-amd64
FreeBSD 13.1-RELEASE-p5
OpenSSL 1.1.1s 1 Nov 2022

Hardware information
Platform
Manufacturer    PC Engines
Product Name    apu3
Version    1.0
BIOS
Vendor    coreboot
Version    v4.17.0.2
Release Date    07/28/2022


last pid: 52507;  load averages:  2.00,  1.59,  0.85                                                                      up 0+00:07:13  15:01:44
47 processes:  2 running, 45 sleeping
CPU: 24.8% user,  0.0% nice,  0.3% system,  0.7% interrupt, 74.2% idle
Mem: 207M Active, 189M Inact, 776M Wired, 40K Buf, 2733M Free
ARC: 312M Total, 163M MFU, 121M MRU, 5516K Anon, 2097K Header, 21M Other
     248M Compressed, 609M Uncompressed, 2.45:1 Ratio
Swap: 8192M Total, 8192M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
73471 root          1 103    0    37M    24M CPU3     3   5:45  99.55% python3.9
9049 root          1  20    0    14M  3816K CPU1     1   0:00   0.37% top
55338 root          1  20    0    34M    14M select   0   0:01   0.19% snmpd
3053 root          1  20    0    18M  7600K select   2   0:00   0.10% sshd


Before and after pic attached.

Note: Don't know exactly what I kill but "kill -3 73471" exited the process and CPU back to normal.

Possible to identify what's running behind a PID more that it's the phyton engine?

ps awwux | grep python
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

January 21, 2023, 05:12:58 PM #3 Last Edit: January 21, 2023, 05:15:16 PM by iorx
Thanks!

Here are the results.

root@fw-addars:~ # ps awwux | grep python
root    21837 100.0  0.7  43104  30936  -  Rs   16:45   25:49.91 /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.9)
root      273   0.1  0.9  75060  37460  -  S    16:44    0:28.82 /usr/local/bin/python3 /usr/local/opnsense/service/configd.py console (python3.9)
root      268   0.0  0.5  36256  22692  -  Is   16:43    0:03.69 /usr/local/bin/python3 /usr/local/opnsense/service/configd.py (python3.9)
root    21359   0.0  0.4  25764  15512  -  Ss   16:45    0:05.88 /usr/local/bin/python3 /usr/local/opnsense/scripts/dhcp/unbound_watcher.py --domain home.addars (python3.9)
root    78987   0.0  0.3  23736  12468  -  S    16:45    0:01.42 /usr/local/bin/python3 /usr/local/sbin/configctl -e -t 0.5 system event config_changed (python3.9)
root    92022   0.0  0.3  23972  12580  -  S    16:45    0:01.37 /usr/local/bin/python3 /usr/local/opnsense/scripts/syslog/lockout_handler (python3.9)
root    30635   0.0  0.1  12748   2364  0  S+   17:11    0:00.01 grep python

Looks like it's related to flowd.
Found this: https://forum.opnsense.org/index.php?topic=19786.0
Tried to disable IPv6 accordingly but still, after reboot, 100% CPU.

"It's me again".

A bit more info collected from me with the limited knowledge of the inner workings I have.

Unchecking Capture Local brings the CPU usage back to normal.
Did a "search around" on the subject here and elsewhere.

The flowd log shouldn't grow bigger than 100MB according to the help text. To me they look bigger than that.
I removed the logfiles and tried with Capture Local again. Same-same, 100% CPU.

Flowd files:
root@fw-addars:/var/log # ls -la
total 155759
drwxr-xr-x  21 root   wheel         45 Jan 21 03:01 .
drwxr-xr-x  32 root   wheel         32 Nov 30 15:52 ..
drwx------   2 root   wheel         34 Jan 21 18:01 audit
-rw-r--r--   1 root   wheel        389 Jan 21 17:40 boot.log
drwx------   2 root   wheel         34 Jan 21 18:01 configd
drwx------   2 root   wheel          3 Mar 26  2022 ddclient
drwx------   2 root   wheel         34 Jan 21 18:01 dhcpd
-rw-------   1 root   wheel       8864 Jan 16 03:01 dmesg.today
-rw-------   1 root   wheel       8788 Jan  8 03:01 dmesg.yesterday
drwx------   2 root   wheel         32 Jan 21 18:01 filter
drwx------   2 root   wheel         34 Jan 21 18:01 firewall
-rw-------   1 root   wheel  524907868 Jan 21 18:02 flowd.log
-rw-------   1 root   wheel   10559868 Jan  7 20:03 flowd.log.000001
-rw-------   1 root   wheel   10599372 Jan  7 17:38 flowd.log.000002
-rw-------   1 root   wheel   10543036 Jan  7 14:46 flowd.log.000003
-rw-------   1 root   wheel   10600272 Jan  7 12:48 flowd.log.000004
-rw-------   1 root   wheel   10499572 Jan  7 10:36 flowd.log.000005
-rw-------   1 root   wheel   10488136 Jan  7 04:08 flowd.log.000006
-rw-------   1 root   wheel   10695200 Jan  6 23:15 flowd.log.000007
-rw-------   1 root   wheel   10545080 Jan  6 20:21 flowd.log.000008
-rw-------   1 root   wheel   10547372 Jan  6 17:45 flowd.log.000009
-rw-------   1 root   wheel   10565036 Jan  6 15:04 flowd.log.000010
drwx------   2 root   wheel         34 Jan 21 18:01 gateways


Brgs,

Logging every single packet is CPU intensive. Looks like an APU device is not qualified fir the task depending on the traffic involved.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Strange thing is that it was fine before the upgrade. Haven't seen this behaviour until the last upgrade, and NetFlow Capture Local option has been enabled since way back.

Yes the APU isn't that powerful, this is an AMD GX-412TC SOC (4 cores, 4 threads)

There is no load on the system right now, and it's only a 250Mbit/s connection for WAN (fiber). Checked and around 10-20 packets/s. Active devices on the LAN is a couple of IoTs only right now, and me navigating the firewall through ssh and https.

Attached packets/s

Brgs,

QuoteI removed the logfiles and tried with Capture Local again
its better do it with Reporting: Settings->Reset Netflow data or
configctl netflow flush
imho

Quote from: Fright on January 21, 2023, 07:16:35 PM
QuoteI removed the logfiles and tried with Capture Local again
its better do it with Reporting: Settings->Reset Netflow data or
configctl netflow flush
imho

And there we go!!! That SOLVED it.
I used the interface option as you gave as one way to do it. phyton peeked for a moment and then back to all normal usage.
root@fw-addars:/var/log # ps awwux | grep python
root    21972   0.7  0.4  30304  18376  -  Ss   19:40     0:04.16 /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.9)
root      267   0.0  0.5  36256  22692  -  Is   17:39     0:03.71 /usr/local/bin/python3 /usr/local/opnsense/service/configd.py (python3.9)
root      272   0.0  1.1  82740  45220  -  I    17:39     0:36.59 /usr/local/bin/python3 /usr/local/opnsense/service/configd.py console (python3.9)
root     4934   0.0  0.4  25764  15520  -  Ss   17:41     0:09.59 /usr/local/bin/python3 /usr/local/opnsense/scripts/dhcp/unbound_watcher.py --domain home.addars (python3.9)
root    15093   0.0  0.3  23736  12472  -  S    17:41     0:03.31 /usr/local/bin/python3 /usr/local/sbin/configctl -e -t 0.5 system event config_changed (python3.9)
root    17100   0.0  0.3  23972  12576  -  S    17:41     0:03.16 /usr/local/bin/python3 /usr/local/opnsense/scripts/syslog/lockout_handler (python3.9)
root    48519   0.0  0.1  12748   2364  0  S+   19:41     0:00.01 grep python


Everybody, thank you immensely for the answers and patience going through this with me! Continue to have a nice weekend ladies and germs!