OPNsense Forum

English Forums => Hardware and Performance => Topic started by: amitis5 on June 26, 2019, 06:45:55 am

Title: Python CPU running at 100
Post by: amitis5 on June 26, 2019, 06:45:55 am
Hi Everyone,

I'm running on a Supermicro Xeon D with 8 GB of RAM, a 256 GB SSD, and a 5 Gbit pipe from the ISP.

I'm seeing a huge increase in load, with the CPU running at around 100% for hours per day, and I can't figure out why. Python seems to be the main culprit, with arp at number 2 and 3 in top. I'm also randomly seeing the DHCP service stop, and it takes forever to restart. Nothing has really changed. This unit serves a 150-unit apartment building with 150 VLANs. Up until two days ago, we had only been running at about a 10% processor load.

pfctl also pops up at 80+% here and there in top. I've attached the list of running services as well.

Thanks in advance for the assist.

Top Output:

root@ICCN2:~ # top

last pid: 59182;  load averages:  9.47,  6.93,  6.97    up 8+07:43:51  23:28:10
59 processes:  4 running, 55 sleeping
CPU: 27.1% user,  0.0% nice, [rest of line overwritten in paste]
Mem: 182M Active, 2507M Inact, 88M Laundry, 1224M Wired, 758M Buf, 3797M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
25508 root             1  81    0  1043M 13328K CPU2    2   0:04  96.75% pfctl
80879 root             1 101    0 42708K 36228K CPU6    6 142.1H  95.94% python
80879 unbound          8 100    0 42708K 36268K kqr1    1 142.1H  95.61% python
17105 unbound          8  20    0   119M 92168K kqread  7   1:30   3.83% unboun
 4742 root             1  21    0  1033M  2332K CPU4    4 243:52   3.43% filter
11707 root             1  50    0 49052K 37724K accept  4   0:09   2.92% php-cg
59342 dhcpd            1  20    0 39112K 31280K CPU7    7   0:06   0.87% dhcpd
53556 _flowd           1  20    0  6300K  2496K select  7  13:39   0.61% flowd
51807 root             1  20    0  1034M  3228K CPU5    1   0:00   0.33% top
51807 root             1  20    0  1034M  3228K CPU2    2   0:00   0.22% top
26888 root             1  37    0 90092K 67344K accept  2   2:39   0.15% python
63180 zabbix           1  20    0 18968K  6808K nanslp  0   0:03   0.12% zabbix
 3556 _flowd           1  20    0  6300K  2496K select  0  13:39   0.11% flowd
23870 root             1  20    0 10972K  6792K kqread  1   0:28   0.06% lightt
36904 www              1  20    0 15289M  5724K kqread  0   0:45   0.05% ntphtt
36525 root             1  20    0  1038M  6612K select  0   0:01   0.02% sshd
23180 zabbix           1  20    0 58968K  6808K nanslp  6   0:09   0.02% zabbix
58188 www      4           4      15284K  9640K kqread  4   0:13         lightt
last pid: 71096;  load averages:  5.16,  5.??,  6.56    up 8+07:46:24  23:30:43
60 processes:  5 running, 55 sleeping
CPU: 17.4% user,  0.0% nice, 35.2% system,  7.7% interrupt, 39.8% idle
Mem: 196M Active, 2460M Inact, 88M Laundry, 1231M Wired, 764M Buf, 3824M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
    ? ?                ?   ?    ?      ?  3180K CPU2    2      ?  78.19% arp
42679 root             1 172    0 23728K 18692K CPU1    1 142.10   7.41% python
11707 root             1  52    0 60320K 43636K nanslp  7   0:12   7.26% php-cg
 4742 root             1  23    0  1033M  2332K bpf     2 243:57   2.88% filter
88844 unbound          8  20    0    99M 60548K kqread  4   0:03   1.35% unboun
51653 zabbix           1   0    0 18968K  6040K select  7 114:03   1.29% syslog
59793 zabbix           1  22    0 18968K  6736K accept  5  10:40   0.99% zabbix
 8951 zabbix           1  21    0 18968K  6784K accept  1  10:38   0.80% zabbix
26888 root             2  20    0 92140K 67384K accept  4   2:39   0.66% python
84320 zabbix           1  22    0 18968K  6560K select  7  10:45   0.64% zabbix
59342 dhcpd            1  21    0 39112K 31280K select  4   0:07   0.16% dhcpd
88844 unbound          8  20    0    99M 60536K kqread  6   0:03   2.14% unboun
51807 root             1  20    0  1034M  3232K CPU7    7   0:00   1.13% top
62732 nobody           1  20    0  6303M  1548K sbwait  6  13:17   0.22% sampli
62732 nobody           1  20    0  1033M  1548K sbwait  7  13:17   0.17% sampli
51807 root             1  20    0  1034M  3232K CPU6    6   0:00   0.13% top
last pid: 99735;  load averages:  5.79,  5.89,  6.12    up 8+08:00:17  23:44:36
61 processes:  8 running, 53 sleeping
CPU: 18.4% user,  0.0% nice, 45.3% system, 12.0% interrupt, 24.3% idle
Mem: 196M Active, 2484M Inact, 88M Laundry, 1240M Wired, 774M Buf, 3790M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
80879 root             1 102    0 42708K 36228K CPU5    5 142.3H  96.29% python
78992 root             1  86    0  1043M 13328K CPU2    2   0:07  89.53% pfctl
15278 root             1  87    0  1033M  3168K CPU3    3   0:08  88.53% arp
11329 root             1  87    0  1033M  3192K CPU6    6   0:08  88.08% arp
99735 root             1  75    0  1033M  3252K CPU1    1   0:01  52.15% ifconf
 8951 zabbix           1  23    0 18968K  6784K CPU0    0  11:18   7.75% zabbix
59793 zabbix           1  23    0 18968K  6736K accept  4  11:21   7.45% zabbix
84320 zabbix           1  23    0 18968K  6560K CPU4    4  11:25   6.78% zabbix
49439 root             1  52    0 56608K 37396K piperd  2   0:04   2.06% php-cg
 4742 root             1  21    0  1033M  2332K bpf     3 244:17   1.59% filter
51653 root             1  20    0  1033M  2040K select  0 114:12   0.84% syslog
88844 unbound          8  20    0   109M 65520K kqread  1   0:13   0.75% unboun
59342 dhcpd            1  20    0 39112K 31280K select  2   0:08   0.26% dhcpd
51807 root             1  20    0  1034M  3232K CPU7    7   0:02   0.21% top
 3556 _flowd           1  20    0  6300K  2496K select  2  13:40   0.10% flowd
62732 nobody           1  20    0  1033M  1548K sbwait  2  13:18   0.08% sampli
36904 root             1  20    0  1039M  5724K select  7   0:52   0.05% ntpd
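
For anyone comparing snapshots like these by hand, the following is a minimal sketch (not part of the original post) that pulls the process rows out of pasted top output and lists those above a CPU threshold. It assumes the default 12-column FreeBSD top layout shown above; the function names and the 50% threshold are made up for illustration.

```python
def parse_top_rows(text):
    """Return (pid, user, wcpu, command) tuples for process rows in top output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: a process row has 12 columns, a numeric PID,
        # and a WCPU value ending in '%'. Header and summary lines fail this.
        if len(fields) != 12 or not fields[0].isdigit():
            continue
        if not fields[10].endswith("%"):
            continue
        rows.append((int(fields[0]), fields[1],
                     float(fields[10].rstrip("%")), fields[11]))
    return rows


def busy_processes(text, threshold=50.0):
    """Rows whose WCPU is at or above threshold, busiest first."""
    busy = [row for row in parse_top_rows(text) if row[2] >= threshold]
    return sorted(busy, key=lambda row: row[2], reverse=True)


# Rows copied from the 23:44:36 snapshot above.
SAMPLE = """\
  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
80879 root             1 102    0 42708K 36228K CPU5    5 142.3H  96.29% python
78992 root             1  86    0  1043M 13328K CPU2    2   0:07  89.53% pfctl
15278 root             1  87    0  1033M  3168K CPU3    3   0:08  88.53% arp
11329 root             1  87    0  1033M  3192K CPU6    6   0:08  88.08% arp
99735 root             1  75    0  1033M  3252K CPU1    1   0:01  52.15% ifconf
 8951 zabbix           1  23    0 18968K  6784K CPU0    0  11:18   7.75% zabbix
"""

for pid, user, wcpu, command in busy_processes(SAMPLE):
    print(f"{pid:>6} {user:<8} {wcpu:6.2f}% {command}")
```

Rows that were overwritten by a screen refresh during the paste will either be skipped or parsed with mixed-up fields, so the result is only as reliable as the snapshot itself.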

Title: Re: Python CPU running at 100
Post by: fabian on June 26, 2019, 06:52:34 am
very likely flowd.
Title: Re: Python CPU running at 100
Post by: amitis5 on June 26, 2019, 07:02:39 am
Thanks so much, that eliminated the python load.  What is the pfctl process?  Should I worry about it showing in the 90s?
Title: Re: Python CPU running at 100
Post by: fabian on June 26, 2019, 08:07:04 pm
pfctl is the command-line utility used to control the pf firewall; it communicates with the packet filter in the FreeBSD kernel.
Title: Re: Python CPU running at 100
Post by: franco on June 26, 2019, 10:26:43 pm
How much traffic are you pushing over this hardware?


Cheers,
Franco
Title: Re: Python CPU running at 100
Post by: amitis5 on June 26, 2019, 11:58:56 pm
We have a 5 Gbit pipe, with 10 Gbit coming in next week. There are 300 APs, and we're averaging about 1000 users.

I've ordered two more servers to put out there with CARP for some redundancy. I've not set this up before, so I'm working on it in my lab.
Title: Re: Python CPU running at 100
Post by: amitis5 on June 30, 2019, 08:14:34 am
Hi Franco,

Thanks for your help the other day. However, I'm seeing huge loads again without flowd running. Here is my top -S -P output:

last pid: 66540;  load averages:  6.13,  6.45,  6.95    up 3+12:34:06  01:09:26
78 processes:  9 running, 68 sleeping, 1 waiting
CPU 0: 26.3% user,  0.0% nice, 37.3% system,  4.7% interrupt, 31.8% idle
CPU 1: 20.8% user,  0.0% nice, 32.2% system, 15.7% interrupt, 31.4% idle
CPU 2: 30.6% user,  0.0% nice, 36.9% system,  3.9% interrupt, 28.6% idle
CPU 3: 21.2% user,  0.0% nice, 40.0% system,  7.5% interrupt, 31.4% idle
CPU 4: 25.9% user,  0.0% nice, 36.1% system,  9.4% interrupt, 28.6% idle
CPU 5: 30.6% user,  0.0% nice, 29.8% system, 12.2% interrupt, 27.5% idle
CPU 6: 32.9% user,  0.0% nice, 38.4% system,  4.3% interrupt, 24.3% idle
CPU 7: 36.1% user,  0.0% nice, 45.1% system,  0.4% interrupt, 18.4% idle
Mem: 239M Active, 6172M Inact, 76M Laundry, 1203M Wired, 774M Buf, 107M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
   11 root             8 155 ki31     0K   128K RUN     0 458.9H 214.94% idle
21586 root             1 102    0 52952K 45364K CPU0    0  80.8H  96.92% python
 5261 root             1  85    0  1043M 13244K CPU5    5   0:06  92.42% pfctl
47030 root             1  52    0  1045M 16488K RUN     6   0:05  86.74% pfctl
   12 root            92 -56    -     0K  1472K WAIT   -1  46.1H  59.14% intr
64375 root             1  21    0 64492K 46044K piperd  5   0:30   8.67% php-cg
 7108 root             1  21    0 60780K 42820K piperd  0   0:20   8.14% php-cg
30628 root             1  52    0 53608K 41528K select  3   0:19   4.65% php-cg
76753 root             1  52    0 58732K 40972K piperd  5   0:01   3.89% php-cg
   17 root             1 -16    -     0K    16K -       3  33:11   2.69% ran

Any other ideas? We are running a lot of VLANs and see about 1000 concurrent users. This has been the case for a while, but we have only been seeing this processor load and the slowness in the GUI for the last couple of weeks. The DHCP service continues to crash and won't always restart. I'm stumped. Any further ideas from you are greatly appreciated.

Thanks again for all you do for us.
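
To put the per-CPU lines above into one figure, here is a small sketch (not part of the original post) that averages the user, system, interrupt and idle percentages across all cores. The regular expression assumes the exact "CPU n:" line format printed by top -P above; the names CPU_LINE, FIELDS and average_cpu are made up for illustration.

```python
import re

# Assumption: lines look exactly like the "CPU n: ..." lines printed by top -P.
CPU_LINE = re.compile(
    r"CPU\s+(\d+):\s+([\d.]+)% user,\s+([\d.]+)% nice,\s+([\d.]+)% system,"
    r"\s+([\d.]+)% interrupt,\s+([\d.]+)% idle"
)
FIELDS = ("user", "nice", "system", "interrupt", "idle")


def average_cpu(text):
    """Average each CPU state across every per-core line found in text."""
    samples = [tuple(float(v) for v in m.groups()[1:])
               for m in CPU_LINE.finditer(text)]
    if not samples:
        raise ValueError("no per-CPU lines found")
    return {name: sum(s[i] for s in samples) / len(samples)
            for i, name in enumerate(FIELDS)}


# Per-CPU lines copied from the top -S -P output above.
SAMPLE = """\
CPU 0: 26.3% user,  0.0% nice, 37.3% system,  4.7% interrupt, 31.8% idle
CPU 1: 20.8% user,  0.0% nice, 32.2% system, 15.7% interrupt, 31.4% idle
CPU 2: 30.6% user,  0.0% nice, 36.9% system,  3.9% interrupt, 28.6% idle
CPU 3: 21.2% user,  0.0% nice, 40.0% system,  7.5% interrupt, 31.4% idle
CPU 4: 25.9% user,  0.0% nice, 36.1% system,  9.4% interrupt, 28.6% idle
CPU 5: 30.6% user,  0.0% nice, 29.8% system, 12.2% interrupt, 27.5% idle
CPU 6: 32.9% user,  0.0% nice, 38.4% system,  4.3% interrupt, 24.3% idle
CPU 7: 36.1% user,  0.0% nice, 45.1% system,  0.4% interrupt, 18.4% idle
"""

for name, value in average_cpu(SAMPLE).items():
    print(f"{name:<9} {value:6.2f}%")
```

For this snapshot the averages come out to roughly 28% user, 37% system, 7% interrupt and 28% idle, so more time is going to system (kernel) work than to user processes, and the load is spread fairly evenly across all eight cores.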
Title: Re: Python CPU running at 100
Post by: deekdeeker on August 31, 2019, 02:12:45 am
I'm seeing the same. I just updated 2 units and the CPU utilization seems much more spiky and erratic. In all cases python3.7 seems to peg the processor at random intervals for anywhere from 10 to 60 seconds; in one case it was a few minutes. I upgraded from 19.1.10 --> 19.7 -> 19.3

I'm still monitoring and will report back with any more info. I have another 5 devices I need to update, but I am holding off until I know this isn't going to be a performance issue.
Title: Re: Python CPU running at 100
Post by: deekdeeker on August 31, 2019, 02:35:23 am
I just cleared my RRD graphs and netflow data and that seems to have resolved the issue so far. CPU seems back to normal... but still monitoring.