OPNsense Forum

Archive => 18.7 Legacy Series => Topic started by: Julien on July 21, 2018, 10:44:15 pm

Title: [SOLVED] CPU99%
Post by: Julien on July 21, 2018, 10:44:15 pm
Dear All,
I have a hardware box that is continuously running at 90-99% CPU, which causes a lot of packet loss on the WAN side.
I have checked the I/O activity:
   
Code:
                 /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average   ||||

          /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
cpu  user|XXX
     nice|
   system|XX
interrupt|
     idle|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

          /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
ada0  MB/s
      tps|XXXX
pass0 MB/s
      tps|

and the interrupt CPU usage:

 
Code:
PID USERNAME    PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   11 root        155 ki31     0K    32K RUN     1 158:26  91.77% [idle{idle: cpu1}]
   11 root        155 ki31     0K    32K RUN     0 160:42  79.55% [idle{idle: cpu0}]
10682 root         21    0   112M 22368K accept  1   0:04   2.05% /usr/local/bin/php-cgi
   12 root        -60    -     0K   400K CPU0    0   2:17   0.48% [intr{swi4: clock (0)}]
   12 root        -92    -     0K   400K WAIT    0   0:29   0.24% [intr{irq259: em1:rx0}]
88373 root         20    0 20076K  3804K CPU1    1   0:00   0.22% top -aSH
21352 root         20    0 49640K  8628K kqread  1   0:43   0.12% /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf
  423 root         52    0   132M 32328K accept  0   0:20   0.08% /usr/local/bin/python2.7 /usr/local/opnsense/service/configd.py console{python2.7}
46510 root         20    0  1061M  6584K select  1   0:01   0.08% /usr/local/sbin/openvpn --config /var/etc/openvpn/client2.conf
   12 root        -92    -     0K   400K WAIT    0   0:05   0.05% [intr{irq262: em2:rx0}]
18706 root         20    0  1049M  2760K select  0   0:04   0.03% /usr/local/sbin/apinger -c /var/etc/apinger.conf
62443 root         20    0  1091M  6864K select  1   0:00   0.03% sshd: root@pts/0 (sshd)
   16 root        -16    -     0K    16K pftm    0   0:05   0.03% [pf purge]
   12 root        -92    -     0K   400K WAIT    1   0:03   0.03% [intr{irq260: em1:tx0}]
   17 root        -16    -     0K    16K -       1   0:03   0.01% [rand_harvestq]
  968 root         29    0 97112K 22380K select  1  46:15   0.01% /usr/local/bin/python2.7 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py
    4 root        -16    -     0K    32K -       1   0:05   0.01% [cam{doneq0}]
 4692 root         20    0  1051M  3028K select  0   0:03   0.01% /usr/local/sbin/syslogd -s -c -c -P /var/run/syslog.pid -l /var/dhcpd/var/run/log -f /var/
85922 root         20    0  1051M  6124K select  0   0:02   0.01% /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid{ntpd}
   12 root        -92    -     0K   400K WAIT    1   0:01   0.01% [intr{irq263: em2:tx0}]
29963 dhcpd        20    0 24732K  8788K select  1   0:01   0.01% /usr/local/sbin/dhcpd -user dhcpd -group dhcpd -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf
   12 root        -88    -     0K   400K WAIT    0   0:03   0.01% [intr{irq19: ahci0}]
    0 root         -4    -     0K   320K -       0   0:02   0.01% [kernel{/ trim}]
   18 root        -16    -     0K    48K psleep  1   0:00   0.00% [pagedaemon{pagedaemon}]
94464 root         20    0 38816K  5740K kqread  1   0:01   0.00% /usr/local/sbin/lighttpd -f /var/etc/lighttpd-acme-challenge.conf
14975 root         20    0  1053M  2824K bpf     1   0:01   0.00% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
42547 root         20    0  1061M  6592K select  1   0:00   0.00% /usr/loc


The firewall is running just two simple firewall rules: any to any on the LAN and OpenVPN on the WAN, nothing else.
I just want to make sure we are not dealing with faulty hardware.

Can someone please point me in the right direction on what to check?

Thank you
Title: Re: CPU99%
Post by: Animosity022 on July 22, 2018, 12:15:25 am
Did you cut off part of the screen? The system looks 99% idle.
Title: Re: CPU99%
Post by: Julien on July 22, 2018, 11:12:46 am
Thank you for your answer, but I am not sure I understand what you mean.
Which screen has been cut off?
Title: Re: CPU99%
Post by: phoenix on July 22, 2018, 01:55:40 pm
The top two lines in your second 'capture' show the following:

Quote
PID USERNAME    PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   11 root        155 ki31     0K    32K RUN     1 158:26  91.77% [idle{idle: cpu1}]
   11 root        155 ki31     0K    32K RUN     0 160:42  79.55% [idle{idle: cpu0}]
Which, in my limited experience of FreeBSD, would seem to indicate that cpu1 is 91.77% idle and cpu0 is 79.55% idle. Is that what it's showing, and why do you think that your 'hardware box' is running at 90-99% CPU usage?
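
As a rough illustration of that reading, the following Python sketch turns the WCPU column of the two idle lines quoted above into a busy figure per CPU (busy = 100 minus idle). The regular expression and the function name `busy_per_cpu` are made up for this example and assume the `top -aSH` line layout shown in this thread.

```python
import re

# The two idle-thread lines copied from the `top -aSH` capture in this thread.
TOP_LINES = """\
   11 root        155 ki31     0K    32K RUN     1 158:26  91.77% [idle{idle: cpu1}]
   11 root        155 ki31     0K    32K RUN     0 160:42  79.55% [idle{idle: cpu0}]
"""

# Hypothetical pattern: the WCPU percentage followed by the idle thread name.
IDLE_RE = re.compile(r"(\d+(?:\.\d+)?)%\s+\[idle\{idle: cpu(\d+)\}\]")


def busy_per_cpu(top_output):
    """Return {cpu_number: busy_percent}, where busy = 100 - idle WCPU."""
    busy = {}
    for line in top_output.splitlines():
        match = IDLE_RE.search(line)
        if match:
            idle = float(match.group(1))
            busy[int(match.group(2))] = round(100.0 - idle, 2)
    return busy


print(busy_per_cpu(TOP_LINES))
```

For this capture that works out to roughly 8% busy on cpu1 and 20% busy on cpu0, which is a long way from 90-99%.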
Title: Re: CPU99%
Post by: Julien on July 22, 2018, 02:15:27 pm
Hi Bill,
the timing of the capture just wasn't right. Idle drops to 0.1% and then jumps back again.
In the GUI the CPU shows as running at around 99%.

Because it is so busy, it causes a lot of packet loss on the WAN side, and I need to identify what causes this.

When we ping the ISP's IP address, it shows timeouts:

Code:
64 bytes from 66.88.99.0: icmp_seq=0 ttl=64 time=3.902 ms
64 bytes from 66.88.99.0: icmp_seq=1 ttl=64 time=1.032 ms
64 bytes from 66.88.99.0: icmp_seq=2 ttl=64 time=1.369 ms
Request timed out.
64 bytes from 66.88.99.0: icmp_seq=3 ttl=64 time=1.187 ms
64 bytes from 66.88.99.0: icmp_seq=4 ttl=64 time=1.123 ms
64 bytes from 66.88.99.0: icmp_seq=5 ttl=64 time=1.335 ms
Request timed out.
64 bytes from 66.88.99.0: icmp_seq=6 ttl=64 time=1.099 ms
64 bytes from 66.88.99.0: icmp_seq=7 ttl=64 time=2.227 ms
64 bytes from 66.88.99.0: icmp_seq=8 ttl=64 time=1.191 ms
64 bytes from 66.88.99.0: icmp_seq=9 ttl=64 time=1.060 ms
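
To put a number on the loss, here is a minimal Python sketch that counts replies and timeouts in the capture above. The function name `summarize` is hypothetical, and the matching relies on the exact "bytes from" and "Request timed out." wording in this paste; other ping implementations word timeouts differently.

```python
# The ping capture from this post, copied as-is.
PING_OUTPUT = """\
64 bytes from 66.88.99.0: icmp_seq=0 ttl=64 time=3.902 ms
64 bytes from 66.88.99.0: icmp_seq=1 ttl=64 time=1.032 ms
64 bytes from 66.88.99.0: icmp_seq=2 ttl=64 time=1.369 ms
Request timed out.
64 bytes from 66.88.99.0: icmp_seq=3 ttl=64 time=1.187 ms
64 bytes from 66.88.99.0: icmp_seq=4 ttl=64 time=1.123 ms
64 bytes from 66.88.99.0: icmp_seq=5 ttl=64 time=1.335 ms
Request timed out.
64 bytes from 66.88.99.0: icmp_seq=6 ttl=64 time=1.099 ms
64 bytes from 66.88.99.0: icmp_seq=7 ttl=64 time=2.227 ms
64 bytes from 66.88.99.0: icmp_seq=8 ttl=64 time=1.191 ms
64 bytes from 66.88.99.0: icmp_seq=9 ttl=64 time=1.060 ms
"""


def summarize(output):
    """Return (replies, timeouts, loss_percent) for a pasted ping capture."""
    replies = timeouts = 0
    for line in output.splitlines():
        if "bytes from" in line:
            replies += 1
        elif "timed out" in line.lower():
            timeouts += 1
    total = replies + timeouts
    loss = 100.0 * timeouts / total if total else 0.0
    return replies, timeouts, loss


replies, timeouts, loss = summarize(PING_OUTPUT)
print(f"{replies} replies, {timeouts} timeouts, {loss:.1f}% loss")
```

Running the same count on a capture taken against the firewall's own LAN address at the same time would be one way to compare the two sides, though that comparison is only a suggestion and was not done here.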

We just need to know where to look: the ISP or the firewall.

thank you
Title: Re: CPU99%
Post by: Animosity022 on July 22, 2018, 03:25:52 pm
Your system looks idle based on what you've shared.

You should see something above that you cut off when you ran your top command:

Code:
last pid: 52990;  load averages:  0.29,  0.20,  0.13                                                                                    up 6+21:42:25  09:25:27
49 processes:  1 running, 48 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.1% interrupt, 99.9% idle
Mem: 24M Active, 3645M Inact, 729M Wired, 437M Buf, 3462M Free
Swap: 8192M Total, 8192M Free
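
If you want to check that summary line repeatedly instead of eyeballing it, a small Python sketch like the one below can pull the figures out. The function name `parse_cpu_line` is made up for this example, and it assumes the "CPU:" line format shown in the output above.

```python
import re

# The CPU summary line from the top header shown above.
CPU_LINE = "CPU:  0.0% user,  0.0% nice,  0.0% system,  0.1% interrupt, 99.9% idle"


def parse_cpu_line(line):
    """Map each state name in a top 'CPU:' summary line to its percentage."""
    pairs = re.findall(r"(\d+(?:\.\d+)?)%\s+(\w+)", line)
    return {name: float(value) for value, name in pairs}


states = parse_cpu_line(CPU_LINE)
# Everything that is not idle counts as busy.
busy = round(100.0 - states["idle"], 1)
print(states)
print(f"busy: {busy}%")
```

A box that was really pegged would show a small idle value here, with the load sitting in user, system, or interrupt.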
Title: Re: CPU99%
Post by: Julien on July 22, 2018, 04:07:53 pm
Thank you for your answer.
Do you perhaps know what causes the drops to the ISP gateway?
Title: Re: CPU99%
Post by: Davesworld on July 22, 2018, 08:47:53 pm
More info would help, such as what type of connection you have, e.g. bridged modem etc.

As an example, I have a dual-WAN setup with two DSL modems in bridge mode: one goes to the incumbent carrier and the second to a reseller of the incumbent carrier's service. The incumbent uses ADSL2+ whereas the reseller is G.dmt, both fast path. The incumbent line does exactly what you describe, and when I log into that modem (I documented how to set up rules to do this in the tutorials), I see that the SNR margin has dropped considerably. This never lasts for more than a minute and happens only several times a week. The point is that it is in no way the fault of OPNsense. Your system appears to be happily gliding along as far as load goes.
Title: Re: CPU99%
Post by: miroco on July 22, 2018, 09:53:00 pm
Could perhaps the dot-zero IP-address (66.88.99.0) be the culprit?

https://labs.ripe.net/Members/stephane_bortzmeyer/all-ip-addresses-are-equal-dot-zero-addresses-are-less-equal
Title: Re: CPU99%
Post by: Davesworld on July 22, 2018, 11:00:45 pm
Usually a dot zero represents an entire subnet, e.g. 66.88.99.0/length
Title: Re: CPU99%
Post by: ruffy91 on July 23, 2018, 05:43:44 am
Only if the subnet size is /24 or smaller.
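
A quick way to see the difference is Python's standard `ipaddress` module. The two prefixes below are assumptions picked purely for illustration, since the real prefix length of 66.88.99.0 was never stated in this thread, and `classify` is a hypothetical helper name.

```python
import ipaddress


def classify(address, network):
    """Say what role an address plays inside a given network."""
    addr = ipaddress.ip_address(address)
    net = ipaddress.ip_network(network)
    if addr not in net:
        return "outside"
    if addr == net.network_address:
        return "network address"
    if addr == net.broadcast_address:
        return "broadcast address"
    return "host address"


# Assumed prefixes: a /24 starting at the address, and the /23 that contains it.
for prefix in ("66.88.99.0/24", "66.88.98.0/23"):
    print(prefix, "->", classify("66.88.99.0", prefix))
```

In the /24 the dot-zero address is the network address itself, while in the /23 it sits in the middle of the range and is an ordinary host address.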
Title: Re: CPU99%
Post by: Julien on July 29, 2018, 10:29:27 pm
Thank you guys, we got this fixed.
It was an ISP issue and they have already fixed it.
Thank you for your support.
Title: Re: [SOLVED] CPU99%
Post by: amitis5 on June 30, 2019, 08:33:50 am
Sorry for kicking up an old thread, but I was wondering if the ISP told you what the problem was?  I'm having these same symptoms and I can't seem to pin them down.

Thanks,