First of all, sorry if this is a simple issue, I am a newbie with OPNsense and FreeBSD.
I tried some solutions I could find on this forum:
1. reboot
2. deactivate Reporting -> Netflow -> Capture local
3. repair Netflow data
All these to no avail.
Web access to the OPNsense dashboard and other pages is extremely slow.
Running top shows me an inordinate usage of CPU by php!
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
12779 root 1 85 0 108M 80M CPU2 2 0:29 79.26% php-cgi
80884 root 1 94 0 147M 119M RUN 3 0:41 63.55% php-cgi
48574 root 1 82 0 106M 86M RUN 1 0:05 44.93% php
20355 root 1 82 0 106M 86M CPU1 1 0:05 39.04% php
70423 root 1 85 0 113M 85M RUN 0 1:12 31.66% php-cgi
7189 root 1 31 0 103M 76M CPU3 3 1:19 14.41% php-cgi
11907 root 7 20 0 110M 80M nanslp 2 42:27 14.19% suricata
13796 netdata 21 52 19 139M 91M pause 0 4:20 1.51% netdata
61383 root 3 52 0 280M 228M accept 1 3:13 1.29% python3.8
26015 netdata 2 39 19 52M 32M select 3 1:49 0.39% python3.8
10788 root 1 20 0 1038M 4284K CPU0 0 0:00 0.13% top
57233 root 1 20 0 18M 8144K kqread 3 0:06 0.09% lighttpd
27323 netdata 1 39 19 18M 7024K nanslp 1 0:16 0.05% apps.plugin
54279 root 1 20 0 23M 14M select 2 0:17 0.05% python3.8
44278 root 1 20 0 18M 6372K select 2 0:09 0.03% ntpd
13176 root 3 20 0 30M 10M kqread 2 0:03 0.03% syslog-ng
51196 root 1 20 0 21M 11M select 2 0:06 0.02% python3.8
63155 root 1 20 0 21M 11M select 2 0:06 0.02% python3.8
32033 root 1 20 0 17M 7908K select 0 0:00 0.02% sshd
73352 root 1 20 0 11M 2704K select 0 0:01 0.01% syslogd
...
Tried a few more things since first posting this:
1. deleted all widgets from the portal - resulted in slightly lower php usage
2. Reset RRD
3. reset Netflow data
All to no avail again!
The problem is very likely to be a php issue. Unfortunately my knowledge of php is close to nil. :(
Can this be fixed and how, or do I need a re-install?
Thanks for any help.
If any other info is required, let me know.
I have kept on trying various fixes. Mostly stabs in the dark and always rebooting after a change to make sure.
Now, disabling suricata and rebooting gives me a different result. Now the issue is with php still but also with python3.8.
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
93888 root 1 90 0 110M 82M RUN 0 0:14 75.59% php
35258 root 1 88 0 319M 308M CPU3 3 0:12 64.56% python3.8
39001 root 1 87 0 116M 82M CPU1 1 0:12 59.04% php
78509 root 1 87 0 319M 308M RUN 2 0:15 55.70% python3.8
81797 root 1 87 0 407M 388M CPU0 0 0:37 50.51% python3.8
82987 root 1 88 0 124M 95M RUN 3 0:51 46.82% php-cgi
51231 root 1 88 0 319M 308M RUN 2 0:17 43.79% python3.8
...
Unfortunately, not being immortal, and having less time to live than most on this forum, I may have to abandon OPNsense. I really cannot afford to spend days trying to fix a network issue; that is not a productive use of my time. This is my perimeter router and it needs to be stable.
Again, here is a full top report:
last pid: 6197; load averages: 6.48, 6.59, 6.60 up 0+02:13:58 15:51:39
59 processes: 8 running, 50 sleeping, 1 zombie
CPU: 98.9% user, 0.0% nice, 0.8% system, 0.3% interrupt, 0.0% idle
Mem: 769M Active, 2835M Inact, 671M Wired, 468M Buf, 3547M Free
Swap: 8192M Total, 8192M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
83182 root 1 83 0 106M 78M RUN 2 0:05 75.80% php
90397 root 1 83 0 106M 78M RUN 1 0:05 66.13% php
2200 root 1 86 0 110M 83M CPU1 1 0:12 65.95% php
92416 root 1 88 0 110M 82M CPU3 3 0:13 57.32% php
47124 root 1 84 0 108M 80M RUN 0 0:06 49.83% php
81956 root 1 85 0 112M 84M CPU0 0 0:14 48.24% php
3118 root 1 87 0 110M 81M RUN 2 0:11 32.96% php
13941 netdata 21 52 19 117M 69M pause 3 1:04 0.91% netdata
38210 netdata 2 39 19 52M 32M select 2 0:34 0.38% python3.8
94140 root 1 20 0 1044M 4140K CPU2 2 0:00 0.12% top
16116 netdata 1 39 19 18M 7336K nanslp 3 0:05 0.07% apps.plugin
31008 root 1 20 0 23M 14M select 2 0:09 0.05% python3.8
67388 root 1 20 0 21M 11M select 2 0:03 0.04% python3.8
86549 root 1 20 0 18M 6496K select 2 0:03 0.03% ntpd
87080 root 1 20 0 21M 11M select 1 0:03 0.03% python3.8
71351 root 1 20 0 17M 7304K select 1 0:00 0.02% sshd
85286 root 1 20 0 25M 16M select 1 2:10 0.02% python3.8
59047 root 1 20 0 18M 8136K kqread 2 0:07 0.01% lighttpd
38853 root 1 20 0 11M 2712K select 3 0:01 0.01% syslogd
68352 root 1 20 0 96M 67M select 2 0:54 0.01% php-cgi
28372 root 1 20 0 91M 61M select 0 0:01 0.01% php-cgi
65199 root 2 26 0 19M 7296K nanslp 2 0:00 0.01% monit
18862 root 1 22 0 96M 68M select 2 0:59 0.00% php-cgi
64243 root 1 20 0 104M 75M select 0 1:08 0.00% php-cgi
79915 root 1 20 0 12M 2548K bpf 3 0:00 0.00% filterlog
50443 root 1 20 0 103M 75M select 0 1:03 0.00% php-cgi
17156 root 1 20 0 91M 62M select 0 0:06 0.00% php-cgi
7659 root 7 52 0 279M 220M accept 3 3:26 0.00% python3.8
88361 clamav 2 20 0 1233M 1169M select 3 1:16 0.00% clamd
77511 root 1 52 0 31M 19M wait 2 0:05 0.00% python3.8
7843 root 1 52 0 1043M 3564K wait 0 0:04 0.00% sh
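For anyone comparing reports like this one, here is a minimal Python sketch that reads the two summary lines of the top output above and works out the load per core. It assumes the usual FreeBSD top header layout, and the core count of 4 comes from the quad-core J3160 in this box. The function names are made up for illustration.

```python
import re

def parse_cpu_line(line):
    """Turn a top 'CPU:' summary line into {state: percent}."""
    pairs = re.findall(r"([\d.]+)%\s+(\w+)", line)
    return {name: float(value) for value, name in pairs}

def load_per_core(load_line, cores):
    """Take the 1-minute load average and divide it by the core count."""
    match = re.search(r"load averages:\s*([\d.]+)", load_line)
    if match is None:
        raise ValueError("no load averages found in line")
    return float(match.group(1)) / cores

# Summary lines copied from the report above.
cpu = parse_cpu_line("CPU: 98.9% user, 0.0% nice, 0.8% system, 0.3% interrupt, 0.0% idle")
per_core = load_per_core(
    "last pid: 6197; load averages: 6.48, 6.59, 6.60 up 0+02:13:58 15:51:39",
    cores=4,  # quad-core J3160
)
print(cpu["user"], cpu["idle"], round(per_core, 2))
```

A load per core well above 1 together with 0% idle means more runnable processes than cores, which fits the sluggish GUI and ssh login.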
This shows a php issue. This php is at the core of the OPNsense software, no? It is not simply the php for the web server.
What can I do to progress this? I am a bit concerned about simply re-installing as it takes time to do so and my whole network will be completely down for that period.
Is it possible to downgrade from the CLI or from the web portal?
Any other suggestions?
If it is of any use here are the detail of my hardware:
Platform
Manufacturer Protectli
Product Name FW4B
Version Ver 1.3
Serial Number Default string
Family Default string
BIOS
Vendor American Megatrends Inc.
Version 5.11
Release Date 10/22/2019
It is a Protectli FW4B – 4 Port Intel® J3160
Intel Celeron® J3160 Quad Core at 1.6 GHz (Burst to 2.24 GHz)
with 4 Intel® Gigabit Ethernet NIC ports
and AES-NI support
Thanks for any help or advice
hi
any clue in system log?
can you share "top report" from System: Diagnostics: Activity? (top -an)
Quote from: Fright on September 26, 2021, 09:06:31 PM
hi
any clue in system log?
can you share "top report" from System: Diagnostics: Activity? (top -an)
Thanks for trying to help.
Here is the "top -an" result:
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
27631 root 1 88 0 112M 91M CPU2 2 0:15 53.56% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
33574 root 1 87 0 110M 89M CPU1 1 0:09 52.59% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
7197 root 1 88 0 127M 111M RUN 1 0:17 51.95% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
80339 root 1 87 0 108M 87M RUN 0 0:08 50.00% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
35262 root 1 83 0 108M 86M CPU3 3 0:05 37.70% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
55901 root 1 82 0 106M 85M RUN 2 0:05 35.50% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
34198 root 1 23 0 103M 75M select 2 0:31 4.39% /usr/local/bin/php-cgi
72367 root 1 21 0 98M 73M select 0 1:23 1.76% /usr/local/bin/php-cgi
35815 root 1 20 0 98M 72M select 2 0:07 1.76% /usr/local/bin/php-cgi
16564 root 1 20 0 93M 66M select 0 0:59 0.68% /usr/local/bin/php-cgi
13941 netdata 21 52 19 140M 92M pause 3 7:58 0.59% /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid
85286 root 1 20 0 25M 16M select 3 86:31 0.20% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.8)
62531 root 1 20 0 103M 75M select 0 0:39H 0.20% /usr/local/bin/php-cgi
38210 netdata 2 39 19 52M 32M select 3 3:43 0.10% /usr/local/bin/python3.8 /usr/local/libexec/netdata/plugins.d/python.d.plugin 1
7659 root 7 52 0 288M 227M accept 2 5:57 0.00% /usr/local/bin/python3 /usr/local/opnsense/service/configd.py console (python3.8)
88361 clamav 2 20 0 1238M 1173M select 0 2:34 0.00% /usr/local/sbin/clamd
95860 root 1 20 0 91M 65M select 2 0:46 0.00% /usr/local/bin/php-cgi
16116 netdata 1 39 19 18M 7356K nanslp 1 0:34 0.00% /usr/local/libexec/netdata/plugins.d/apps.plugin 1
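To see which scripts account for the load, the WCPU column can be summed per command line. The following is only a rough Python sketch; it assumes the twelve-column layout of the `top -an` output above, and `cpu_by_command` is a made-up helper name.

```python
from collections import defaultdict

def cpu_by_command(top_lines):
    """Sum the WCPU column per full command line, highest total first."""
    totals = defaultdict(float)
    for line in top_lines:
        # Eleven whitespace-separated columns come before COMMAND.
        fields = line.split(None, 11)
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip the header and anything that is not a process row
        totals[fields[11].strip()] += float(fields[10].rstrip("%"))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# The six busiest rows from the output above, plus the header.
rows = """\
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
27631 root 1 88 0 112M 91M CPU2 2 0:15 53.56% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
33574 root 1 87 0 110M 89M CPU1 1 0:09 52.59% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
7197 root 1 88 0 127M 111M RUN 1 0:17 51.95% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
80339 root 1 87 0 108M 87M RUN 0 0:08 50.00% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
35262 root 1 83 0 108M 86M CPU3 3 0:05 37.70% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
55901 root 1 82 0 106M 85M RUN 2 0:05 35.50% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
""".splitlines()

ranked = cpu_by_command(rows)
for command, total in ranked:
    print(f"{total:7.2f}% {command}")
```

On these rows, three copies each of traffic_stats.php and gateway_status.php account for nearly all of the CPU time, so the load comes from those two scripts and not from php as a whole.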
None of the logs under system show anything that may indicate the source of a problem. I get the expected warnings about excessive CPU usage but that is it.
Changing pages in the web GUI takes a minute or more.
Loading the logs in the gui takes more than a minute.
ssh into the system gets me the welcome message immediately but after that I have to wait a minute or more to get to the menu.
I had suricata on, but disabled it, trying to fix the issue. It is still currently disabled.
It has now been three days and my router is hot, although the sensors still show a temperature of 50-60 C.
Thanks again for taking interest in this.
Any strange vlan setup?
Hello Allebone,
I had a second LAN interface setup for DMZ but had nothing at all plugged in/connected to it for a few weeks. To be sure, I disabled the interface completely just now without any effect. Still 100% CPU usage.
Thank you for your attention to this.
OK! I think I know a bit more about what is causing the issue.
Since I have been trying to solve this I always had a window in my web browser open to the OPNsense GUI.
As there were php-cgi processes with high CPU usage, I decided to close that tab to get rid of them, and watched in top over ssh what was happening. THE ISSUE DISAPPEARED! No processes running with more than 0.09% CPU usage, and the highest one was top!
So, now the problem has to do with the web server of OPNsense somehow. The only thing I can think of is that it may be related to my monitor settings. It is a 2K monitor that I am using to access the GUI of OPNsense. This was not an issue until the last upgrade!
Does that help?
Meanwhile, I guess I can simply keep the tab monitoring the router closed until there is a fix. Hopefully not too long to wait, because as soon as I open the GUI again, the CPU usage goes through the roof and stays there for as long as the window is open. I would really like to keep on running OPNsense rather than having to go to alternatives. :)
If the devs need any more information from me, simply ask. I will monitor this topic regularly over the next couple of days.
Here is the output of top with the GUI tab closed:
63154 root 1 20 0 1044M 4044K CPU3 3 0:01 0.09% top
94184 root 1 20 0 23M 14M select 3 0:06 0.03% python3.8
20949 root 1 20 0 21M 11M select 3 0:01 0.01% python3.8
59998 root 1 20 0 12M 2548K bpf 0 0:00 0.01% filterlog
97220 root 1 20 0 17M 7028K select 1 0:00 0.01% sshd
89394 root 3 20 0 28M 10M CPU1 1 0:00 0.01% syslog-ng
13444 root 1 20 0 21M 11M select 2 0:01 0.01% python3.8
22839 root 1 20 0 26M 16M select 0 0:14 0.01% python3.8
40855 root 1 20 0 18M 6492K select 3 0:00 0.01% ntpd
4118 root 1 20 0 18M 8012K kqread 2 0:01 0.00% lighttpd
63083 root 2 26 0 19M 7396K nanslp 1 0:00 0.00% monit
58792 root 1 52 0 99M 79M accept 1 2:43 0.00% php-cgi
52477 root 1 52 0 249M 203M accept 0 2:24 0.00% python3.8
28141 root 1 20 0 74M 55M accept 3 2:13 0.00% php-cgi
99605 root 1 20 0 71M 53M accept 2 1:55 0.00% php-cgi
12970 root 1 52 0 107M 86M accept 3 1:34 0.00% php-cgi
5931 clamav 2 20 0 1233M 1170M select 0 1:17 0.00% clamd
85673 root 1 20 0 93M 64M accept 0 1:15 0.00% php-cgi
37864 root 1 20 0 91M 63M accept 2 0:45 0.00% php-cgi
43307 root 1 52 0 31M 19M wait 3 0:05 0.00% python3.8
53063 root 1 52 0 1037M 3328K wait 2 0:01 0.00% sh
92696 unbound 4 20 0 70M 38M kqread 2 0:00 0.00% unbound
61424 root 1 20 0 21M 6812K select 0 0:00 0.00% mpd5
57269 root 1 4 0 11M 2708K CPU1 1 0:00 0.00% syslogd
81032 root 1 52 0 43M 20M wait 3 0:00 0.00% php-cgi
49170 root 1 52 0 43M 20M wait 0 0:00 0.00% php-cgi
47350 root 1 33 0 1036M 3256K nanslp 2 0:00 0.00% cron
82552 root 1 20 0 1044M 4644K pause 0 0:00 0.00% csh
15995 dhcpd 1 20 0 23M 9628K select 0 0:00 0.00% dhcpd
59269 root 1 20 0 10M 1428K select 2 0:00 0.00% devd
97240 nobody 1 20 0 10M 2080K sbwait 3 0:00 0.00% samplicate
hm. are there any anomalies in Traffic Graph widget (if using) or on Reporting: Traffic page? may be something in System: Log Files: Backend?
No, no anomalies at all, except a very slow refresh rate.
I have now put all the widgets on the Dashboard. Then I deleted them one by one, saving the settings after each and waiting the 4 or 5 minutes for each save to complete, trying to identify the culprit. Guess what: even with no widgets left on the dashboard, I still get at least one php-cgi process that goes to 100% CPU.
Strangely, when I go to the Licence page, I get none. When I exit the GUI entirely, everything is normal, no high cpu usage. Unfortunately, having to wait 4 or 5 minutes to change page on the web GUI does not suit my use case.
If there is no solution by tomorrow morning my time, it will have been 4 days with this issue, I will have to pull the plug on OPNsense. I really cannot afford the time and I have not got the time to learn the whole framework to try to identify and fix the problem. I may, very regretfully, have to go to a paid and closed source solution. :(
I also had this before when I setup bridging incorrectly. Did you have any bonding or bridging at all?
None currently.
Only a router as WI-FI access point working with a different sub-net which has no issues.
Can you check IPv6 is not causing an issue by disabling it entirely:
https://www.thomas-krenn.com/en/wiki/OPNsense_disable_IPv6
IPv6 has been disabled since very early after the install and still is totally disabled. I find internal IPv4 much easier to manage as my labs have very frequent changes of hardware and configuration.
Then I am stumped :(
So am I. :( :)
So I now have to decide whether to re-install or go to an alternative like pfSense, IPFire or OpenWRT. They all have their pluses and minuses.
To re-install without having a clue about the issue seems a bit pointless.
Shrug!
Anyway thanks for your attention and time given to this. I really appreciated it.
27631 root 1 88 0 112M 91M CPU2 2 0:15 53.56% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
33574 root 1 87 0 110M 89M CPU1 1 0:09 52.59% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
7197 root 1 88 0 127M 111M RUN 1 0:17 51.95% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
80339 root 1 87 0 108M 87M RUN 0 0:08 50.00% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
35262 root 1 83 0 108M 86M CPU3 3 0:05 37.70% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
55901 root 1 82 0 106M 85M RUN 2 0:05 35.50% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
It sure looks odd. There is nothing in the code that would suddenly cause this, so a reinstall might solve it or not. Worst case, your hardware is slowing things down, which would carry over to any other new installation.
Make sure your health audit comes up empty. If that is the case, try booting a live image with an early config import and see if that behaves better. It might also be the disk or SD card, depending on what is installed there.
Cheers,
Franco
My system is near to crash - did the upgrade this morning to 21.7.3_1
CPU most of the time 100%, Memory increasing and increasing ..
Think it's a python 3.8 issue together with syslog-ng ?!
last pid: 13263; load averages: 3.97, 3.91, 3.46 up 9+13:55:29 18:08:53
53 processes: 3 running, 48 sleeping, 2 zombie
CPU: 54.5% user, 0.0% nice, 40.1% system, 1.1% interrupt, 4.3% idle
Mem: 851M Active, 503M Inact, 1384M Laundry, 1047M Wired, 392M Buf, 123M Free
Swap: 5120M Total, 3212M Used, 1908M Free, 62% Inuse, 128K In, 6016K Out
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
74757 root 6 28 0 33M 4388K kqread 0 91:32 93.87% syslog-ng
90861 root 1 101 0 11M 1564K RUN 2 87:22 87.99% syslogd
15412 root 1 101 0 4261M 2115M CPU3 3 842:39 85.44% python3.8
Name opnadm9.opn9.9opn
Versions OPNsense 21.7.3_1-amd64, FreeBSD 12.1-RELEASE-p20-HBSD, OpenSSL 1.1.1l 24 Aug 2021
CPU type AMD GX-412TC SOC (4 cores)
Load average 3.69, 3.72, 3.33
Uptime 9 days 13:53:29
Current date/time Tue Sep 28 18:06:53 CEST 2021
Last config change Tue Sep 28 11:16:26 CEST 2021
CPU usage 100 %
State table size 0 % (890/403000)
MBUF usage 0 % (1806/250690)
Memory usage 88 % (3563/4035 MB)
SWAP usage 59 % (3041/5120 MB)
Disk usage 75% / [ufs] (9.3G/13G)
@devhunter55
imho your issue is related with https://forum.opnsense.org/index.php?topic=24868.0 not this one
try rebooting opnsense, or just kill the '/usr/local/bin/python3 /usr/local/opnsense/service/configd_ctl.py -e -t 0.5 system event config_changed' instances and clear the system log
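One possible way to find those instances is sketched below in Python. It is not an official OPNsense tool: the helper names and the sample `ps` listing (including its PIDs) are invented for illustration, and it assumes `ps -axww -o pid,command` is available to list full command lines. The default is a dry run that only prints what it would terminate.

```python
import os
import signal
import subprocess

# Command-line fragment quoted in the post above.
PATTERN = "configd_ctl.py -e -t 0.5 system event config_changed"

def matching_pids(ps_output, pattern=PATTERN):
    """Return the PIDs whose command line contains the pattern."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit() and pattern in parts[1]:
            pids.append(int(parts[0]))
    return pids

def kill_stuck_instances(dry_run=True):
    """List matching processes; send SIGTERM only when dry_run is False."""
    # -ww asks ps not to truncate long command lines.
    ps_output = subprocess.run(
        ["ps", "-axww", "-o", "pid,command"],
        capture_output=True, text=True, check=True,
    ).stdout
    pids = matching_pids(ps_output)
    for pid in pids:
        if dry_run:
            print("would terminate", pid)
        else:
            os.kill(pid, signal.SIGTERM)
    return pids

# Hypothetical ps listing, invented for this example only.
sample = """  PID COMMAND
 4242 /usr/local/bin/python3 /usr/local/opnsense/service/configd_ctl.py -e -t 0.5 system event config_changed
 4243 /usr/local/bin/python3 /usr/local/opnsense/service/configd.py console
"""
print(matching_pids(sample))
```

Calling `kill_stuck_instances()` as root on the firewall would first show the candidates; passing `dry_run=False` sends SIGTERM to them. The configd.py service itself does not match the pattern and is left alone.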
yes, thanks .. i already mentioned that in the other board :D