
Messages - MidGe

#1
So am I.  :(  :)

So I now have to decide whether to re-install or go to an alternative like pfSense, IPFire or OpenWRT. They all have their pluses and minuses.

To re-install without having a clue about the issue seems a bit pointless.

Shrug!

Anyway, thanks for the attention and time you have given to this. I really appreciate it.
#2
IPv6 was disabled very early after the install and is still totally disabled. I find internal IPv4 much easier to manage, as my labs have very frequent hardware and configuration changes.
#3
None currently.

Only a router acting as a Wi-Fi access point on a different subnet, which has no issues.
#4
No, no anomalies at all, except a very slow refresh rate.

I then put all the widgets on the Dashboard. Then I deleted them one by one, saving the settings after each deletion and waiting the 4 or 5 minutes for each change to complete, trying to identify the culprit. Guess what: even with no widgets left on the dashboard, I still get at least one php-cgi process that goes to 100% CPU.

Strangely, when I go to the Licence page, no php-cgi process hogs the CPU. When I exit the GUI entirely, everything is normal, with no high CPU usage. Unfortunately, having to wait 4 or 5 minutes to change pages in the web GUI does not suit my use case.

If there is no solution by tomorrow morning my time (that will make 4 days with this issue), I will have to pull the plug on OPNsense. I really cannot afford the time to learn the whole framework in order to identify and fix the problem. I may, very regretfully, have to move to a paid, closed-source solution. :(





#5
OK! I think I know a bit more about what is causing the issue.

Ever since I started trying to solve this, I have had a window in my web browser open to the OPNsense GUI.

Since the high-CPU processes were php-cgi, I decided to close that tab and watch what was happening with top over SSH. THE ISSUE DISAPPEARED! No process was using more than 0.09% CPU, and the highest one was top itself!

So the problem has something to do with the OPNsense web server. The only thing I can think of is that it may be related to my monitor settings: I am using a 2K monitor to access the OPNsense GUI. This was not an issue until the last upgrade!

Does that help?

Meanwhile, I guess I can simply keep the tab monitoring the router closed until there is a fix. Hopefully the wait will not be too long, because as soon as I open the GUI again, the CPU usage goes through the roof and stays there for as long as the window is open. I would really like to keep running OPNsense rather than having to move to alternatives. :)

If the devs need any more information from me, just ask. I will monitor this topic regularly over the next couple of days.

Here is the output of top with the GUI tab closed:

63154 root          1  20    0  1044M  4044K CPU3     3   0:01   0.09% top
94184 root          1  20    0    23M    14M select   3   0:06   0.03% python3.8
20949 root          1  20    0    21M    11M select   3   0:01   0.01% python3.8
59998 root          1  20    0    12M  2548K bpf      0   0:00   0.01% filterlog
97220 root          1  20    0    17M  7028K select   1   0:00   0.01% sshd
89394 root          3  20    0    28M    10M CPU1     1   0:00   0.01% syslog-ng
13444 root          1  20    0    21M    11M select   2   0:01   0.01% python3.8
22839 root          1  20    0    26M    16M select   0   0:14   0.01% python3.8
40855 root          1  20    0    18M  6492K select   3   0:00   0.01% ntpd
4118 root          1  20    0    18M  8012K kqread   2   0:01   0.00% lighttpd
63083 root          2  26    0    19M  7396K nanslp   1   0:00   0.00% monit
58792 root          1  52    0    99M    79M accept   1   2:43   0.00% php-cgi
52477 root          1  52    0   249M   203M accept   0   2:24   0.00% python3.8
28141 root          1  20    0    74M    55M accept   3   2:13   0.00% php-cgi
99605 root          1  20    0    71M    53M accept   2   1:55   0.00% php-cgi
12970 root          1  52    0   107M    86M accept   3   1:34   0.00% php-cgi
5931 clamav        2  20    0  1233M  1170M select   0   1:17   0.00% clamd
85673 root          1  20    0    93M    64M accept   0   1:15   0.00% php-cgi
37864 root          1  20    0    91M    63M accept   2   0:45   0.00% php-cgi
43307 root          1  52    0    31M    19M wait     3   0:05   0.00% python3.8
53063 root          1  52    0  1037M  3328K wait     2   0:01   0.00% sh
92696 unbound       4  20    0    70M    38M kqread   2   0:00   0.00% unbound
61424 root          1  20    0    21M  6812K select   0   0:00   0.00% mpd5
57269 root          1   4    0    11M  2708K CPU1     1   0:00   0.00% syslogd
81032 root          1  52    0    43M    20M wait     3   0:00   0.00% php-cgi
49170 root          1  52    0    43M    20M wait     0   0:00   0.00% php-cgi
47350 root          1  33    0  1036M  3256K nanslp   2   0:00   0.00% cron
82552 root          1  20    0  1044M  4644K pause    0   0:00   0.00% csh
15995 dhcpd         1  20    0    23M  9628K select   0   0:00   0.00% dhcpd
59269 root          1  20    0    10M  1428K select   2   0:00   0.00% devd
97240 nobody        1  20    0    10M  2080K sbwait   3   0:00   0.00% samplicate


#6
Hello Allebone,

I had a second LAN interface set up for a DMZ, but nothing had been plugged in/connected to it for a few weeks. To be sure, I have just disabled the interface completely, without any effect. Still 100% CPU usage.

Thank you for your attention to this.
#7
Quote from: Fright on September 26, 2021, 09:06:31 PM
hi
any clue in system log?
can you share "top report" from System: Diagnostics: Activity? (top -an)

Thanks for trying to help.

Here is the "top -an" result:


  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
27631 root          1  88    0   112M    91M CPU2     2   0:15  53.56% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
33574 root          1  87    0   110M    89M CPU1     1   0:09  52.59% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
7197 root          1  88    0   127M   111M RUN      1   0:17  51.95% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
80339 root          1  87    0   108M    87M RUN      0   0:08  50.00% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
35262 root          1  83    0   108M    86M CPU3     3   0:05  37.70% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
55901 root          1  82    0   106M    85M RUN      2   0:05  35.50% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
34198 root          1  23    0   103M    75M select   2   0:31   4.39% /usr/local/bin/php-cgi
72367 root          1  21    0    98M    73M select   0   1:23   1.76% /usr/local/bin/php-cgi
35815 root          1  20    0    98M    72M select   2   0:07   1.76% /usr/local/bin/php-cgi
16564 root          1  20    0    93M    66M select   0   0:59   0.68% /usr/local/bin/php-cgi
13941 netdata      21  52   19   140M    92M pause    3   7:58   0.59% /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid
85286 root          1  20    0    25M    16M select   3  86:31   0.20% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.8)
62531 root          1  20    0   103M    75M select   0   0:39H   0.20% /usr/local/bin/php-cgi
38210 netdata       2  39   19    52M    32M select   3   3:43   0.10% /usr/local/bin/python3.8 /usr/local/libexec/netdata/plugins.d/python.d.plugin 1
7659 root          7  52    0   288M   227M accept   2   5:57   0.00% /usr/local/bin/python3 /usr/local/opnsense/service/configd.py console (python3.8)
88361 clamav        2  20    0  1238M  1173M select   0   2:34   0.00% /usr/local/sbin/clamd
95860 root          1  20    0    91M    65M select   2   0:46   0.00% /usr/local/bin/php-cgi
16116 netdata       1  39   19    18M  7356K nanslp   1   0:34   0.00% /usr/local/libexec/netdata/plugins.d/apps.plugin 1
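Because `top -an` shows the full command line, the busy `php` processes can be grouped by the script they run rather than just by the interpreter name. The following is a minimal Python sketch, assuming the 12-column layout shown above (COMMAND last, possibly containing spaces); it pastes only a subset of the rows, and the function names are illustrative rather than part of any OPNsense tool.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Tuple

# Header plus a subset of the rows from the "top -an" report above.
TOP_AN = """\
  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
27631 root          1  88    0   112M    91M CPU2     2   0:15  53.56% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
33574 root          1  87    0   110M    89M CPU1     1   0:09  52.59% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
7197 root          1  88    0   127M   111M RUN      1   0:17  51.95% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
80339 root          1  87    0   108M    87M RUN      0   0:08  50.00% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
35262 root          1  83    0   108M    86M CPU3     3   0:05  37.70% /usr/local/bin/php /usr/local/opnsense/scripts/interfaces/traffic_stats.php
55901 root          1  82    0   106M    85M RUN      2   0:05  35.50% /usr/local/bin/php /usr/local/opnsense/scripts/routes/gateway_status.php
34198 root          1  23    0   103M    75M select   2   0:31   4.39% /usr/local/bin/php-cgi
72367 root          1  21    0    98M    73M select   0   1:23   1.76% /usr/local/bin/php-cgi
"""


def group_key(command: str) -> str:
    """Name a process by the PHP script it runs, else by its executable."""
    argv = command.split()
    exe = PurePosixPath(argv[0]).name
    # Only the CLI interpreter "php" is split by script; php-cgi workers stay grouped.
    if exe == "php" and len(argv) > 1:
        return PurePosixPath(argv[1]).name
    return exe


def cpu_by_script(report: str) -> Dict[str, Tuple[int, float]]:
    """Return {name: (process count, summed WCPU %)} over all process rows."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in report.splitlines():
        fields = line.split(maxsplit=11)  # 11 fixed columns + the COMMAND remainder
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip the header and anything that is not a process row
        entry = totals[group_key(fields[11])]
        entry[0] += 1
        entry[1] += float(fields[10].rstrip("%"))
    return {name: (n, round(cpu, 2)) for name, (n, cpu) in totals.items()}


if __name__ == "__main__":
    rows = sorted(cpu_by_script(TOP_AN).items(), key=lambda kv: -kv[1][1])
    for name, (count, cpu) in rows:
        print(f"{name:22} {count:2d} procs {cpu:7.2f}% WCPU")
```

On these rows, the three `traffic_stats.php` processes sum to about 143.85% and the three `gateway_status.php` processes to about 137.45%, while the two `php-cgi` workers stay in single digits. In FreeBSD `top`, a single-threaded process at 100% occupies one full core, which is why the per-script totals can exceed 100%.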


None of the logs under system show anything that may indicate the source of a problem. I get the expected warnings about excessive CPU usage but that is it.

Changing pages in the web GUI takes a minute or more.
Loading the logs in the GUI takes more than a minute.
SSH into the system shows the welcome message immediately, but after that I have to wait a minute or more to get to the menu.

I had Suricata on but disabled it while trying to fix the issue. It is still disabled.

It has now been three days, and my router is hot, although the sensors still show a temperature of only 50-60 °C.

Thanks again for taking interest in this.



#8
Again, here is a full top report:

last pid:  6197;  load averages:  6.48,  6.59,  6.60                                                                      up 0+02:13:58  15:51:39
59 processes:  8 running, 50 sleeping, 1 zombie
CPU: 98.9% user,  0.0% nice,  0.8% system,  0.3% interrupt,  0.0% idle
Mem: 769M Active, 2835M Inact, 671M Wired, 468M Buf, 3547M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
83182 root          1  83    0   106M    78M RUN      2   0:05  75.80% php
90397 root          1  83    0   106M    78M RUN      1   0:05  66.13% php
2200 root          1  86    0   110M    83M CPU1     1   0:12  65.95% php
92416 root          1  88    0   110M    82M CPU3     3   0:13  57.32% php
47124 root          1  84    0   108M    80M RUN      0   0:06  49.83% php
81956 root          1  85    0   112M    84M CPU0     0   0:14  48.24% php
3118 root          1  87    0   110M    81M RUN      2   0:11  32.96% php
13941 netdata      21  52   19   117M    69M pause    3   1:04   0.91% netdata
38210 netdata       2  39   19    52M    32M select   2   0:34   0.38% python3.8
94140 root          1  20    0  1044M  4140K CPU2     2   0:00   0.12% top
16116 netdata       1  39   19    18M  7336K nanslp   3   0:05   0.07% apps.plugin
31008 root          1  20    0    23M    14M select   2   0:09   0.05% python3.8
67388 root          1  20    0    21M    11M select   2   0:03   0.04% python3.8
86549 root          1  20    0    18M  6496K select   2   0:03   0.03% ntpd
87080 root          1  20    0    21M    11M select   1   0:03   0.03% python3.8
71351 root          1  20    0    17M  7304K select   1   0:00   0.02% sshd
85286 root          1  20    0    25M    16M select   1   2:10   0.02% python3.8
59047 root          1  20    0    18M  8136K kqread   2   0:07   0.01% lighttpd
38853 root          1  20    0    11M  2712K select   3   0:01   0.01% syslogd
68352 root          1  20    0    96M    67M select   2   0:54   0.01% php-cgi
28372 root          1  20    0    91M    61M select   0   0:01   0.01% php-cgi
65199 root          2  26    0    19M  7296K nanslp   2   0:00   0.01% monit
18862 root          1  22    0    96M    68M select   2   0:59   0.00% php-cgi
64243 root          1  20    0   104M    75M select   0   1:08   0.00% php-cgi
79915 root          1  20    0    12M  2548K bpf      3   0:00   0.00% filterlog
50443 root          1  20    0   103M    75M select   0   1:03   0.00% php-cgi
17156 root          1  20    0    91M    62M select   0   0:06   0.00% php-cgi
7659 root          7  52    0   279M   220M accept   3   3:26   0.00% python3.8
88361 clamav        2  20    0  1233M  1169M select   3   1:16   0.00% clamd
77511 root          1  52    0    31M    19M wait     2   0:05   0.00% python3.8
7843 root          1  52    0  1043M  3564K wait     0   0:04   0.00% sh

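The conclusion below can be put into numbers by summing WCPU per command and comparing it with the CPU's capacity. This is a minimal Python sketch that pastes a subset of the report above; it assumes 4 cores (the Celeron J3160 listed in the hardware details of this post is quad core) and that 100% WCPU in FreeBSD `top` corresponds to one fully busy core. The function names are illustrative.

```python
from collections import Counter

CORES = 4  # assumption: quad-core Celeron J3160, as described in the post

# Header lines and a subset of process rows from the full top report above.
REPORT = """\
last pid:  6197;  load averages:  6.48,  6.59,  6.60    up 0+02:13:58  15:51:39
59 processes:  8 running, 50 sleeping, 1 zombie
CPU: 98.9% user,  0.0% nice,  0.8% system,  0.3% interrupt,  0.0% idle
  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
83182 root          1  83    0   106M    78M RUN      2   0:05  75.80% php
90397 root          1  83    0   106M    78M RUN      1   0:05  66.13% php
2200 root          1  86    0   110M    83M CPU1     1   0:12  65.95% php
92416 root          1  88    0   110M    82M CPU3     3   0:13  57.32% php
47124 root          1  84    0   108M    80M RUN      0   0:06  49.83% php
81956 root          1  85    0   112M    84M CPU0     0   0:14  48.24% php
3118 root          1  87    0   110M    81M RUN      2   0:11  32.96% php
13941 netdata      21  52   19   117M    69M pause    3   1:04   0.91% netdata
38210 netdata       2  39   19    52M    32M select   2   0:34   0.38% python3.8
"""


def cpu_by_command(report: str) -> Counter:
    """Sum the WCPU column per COMMAND over the process rows only."""
    totals = Counter()
    for line in report.splitlines():
        fields = line.split()
        # Process rows have exactly 12 fields here: numeric PID ... WCPU% COMMAND.
        if len(fields) == 12 and fields[0].isdigit() and fields[10].endswith("%"):
            totals[fields[11]] += float(fields[10][:-1])
    return totals


def load_per_core(report: str, cores: int = CORES) -> float:
    """1-minute load average from the top header, divided by the core count."""
    for line in report.splitlines():
        if "load averages:" in line:
            one_minute = line.split("load averages:")[1].split(",")[0]
            return float(one_minute) / cores
    raise ValueError("no load average line found")


if __name__ == "__main__":
    capacity = 100.0 * CORES
    for cmd, pct in cpu_by_command(REPORT).most_common():
        print(f"{cmd:10} {pct:7.2f}%  ({pct / capacity:.1%} of {CORES} cores)")
    print(f"1-minute load per core: {load_per_core(REPORT):.2f}")
```

On these rows the seven `php` processes add up to about 396% out of roughly 400%, i.e. about 99% of the four cores, which matches the 98.9% user / 0.0% idle CPU line, and the 1-minute load of 6.48 works out to about 1.6 runnable processes per core.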

This points to a php issue. Isn't this php at the core of the OPNsense software? It is not just the php used by the web server.

What can I do to progress this? I am a bit concerned about simply re-installing as it takes time to do so and my whole network will be completely down for that period.

Is it possible to downgrade from the CLI or from the web portal?

Any other suggestions?

In case it is of any use, here are the details of my hardware:

Platform
Manufacturer   Protectli
Product Name   FW4B
Version   Ver 1.3
Serial Number   Default string
Family   Default string
BIOS
Vendor   American Megatrends Inc.
Version   5.11
Release Date   10/22/2019

It is a Protectli FW4B – 4 Port Intel® J3160
Intel Celeron® J3160 Quad Core at 1.6 GHz (Burst to 2.24 GHz)
with 4 Intel® Gigabit Ethernet NIC ports
and AES-NI support

Thanks for any help or advice

#9
I have kept trying various fixes, mostly stabs in the dark, always rebooting after each change to make sure.

Disabling Suricata and rebooting now gives a different result: the issue is still with php, but now also with python3.8.


  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
93888 root          1  90    0   110M    82M RUN      0   0:14  75.59% php
35258 root          1  88    0   319M   308M CPU3     3   0:12  64.56% python3.8
39001 root          1  87    0   116M    82M CPU1     1   0:12  59.04% php
78509 root          1  87    0   319M   308M RUN      2   0:15  55.70% python3.8
81797 root          1  87    0   407M   388M CPU0     0   0:37  50.51% python3.8
82987 root          1  88    0   124M    95M RUN      3   0:51  46.82% php-cgi
51231 root          1  88    0   319M   308M RUN      2   0:17  43.79% python3.8
...


Unfortunately, not being immortal, and having less time to live than most on this forum, I may have to abandon OPNsense. I really cannot afford to spend days trying to fix a network issue; it is not a productive use of my time. This is my perimeter router and it needs to be stable.


#10
First of all, sorry if this is a simple issue; I am a newbie with OPNsense and FreeBSD.

I tried some solutions I found on this forum:
1. reboot
2. deactivate Reporting -> Netflow -> Capture local
3. repair Netflow data

All these to no avail.

Web access to the OPNsense dashboard and other pages is extremely slow.


Running top shows me an inordinate usage of CPU by php!


  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
12779 root          1  85    0   108M    80M CPU2     2   0:29  79.26% php-cgi
80884 root          1  94    0   147M   119M RUN      3   0:41  63.55% php-cgi
48574 root          1  82    0   106M    86M RUN      1   0:05  44.93% php
20355 root          1  82    0   106M    86M CPU1     1   0:05  39.04% php
70423 root          1  85    0   113M    85M RUN      0   1:12  31.66% php-cgi
7189 root          1  31    0   103M    76M CPU3     3   1:19  14.41% php-cgi
11907 root          7  20    0   110M    80M nanslp   2  42:27  14.19% suricata
13796 netdata      21  52   19   139M    91M pause    0   4:20   1.51% netdata
61383 root          3  52    0   280M   228M accept   1   3:13   1.29% python3.8
26015 netdata       2  39   19    52M    32M select   3   1:49   0.39% python3.8
10788 root          1  20    0  1038M  4284K CPU0     0   0:00   0.13% top
57233 root          1  20    0    18M  8144K kqread   3   0:06   0.09% lighttpd
27323 netdata       1  39   19    18M  7024K nanslp   1   0:16   0.05% apps.plugin
54279 root          1  20    0    23M    14M select   2   0:17   0.05% python3.8
44278 root          1  20    0    18M  6372K select   2   0:09   0.03% ntpd
13176 root          3  20    0    30M    10M kqread   2   0:03   0.03% syslog-ng
51196 root          1  20    0    21M    11M select   2   0:06   0.02% python3.8
63155 root          1  20    0    21M    11M select   2   0:06   0.02% python3.8
32033 root          1  20    0    17M  7908K select   0   0:00   0.02% sshd
73352 root          1  20    0    11M  2704K select   0   0:01   0.01% syslogd
...


Tried a few more things since first posting this:

1. deleted all widgets from the portal - resulted in slightly lower php usage
2. Reset RRD
3. reset Netflow data

All to no avail again!

The problem is very likely to be a php issue.  Unfortunately my knowledge of php is close to nil.  :(

Can this be fixed, and if so how, or do I need to re-install?

Thanks for any help.

If any other info is required, let me know.
#11
21.7 Legacy Series / Re: 21.7.3. - high CPU and MEM usage
September 23, 2021, 02:29:10 AM
I had the same issue (cpu 100%) after the latest automatic update. A reboot fixed the problem.