I have a clean install of 22.1.r1-amd64, and while setting it up I noticed a weird single-core CPU spike every minute or so. Inspecting `top`, I noticed that `ifconfig` occasionally takes a lot of CPU (sometimes two instances of the process spike at once). This setup hasn't been connected to a WAN yet (I've set up the PPPoE config for it, though), and I've only got two VLANs, no IDS, and simple firewall rules. Hardware is an Intel x64 CPU; NIC 1 is igb (onboard), NIC 2 is ix (expansion card, Intel X550). I've disabled monitoring (NetFlow) and there was no change.
Is this something simple/known or am I going to have to go deeper?
Some more data:
Versions:
  OPNsense 22.1.r1-amd64
  FreeBSD 13.0-STABLE
  OpenSSL 1.1.1m 14 Dec 2021
CPU type: Intel(R) Pentium(R) CPU G4400 @ 3.30GHz (2 cores)
Load average: 1.06, 0.80, 0.64
Uptime: 00:50:43
Current date/time: Sat Jan 22 15:49:51 PST 2022
Last config change: Sat Jan 22 15:44:52 PST 2022
CPU usage: 0 %
State table size: 0 % (2852/808000)
MBUF usage: 2 % (10160/501272)
Memory usage: 4 % (400/8087 MB)
SWAP usage: 0 % (0/8191 MB)
Disk usage: 2% / [ufs] (1.4G/64G)
Results of top when ifconfig spikes:
CPU: 0.0% user, 0.0% nice, 50.0% system, 0.0% interrupt, 50.0% idle
Mem: 114M Active, 22M Inact, 261M Wired, 93M Buf, 7421M Free
Swap: 8192M Total, 8192M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME CPU COMMAND
49839 root 1 73 0 13M 2868K CPU1 1 0:01 101.01% ifconfig
48045 root 1 20 0 14M 3540K CPU0 0 0:00 0.07% top
90043 root 3 20 0 34M 10M kqread 1 0:01 0.04% syslog-n
75368 root 1 20 0 13M 2500K bpf 0 0:00 0.03% filterlo
75672 root 1 20 0 18M 7152K select 1 0:00 0.02% sshd
32810 unbound 2 20 0 149M 38M kqread 0 0:03 0.02% unbound
90760 root 1 20 0 23M 11M select 1 0:01 0.01% python3.
76457 root 2 20 0 21M 6396K select 0 0:00 0.01% ntpd
91230 root 1 20 0 23M 11M select 0 0:01 0.01% python3.
53155 root 1 20 0 12M 2096K select 0 0:01 0.00% powerd
408 root 1 52 0 56M 29M accept 0 0:07 0.00% python3.
79282 root 1 20 0 20M 7644K kqread 1 0:01 0.00% lighttpd
16838 root 1 52 0 44M 24M accept 0 0:01 0.00% php-cgi
18188 root 1 20 0 51M 26M piperd 0 0:01 0.00% php-cgi
17915 root 1 20 0 44M 25M accept 0 0:01 0.00% php-cgi
406 root 1 52 0 32M 20M wait 1 0:01 0.00% python3.
I don't know anything about BSD, so none of my Linux tricks are handy here... Is there any way to tell what OPNsense is trying to do with ifconfig?
ifconfig is very widely used.
Can you try "top -a", or just System: Diagnostics: Activity, to grab the full command line?
Perhaps this will make it clearer which script is being executed.
Looks like kernel{if_config_tqg_0} is the culprit?
11 root 155 ki31 0 32K CPU1 1 188:10 99.27% [idle{idle: cpu1}]
11 root 155 ki31 0 32K RUN 0 185:25 97.20% [idle{idle: cpu0}]
12 root -60 - 0 224K WAIT 0 3:11 1.96% [intr{swi4: clock (0)}]
50005 unbound 20 0 149M 38M kqread 1 0:13 1.01% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
0 root -76 - 0 480K - 1 0:18 0.14% [kernel{if_config_tqg_0}]
Quote: Looks like kernel{if_config_tqg_0} is the culprit?
IMHO you still need to find the exact 'ifconfig' command that triggers that if_config_tqg task.
And the last 'top' result isn't showing any CPU load ;)
This is exactly why I bothered posting on the forum... to get help figuring out what is causing this spike. *How* do I determine what is running this command, when it seems like all my tools are either missing (strace, lsof) or different in this BSD environment? The top command shows no load because, as the title of the thread states, this spike is intermittent. It very briefly pegs a CPU core, which would cause intermittent latency if I ever put this thing into service.
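One way to answer "what is running this" on FreeBSD without strace or lsof is to poll `ps` several times per second and log every ifconfig process together with its parent PID and the parent's command line. A minimal sketch, assuming a POSIX `sh`, FreeBSD's `ps` keywords (`pid`, `ppid`, `command`), and that FreeBSD's `sleep` accepts a fractional interval; the function names and the log path are hypothetical. A process that lives for less than the sampling interval can still slip through.

```shell
#!/bin/sh
# Sketch: catch short-lived ifconfig processes and record who spawned them.
# Assumes FreeBSD ps(1); function names and log path are illustrative only.

# Read "PID PPID COMMAND..." lines on stdin and print only those whose
# executable is ifconfig (bare or with a path), prefixed with timestamp $1.
filter_ifconfig() {
    ts=$1
    awk -v ts="$ts" '$3 ~ /(^|\/)ifconfig$/ { print ts, $0 }'
}

# Print the full command line of PID $1 (empty if it has already exited).
parent_cmd() {
    ps -o command= -p "$1" 2>/dev/null
}

# Poll until interrupted with Ctrl-C, appending each hit to the log file.
watch_ifconfig() {
    log=${1:-/tmp/ifconfig-watch.log}
    while :; do
        now=$(date '+%H:%M:%S')
        ps -axww -o pid=,ppid=,command= | filter_ifconfig "$now" |
        while read -r t pid ppid cmd; do
            echo "$t pid=$pid ppid=$ppid cmd=$cmd parent=$(parent_cmd "$ppid")" >> "$log"
        done
        sleep 0.2   # assumed: fractional seconds accepted by sleep(1)
    done
}
```

Since root's login shell on FreeBSD is typically csh, start `sh` first, source the file (e.g. `. ./ifconfig-watch.sh`, a hypothetical name), run `watch_ifconfig` while the spikes are happening, and read the `parent=` column afterwards to see which script started each ifconfig.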
Just a couple of thoughts based on my sysadmin experience.
1) This will require some manual monitoring of CPU load, but could you see whether the CPU spikes occur randomly or, possibly, almost exactly on the minute, i.e. a cron job running on the minute?
2) Is your WAN interface connected to anything? Did you give it a static address? Is it possible that the OS is trying to configure your WAN interface with DHCP at regular intervals?
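For the on-the-minute idea in point 1, comparing timestamps is more reliable than eyeballing them. A minimal sketch, assuming you have noted one Unix epoch timestamp per line (for example from `date +%s` each time a spike shows up); the function name is hypothetical. For each spike it prints the second within the minute and the gap to the previous spike: a second value that stays near 00 points at a per-minute cron job, while a steady gap that drifts across the minute suggests a periodic timer that is not aligned to the clock.

```shell
#!/bin/sh
# Sketch: read epoch timestamps (one per line) and print, for each one,
# its second within the minute and the gap to the previous timestamp.
spike_intervals() {
    awk '{
        sec = $1 % 60                      # position within the minute
        if (NR == 1) gap = "-"; else gap = $1 - prev
        printf "%d sec=%02d gap=%s\n", $1, sec, gap
        prev = $1
    }'
}
```

For example, `printf '1642895400\n1642895460\n1642895521\n' | spike_intervals` prints `sec=00 gap=-`, `sec=00 gap=60` and `sec=01 gap=61` for the three lines.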
Good luck.
If the ifconfig command still manages to appear in 'top' results, even briefly, I would try something like
top -aHSTn -d 300 99999 > /var/log/top.log
I would have guessed something like the Reporting: Traffic: Top talkers tab being open, but even in that case the above command didn't show me a single line with ifconfig; that's how fast it executes, IMHO.
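Because ifconfig lives for well under one `top` refresh, it is easier to search the log afterwards than to watch it. Each snapshot in the batch output starts with a `last pid:` header whose last field is the wall-clock time (as in the capture posted below), so a small awk filter can print every ifconfig line together with the time of the snapshot it appeared in. A sketch, assuming that header layout and the log path from the command above; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list ifconfig hits in a top(1) batch log with their snapshot time.
# Assumes each snapshot begins with "last pid: ..." ending in HH:MM:SS.
ifconfig_hits() {
    awk '
        /^last pid:/ { ts = $NF; next }   # remember the time of this snapshot
        /ifconfig/   { print ts, $0 }     # print matching process/thread lines
    ' "${1:-/var/log/top.log}"
}
```

Running `ifconfig_hits /var/log/top.log` after the capture gives one line per hit; the snapshot times can then be compared against cron entries or dashboard refreshes.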
I was able to capture this:
last pid: 14419; load averages: 0.83, 0.73, 0.52 up 1+15:00:13 09:56:33
131 threads: 4 running, 113 sleeping, 14 waiting
CPU: 3.3% user, 0.0% nice, 16.4% system, 1.5% interrupt, 78.8% idle
Mem: 71M Active, 90M Inact, 979M Wired, 700M Buf, 6679M Free
Swap: 8192M Total, 8192M Free
THR USERNAME PRI NICE SIZE RES STATE C TIME CPU COMMAND
100004 root 155 ki31 0B 32K RUN 1 38.6H 91.68% [idle{idle: cpu1}]
100003 root 155 ki31 0B 32K RUN 0 38.0H 66.52% [idle{idle: cpu0}]
100391 root 20 0 13M 2932K CPU0 0 0:00 25.40% /sbin/ifconfig -m -v
However, the BSD ifconfig manpage doesn't seem to list those short flags? Possibly something to do with VLAN mode changes?
OK, I've re-created this condition. Executing `ifconfig -m -v` lists the available interfaces and seems to hitch on the ix interfaces (I'm assuming this is what causes the CPU spike). I would guess it's an issue with the ix driver used in this version? Removing the VLANs from the interfaces doesn't help, so it's not that.
May be related:
https://forum.opnsense.org/index.php?topic=25440.15
https://github.com/opnsense/core/issues/5349
OK, so it looks like it IS a known issue, and 22.1 doesn't fix it at all. I'll downgrade to 21.7.5 and verify that this fixes the issue (I also updated the firmware on my NIC, and that didn't help either).
It appears that if I don't have the dashboard page open, the spikes don't happen either... I guess that works lol
@AdSchellevis pointed out the place in the source code where this call is used. The legacy_interfaces_details function is used quite widely, so I don't think that not using a couple of widgets can completely get rid of the problem.
We were hoping FreeBSD 13 would behave better but now we're putting our workaround back for 22.1. The problem is this is either a hardware or driver issue. Generally, we don't recommend the Intel ixgbe-driver based cards at all.
Cheers,
Franco
Quote from: Fright on January 25, 2022, 06:26:17 AM
@AdSchellevis pointed out the place in the source code where this call is used. The legacy_interfaces_details function is used quite widely, so I don't think that not using a couple of widgets can completely get rid of the problem.
I can confirm that removing the 'interface status' widget eliminates the issue.
Quote from: franco on January 25, 2022, 07:21:14 AM
We were hoping FreeBSD 13 would behave better but now we're putting our workaround back for 22.1. The problem is this is either a hardware or driver issue. Generally, we don't recommend the Intel ixgbe-driver based cards at all.
Cheers,
Franco
Unfortunately there aren't a lot of options for multi-gigabit ethernet-based cards :/
> Unfortunately there aren't a lot of options for multi-gigabit ethernet-based cards :/
That's true. We will try to take the topic to FreeBSD, but if in practice nobody high-profile is using these cards, the chances of fixes, even from Intel itself, are slimmer.
Cheers,
Franco