Intermittent single-core CPU spike caused by ifconfig

Started by isamudaison, January 23, 2022, 12:21:25 AM

I have a clean install of 22.1.r1-amd64, and while setting it up I noticed an odd single-core CPU spike every minute or so. Inspecting `top`, I saw that `ifconfig` occasionally takes a lot of CPU (sometimes two instances of the process spike). This setup hasn't been connected to a WAN yet (I've set up the PPPoE config for it, though), and I've only got two VLANs, no IDS, and simple firewall rules. Hardware is an Intel x64 CPU, NIC 1: igb (onboard), NIC 2: ix (expansion, Intel X550). I've disabled monitoring (netflow) and there was no change.

Is this something simple/known or am I going to have to go deeper?

Some more data:


Versions: OPNsense 22.1.r1-amd64
          FreeBSD 13.0-STABLE
          OpenSSL 1.1.1m 14 Dec 2021
CPU type: Intel(R) Pentium(R) CPU G4400 @ 3.30GHz (2 cores)
Load average: 1.06, 0.80, 0.64
Uptime: 00:50:43
Current date/time: Sat Jan 22 15:49:51 PST 2022
Last config change: Sat Jan 22 15:44:52 PST 2022
CPU usage: 0 %
State table size: 0 % ( 2852/808000 )
MBUF usage: 2 % ( 10160/501272 )
Memory usage: 4 % ( 400/8087 MB )
SWAP usage: 0 % ( 0/8191 MB )
Disk usage: 2% / [ufs] (1.4G/64G)



Results of top when ifconfig spikes:
CPU:  0.0% user,  0.0% nice, 50.0% system,  0.0% interrupt, 50.0% idle
Mem: 114M Active, 22M Inact, 261M Wired, 93M Buf, 7421M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME     CPU COMMAND
49839 root          1  73    0    13M  2868K CPU1     1   0:01 101.01% ifconfig
48045 root          1  20    0    14M  3540K CPU0     0   0:00   0.07% top
90043 root          3  20    0    34M    10M kqread   1   0:01   0.04% syslog-n
75368 root          1  20    0    13M  2500K bpf      0   0:00   0.03% filterlo
75672 root          1  20    0    18M  7152K select   1   0:00   0.02% sshd
32810 unbound       2  20    0   149M    38M kqread   0   0:03   0.02% unbound
90760 root          1  20    0    23M    11M select   1   0:01   0.01% python3.
76457 root          2  20    0    21M  6396K select   0   0:00   0.01% ntpd
91230 root          1  20    0    23M    11M select   0   0:01   0.01% python3.
53155 root          1  20    0    12M  2096K select   0   0:01   0.00% powerd
  408 root          1  52    0    56M    29M accept   0   0:07   0.00% python3.
79282 root          1  20    0    20M  7644K kqread   1   0:01   0.00% lighttpd
16838 root          1  52    0    44M    24M accept   0   0:01   0.00% php-cgi
18188 root          1  20    0    51M    26M piperd   0   0:01   0.00% php-cgi
17915 root          1  20    0    44M    25M accept   0   0:01   0.00% php-cgi
  406 root          1  52    0    32M    20M wait     1   0:01   0.00% python3.


I don't know anything about BSD, so none of my Linux tricks are handy here... is there any way to tell what OPNsense is trying to do with ifconfig?

ifconfig is used very widely.
Can you try "top -a", or just System: Diagnostics: Activity, to grab the full command line?
Perhaps that will make it clearer which script is being executed.

January 23, 2022, 07:11:28 AM #3 Last Edit: January 23, 2022, 07:16:51 AM by isamudaison
Looks like kernel{if_config_tqg_0} is the culprit?



   11 root       155 ki31     0    32K CPU1    1 188:10  99.27% [idle{idle: cpu1}]
   11 root       155 ki31     0    32K RUN     0 185:25  97.20% [idle{idle: cpu0}]
   12 root       -60    -     0   224K WAIT    0   3:11   1.96% [intr{swi4: clock (0)}]
50005 unbound     20    0  149M    38M kqread  1   0:13   1.01% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
    0 root       -76    -     0   480K -       1   0:18   0.14% [kernel{if_config_tqg_0}]

Quote: Looks like kernel{if_config_tqg_0} is the culprit?
IMHO you still need to find the exact 'ifconfig' command that triggers that if_config_tqg task.
And the last 'top' result isn't showing any CPU load ;)

This is exactly why I bothered posting on the forum... to get help figuring out what is causing this spike. *How* do I determine what is running this command when all my tools seem to be either missing (strace, lsof) or different in this BSD environment? The top command shows no load because, as the title of the thread states, this spike is intermittent. It will ever so briefly peg a CPU core, which would cause intermittent latency if I ever put this thing into service.
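
One possible way to answer the "what is running this command" question without strace or lsof is to snapshot the process table while ifconfig is alive and walk up its parent chain. The sketch below is only an illustration: `ancestry` and `watch_ifconfig` are hypothetical helper names, and it assumes `ps -axo pid,ppid,command` and a `sleep` that accepts fractional seconds, as on FreeBSD. The parent of a short-lived ifconfig usually outlives it, so the caller can still be identified from the same snapshot.

```sh
#!/bin/sh
# Read "PID PPID COMMAND..." rows (plus one header row) on stdin and print
# the ancestry of the pid given as $1, child first.
ancestry() {
    awk -v pid="$1" '
        NR > 1 {                                   # skip the ps header row
            ppid[$1] = $2
            cmd = $0
            sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+/, "", cmd)  # drop PID and PPID
            cmdline[$1] = cmd
        }
        END {
            while (pid in ppid) {
                print pid, cmdline[pid]
                if (ppid[pid] == pid) break        # a process that is its own parent
                pid = ppid[pid]
            }
        }
    '
}

# Poll the process table until an ifconfig process shows up, then print
# the chain of processes that started it. Not called automatically.
watch_ifconfig() {
    while :; do
        snap=$(ps -axo pid,ppid,command)
        pid=$(printf '%s\n' "$snap" |
              awk '$3 ~ /(^|\/)ifconfig$/ { print $1; exit }')
        if [ -n "$pid" ]; then
            printf '%s\n' "$snap" | ancestry "$pid"
            return 0
        fi
        sleep 0.2                                  # assumption: fractional sleep is supported
    done
}
```

Running `watch_ifconfig` in an SSH session and waiting for the next spike should print the ifconfig command line first, followed by its parent, grandparent, and so on.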


Just a couple of thoughts based on my sysadmin experience.
1) This will require some manual monitoring of CPU load, but could you see whether the CPU spikes occur randomly or, possibly, almost exactly on the minute? That would suggest a cron job running every minute.
2) Is your WAN interface connected to anything? Did you give it a static address? Is it possible that the OS is trying to configure your WAN interface with DHCP at regular intervals?

Good luck.
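
To make the "randomly or on the minute" check in point 1 less manual, the gaps between observed spikes can be computed from a list of timestamps. This is a minimal sketch: `spike_intervals` is a hypothetical helper, it expects one HH:MM:SS time per line on stdin (noted by hand or taken from a log), and it does not handle a list that crosses midnight.

```sh
#!/bin/sh
# Print the gap in seconds between consecutive HH:MM:SS timestamps.
spike_intervals() {
    awk -F: '
        { t = $1 * 3600 + $2 * 60 + $3 }   # seconds since midnight
        NR > 1 { print t - prev }          # gap to the previous spike
        { prev = t }
    '
}
```

Gaps that stay very close to 60 (or another fixed value) point at something scheduled; widely varying gaps point at an event-driven trigger.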

If the ifconfig command still manages to appear in the 'top' results, even if not for long, I would try something like:
top -aHSTn -d 300 99999 > /var/log/top.log


I would have guessed that something like the Reporting: Traffic: Top talkers tab being open was the trigger, but even in that case the above command didn't show me a single line with ifconfig. That's how fast it executes, IMHO.
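
A log written this way gets large quickly, so a small filter helps when looking for the rare ifconfig rows. The sketch below assumes the batch-mode layout of FreeBSD top, where each snapshot starts with a "last pid:" header line that ends in the wall-clock time and each process row has its command after the CPU percentage; `ifconfig_hits` is a hypothetical helper name.

```sh
#!/bin/sh
# Print "<time> <command line>" for every ifconfig row in a top log on stdin.
ifconfig_hits() {
    awk '
        /^last pid:/ { stamp = $NF }       # remember the time of this snapshot
        /ifconfig/ {
            sub(/^.*% /, "")               # keep only the text after the CPU column
            print stamp, $0
        }
    '
}

# Usage, assuming the log path from the command above:
#   ifconfig_hits < /var/log/top.log
```

The kernel thread `if_config_tqg_0` is not matched, because its name contains an underscore.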

I was able to capture this:

last pid: 14419;  load averages:  0.83,  0.73,  0.52    up 1+15:00:13  09:56:33
131 threads:   4 running, 113 sleeping, 14 waiting
CPU:  3.3% user,  0.0% nice, 16.4% system,  1.5% interrupt, 78.8% idle
Mem: 71M Active, 90M Inact, 979M Wired, 700M Buf, 6679M Free
Swap: 8192M Total, 8192M Free

   THR USERNAME    PRI NICE   SIZE    RES STATE    C   TIME     CPU COMMAND
100004 root        155 ki31     0B    32K RUN      1  38.6H  91.68% [idle{idle: cpu1}]
100003 root        155 ki31     0B    32K RUN      0  38.0H  66.52% [idle{idle: cpu0}]
100391 root         20    0    13M  2932K CPU0     0   0:00  25.40% /sbin/ifconfig -m -v


However, the BSD ifconfig man page doesn't seem to list those shorthand parameters. Possibly something to do with VLAN mode changes?

OK, I've re-created this condition. Executing `ifconfig -m -v` lists the available interfaces and seems to stall on the ix interfaces (I'm assuming this is what causes the CPU spike). I would guess it's an issue with the ix driver used in this version? Removing the VLANs from the interfaces doesn't change anything, so they aren't the cause.
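
To check the "stalls on ix" impression interface by interface, the same query can be run once per interface with a coarse timer around it. This is a rough sketch only: `time_ifconfig` and the `IFCONFIG` override variable are hypothetical, and `date +%s` only resolves whole seconds, so it will show a stall of a second or more but not smaller differences.

```sh
#!/bin/sh
# Run the status query once per interface name given as an argument and
# print "<elapsed seconds> <interface>" for each.
time_ifconfig() {
    for ifn in "$@"; do
        start=$(date +%s)
        # IFCONFIG is an illustrative override; it defaults to plain ifconfig.
        ${IFCONFIG:-ifconfig} -m -v "$ifn" > /dev/null 2>&1
        end=$(date +%s)
        echo "$((end - start)) $ifn"
    done
}

# Usage on the firewall, where "ifconfig -l" lists the interface names:
#   time_ifconfig $(ifconfig -l)
```

If only the ix rows report a non-zero time, that supports the driver or hardware theory rather than the VLAN one.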


Ok, so it looks like it IS a known issue, and 22.1 isn't a fix at all. I'll downgrade to 21.7.5 and verify that fixes the issue (I also updated the firmware on my NIC and that didn't help at all either).

It appears if I don't have the dashboard page open the spikes don't happen either... I guess that works lol

@AdSchellevis pointed out the place in the source code where this call is made. The legacy_interfaces_details function is used quite widely, so I don't think that dropping a couple of widgets will get rid of the problem completely.

We were hoping FreeBSD 13 would behave better but now we're putting our workaround back for 22.1. The problem is this is either a hardware or driver issue. Generally, we don't recommend the Intel ixgbe-driver based cards at all.


Cheers,
Franco