Weird CPU usage

Started by heyheyheyhey, November 04, 2021, 01:50:54 PM

Thanks.

I brought it back to mainstream, saw the CPU weirdness, and ran the ifconfig directly on the terminal.

I have 18 interfaces in total, and it took 6-7 seconds for them all to populate.

For now I would probably just use 21.7.4 with the patches reverted; maybe there will be an update for the driver.

November 05, 2021, 12:17:10 AM #17 Last Edit: November 05, 2021, 12:19:14 AM by johndchch
Quote from: heyheyheyhey on November 04, 2021, 01:50:54 PM
I have got 2 machines, both updated to 21.7.4, and both are showing the same symptom. There is an ifconfig command taking up 1-3 CPU cores. The CPU core shows as pegged in all monitoring. Rebooting makes no difference, and trying to kill the pid returns "not found". I backed up and reinstalled the whole OS and the issue immediately returned.

I had that. I had two 'spare' (unused) NICs in the machine; disabling them in the BIOS so that only the assigned NICs were visible to OPNsense fixed things (the spare NICs were a couple of Intel i210 1GbE cards; the main interfaces are the ports on an x540-t2).

Just chiming in I also noticed the same problem.

I have a motherboard with 6x NICs using the ix driver.
No LAGGs defined (I *used* to have some defined, but they are removed).

Accessing certain screens triggers a multi second lag and 'ifconfig' has 100% cpu utilization.  These include:
* SSH login
* GUI login
* dashboard
* some interfaces screens

ifconfig -m -v takes about half a second to run per adapter, so around 3+ seconds in total.
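
For anyone who wants to put numbers on this, here is a minimal Python sketch that times ifconfig -m -v for each interface. The function name time_ifconfig and the injectable run parameter are my own choices for illustration, and the interface names in the usage comment (ix0 to ix5) are an assumption for a six-port ix board, so adjust them to match your system.

```python
import subprocess
import time
from typing import Callable, Dict, List, Optional


def time_ifconfig(interfaces: List[str],
                  run: Optional[Callable[[List[str]], object]] = None) -> Dict[str, float]:
    """Return wall-clock seconds spent in `ifconfig -m -v <iface>` per interface."""
    if run is None:
        # Default runner: execute the real command and discard its output.
        def run(argv: List[str]) -> object:
            return subprocess.run(argv, capture_output=True, check=False)

    timings: Dict[str, float] = {}
    for name in interfaces:
        start = time.perf_counter()
        run(["ifconfig", "-m", "-v", name])
        timings[name] = time.perf_counter() - start
    return timings


# Example usage on the firewall itself (interface names are assumed):
#   results = time_ifconfig([f"ix{i}" for i in range(6)])
#   for name, seconds in results.items():
#       print(f"{name}: {seconds:.2f}s")
#   print(f"total: {sum(results.values()):.2f}s")
```

If the per-interface figure is roughly the same for every port, the total grows linearly with the number of adapters, which would match the multi-second lag on screens that query all interfaces.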

Running opnsense-patch 5acaca4 913afdb makes the problem go away.

Just adding a reply to follow this issue. I had the same issue as well when upgrading from 21.7.3 --> 21.7.4: a very slow web interface and noticeably increased CPU usage. Reloading 21.7.1 seems to have resolved my issue. I didn't know about the revert command.

Running on bare metal on a Supermicro X10SLH-N6; E3-1270v3; 16GB, with an Intel X520-DA2 card added as well for the LAN side.

November 07, 2021, 11:01:55 PM #20 Last Edit: November 07, 2021, 11:08:14 PM by Mondmann
Hello guys,
we went from 21.7.4 back to 21.7.1; up to that version everything was still OK.
We also applied the patch:
opnsense-patch 5acaca4 913afdb
That helped: CPU load is OK again and the GUI runs smoothly again. We are now back on 21.7.4
(Intel X540-T2 installed).
Greetings and thanks from Germany
OPNsense 22.7.9*WG-kmod*OpenSSL*OpenVPN* AdGuardHome*i7-7700*32GB*256SSD*ix0-1, igb0-4, em0*OpenVPN+Wireguard WG0, WG1*NetGear ProSafe XS508*AP Netgear WAX610*all real bare-metal hardware* Sorry, my English is translated via app*

Quote from: heyheyheyhey on November 04, 2021, 01:50:54 PM
ifconfig command taking up 1-3 cpu cores

From the image you attached, how can you tell 1-3 cpu cores were being used?  Thanks

Installed an x520 card over here with an ixgbe driver, the issue is reproducible and definitely related to the driver.

On my end ifconfig -v operates normally as soon as I insert modules into the empty slots, which seems to point to some missing detection when trying to read the i2c bus (which is only possible when there is a module inserted).

Can you try to install sfp+ modules (or cables) and check if the issue is gone when all slots are occupied?

Thanks,

Best regards,

Ad

November 10, 2021, 10:43:09 AM #23 Last Edit: November 10, 2021, 10:58:36 AM by AdSchellevis
A driver update with a workaround is posted on GitHub (https://github.com/opnsense/core/issues/5349). ixgbe seems to have long-standing issues in this area (e.g. https://sourceforge.net/p/e1000/mailman/message/32199158/).

For future setups if possible I would prefer an Intel x700 series card (ixl) as these have been proven to be stable in our experience.

Best regards,

Ad

November 10, 2021, 05:17:39 PM #24 Last Edit: November 10, 2021, 05:23:31 PM by johndchch
Quote from: AdSchellevis on November 10, 2021, 10:43:09 AM
For future setups if possible I would prefer an Intel x700 series card (ixl) as these have been proven to be stable in our experience.

Given that I can buy an x520-da2 or x540-t2 for about US$70, whereas an x710-t2 is about US$600, I don't think this is a viable 'fix'.

Is anyone running the 22.x beta able to confirm whether the issue is present on FreeBSD 13? (Update: I just saw your comment on GitHub that it is indeed better on 22/FreeBSD 13, so it sounds like that is the proper 'fix'.)

November 10, 2021, 05:19:41 PM #25 Last Edit: November 10, 2021, 05:31:27 PM by johndchch
Quote from: AdSchellevis on November 09, 2021, 09:05:20 AM
Installed an x520 card over here with an ixgbe driver, the issue is reproducible and definitely related to the driver.

On my end ifconfig -v operates normally as soon as I insert modules into the empty slots, which seems to point to some missing detection when trying to read the i2c bus (which is only possible when there is a module inserted).

Can you try to install sfp+ modules (or cables) and check if the issue is gone when all slots are occupied?

the issue with 'slow' output from ifconfig -m -v is still present with the x540-t2 ( with both ports connected and good link ) - so it's NOT a sfp+/i2c issue - it's a driver issue

Quote:
the issue with 'slow' output from ifconfig -m -v is still present with the x540-t2 ( with both ports connected and good link ) - so it's NOT a sfp+/i2c issue - it's a driver issue

The same thing also happens when the EEPROM isn't readable (an incompatible cable, for example): the i2c routine will keep trying to read it. This is easy to detect, by the way: ifconfig -v won't show any details about the connected DAC or module (the last line is usually nd6...).
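
The check described above can be roughly automated. The following Python sketch applies only the heuristic from this post: an interface whose block in the ifconfig -v output ends on an nd6 line shows no module or DAC details. The function names are hypothetical, and the parsing assumes the usual layout in which each interface starts on an unindented "name: flags=..." line followed by indented detail lines. Ports that never report module details (copper ports, loopback) will also be listed, so the result is only meaningful for SFP+ slots.

```python
from typing import Dict, List


def split_interfaces(output: str) -> Dict[str, List[str]]:
    """Group `ifconfig -v` output into {interface name: indented detail lines}."""
    blocks: Dict[str, List[str]] = {}
    current = None
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if not line[0].isspace():
            # An unindented line starts a new interface block, e.g. "ix0: flags=..."
            current = line.split(":", 1)[0]
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line.strip())
    return blocks


def ports_without_module_details(output: str) -> List[str]:
    """Return interfaces whose last detail line starts with 'nd6'."""
    return [name for name, lines in split_interfaces(output).items()
            if lines and lines[-1].startswith("nd6")]
```

Feeding it the captured text of ifconfig -v and filtering the result down to your ix ports gives a quick list of slots where the module EEPROM was apparently not read.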

Unfortunately the Intel x500 series, when used with external PHY (SFP+) modules, is known to have issues; the x700 series is more reliable in these circumstances.

If you think you have more experience about why this isn't an i2c issue, please feel free to fix the driver so we can all conclude that the timeout happens for no reason (which I obviously don't expect, with all the time I have spent debugging it over the last few days).

November 10, 2021, 08:32:29 PM #27 Last Edit: November 10, 2021, 08:54:13 PM by johndchch
Quote from: AdSchellevis on November 10, 2021, 06:04:05 PM
If you think you have more experience about why this isn't an i2c issue, please feel free to fix the driver so we can all conclude that the timeout happens for no reason (which I obviously don't expect, with all the time I have spent on this in the last days debugging it).

you're saying it's an i2c issue with talking to the sfp+ modules - the x540 has an integrated phy - so if the problem shows up on both platforms you're not looking at an i2c/sfp+ bug

on rhel8 - if you do an ethtool --module-info on an x520 with a populated sfp+ the result is basically instantaneous. If you do it with no sfp+ installed you get a pause followed by an eeprom i/o error - and if you do the same to an x540 ( with its integrated phy ) you get the same pause and the same error

on hardenedBSD you get a pause with ifconfig -v to an x540-t2 - and no different output with/without the -v option - it's like the hBSD ix driver is always querying i2c (and hitting a timeout) even when it's not appropriate to that model card




Quote:
how about you actually bother reading intel's tech docs - the x520 line has i2c to talk to the sfp+ modules - the x540 with its integrated phy has NO i2c - so if the problem exhibits on both platforms you're not looking at an i2c bug

You probably won't mind if I personally don't spend time on your comments. You can test our available kernel or wait for 22.1; I don't really mind, whatever suits you.

The cards I've seen so far have issues with i2c. That doesn't mean there aren't other issues in the same family. As mentioned earlier, I prefer the X700 series for stability reasons; you're free to make your own choices and live with them.



Quote from: johndchch on November 10, 2021, 08:32:29 PM
on hardenedBSD you get a pause with ifconfig -v to an x540-t2 - and no different output with/without the -v option - it's like the hBSD ix driver is always querying i2c (and hitting a timeout) even when it's not appropriate to that model card

We did check the drivers and the code is the same on Linux. There could be a bug somewhere, but the funny thing is Intel is maintaining the drivers in Linux and FreeBSD alike and the stock FreeBSD driver has the same issue.

It's best to take your theory to them directly. They are pretty decent and helpful guys judging from previous interactions.  :)


Cheers,
Franco