OPNsense Forum

Archive => 21.7 Legacy Series => Topic started by: heyheyheyhey on November 04, 2021, 01:50:54 pm

Title: Weird CPU useage
Post by: heyheyheyhey on November 04, 2021, 01:50:54 pm
I have two machines, both updated to 21.7.4, and both are showing the same symptom.  There is an ifconfig command taking up 1-3 CPU cores.  The CPU core shows as pegged in all monitoring.  Rebooting makes no difference, and trying to kill the PID returns "not found". I backed up and reinstalled the whole OS, and the issue immediately returned.

It doesn't persist forever; the process dies at some point, but as soon as I open the UI again it comes back.

Does anyone have an idea how to troubleshoot this further, or has anyone seen something similar?
Title: Re: Weird CPU useage
Post by: 3rik on November 04, 2021, 05:15:46 pm
Hi

After I upgraded to 21.7.4 the whole machine got super slow.
Everything took forever to do in the GUI.

I found the solution in the German part of the forum:
just roll back the opnsense package to 21.7.3 by running the following command.

# opnsense-revert -r 21.7.3 opnsense

After that my OPNsense box was back to normal.



Title: Re: Weird CPU useage
Post by: heyheyheyhey on November 04, 2021, 05:23:08 pm
Thank you. When I did the rollback, only two packages changed:
py38-dnspython2: 2.1.0
and pkg-1.16.3 was reinstalled.

Interestingly, the rollback immediately resolved the weirdness I was seeing, without a reboot.
Title: Re: Weird CPU useage
Post by: AdSchellevis on November 04, 2021, 08:35:05 pm
Hi,

Comparing the diff between 21.7.3 and 21.7.4, this is the only thing that stands out: https://github.com/opnsense/core/commit/913afdbd196a1ba68d0f5b9e88491b97133157b9

It could be an issue with the -v switch on ifconfig in some scenarios. Can you update to 21.7.4 again and revert this feature using the following command:

Code: [Select]
opnsense-patch 5acaca4 913afdb

A bit of context about the setup could also be helpful, like the number of configured interfaces and the type.

Thanks in advance,

Best regards,

Ad
Title: Re: Weird CPU useage
Post by: heyheyheyhey on November 04, 2021, 08:39:55 pm
Sure, I will try that.

System is a Supermicro X10SLH-LN6TF with an additional X520-DA2 card.  I have 4 physical interfaces configured, and two routed IPsec tunnels configured.
Title: Re: Weird CPU useage
Post by: AdSchellevis on November 04, 2021, 08:44:15 pm
Thanks. Are any of the interfaces combined in a lagg, or is some type of fiber optics used?
Title: Re: Weird CPU useage
Post by: heyheyheyhey on November 04, 2021, 08:55:00 pm
No laggs. The X520-DA2 is a dual SFP+ card; one of the ports is populated with a DAC as the LAN interface.

As soon as I upgraded to 21.7.4 the issue appeared.  As soon as I applied the patch, it was resolved.
Title: Re: Weird CPU useage
Post by: AdSchellevis on November 04, 2021, 08:58:16 pm
The X520-DA2 uses the "ix" driver, right? This sounds like a driver issue when asking for verbose stats. Can you check if the technical names are ix0, ix1, ...? (They should also be visible as "Device" in the interface settings.)
Title: Re: Weird CPU useage
Post by: heyheyheyhey on November 04, 2021, 09:04:29 pm
You are correct, it is using the ix driver.  This motherboard has 6 10GbE interfaces using the ix driver, and I have two more on the Intel card.

I am using hardware offloading for CSC, TSO, LRO, and VLAN filtering across the ix interfaces.
Title: Re: Weird CPU useage
Post by: heyheyheyhey on November 04, 2021, 09:05:32 pm
Yes, it is ix0, ix1, ix2, ...
Title: Re: Weird CPU useage
Post by: AdSchellevis on November 04, 2021, 09:06:34 pm
I don't expect the offloading features to be related here, but can you try turning them off?
Title: Re: Weird CPU useage
Post by: heyheyheyhey on November 04, 2021, 09:17:19 pm
Your inference is correct, turning off the offloading features made no difference.

I think you have narrowed it down: something changed in the patch/commit that you shared is causing the issue.
Title: Re: Weird CPU useage
Post by: AdSchellevis on November 04, 2021, 09:20:38 pm
Not really, actually. The commit exposes a driver issue (try to execute "ifconfig -m -v" and see what happens).

Skimming the commits in the ix code doesn't reveal an immediate fix, but I think I have a similar card available somewhere to test.
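
To put numbers on the test suggested above, the verbose query can be timed from a shell. This is a minimal sketch and not something posted in the thread: the function names time_iface and time_all are made up for illustration, it assumes FreeBSD's ifconfig (where -l lists the interface names), and date +%s only has whole-second resolution, so the total is more meaningful than the per-interface figures.

```shell
#!/bin/sh
# Sketch: time "ifconfig -m -v" per interface and in total.
# Function names are illustrative only, not part of OPNsense.

time_iface() {
    # $1 = interface name, e.g. ix0
    start=$(date +%s)
    ifconfig -m -v "$1" >/dev/null 2>&1
    end=$(date +%s)
    echo "$1 $((end - start))s"
}

time_all() {
    # "ifconfig -l" prints all interface names on one line (FreeBSD).
    total_start=$(date +%s)
    for ifname in $(ifconfig -l 2>/dev/null); do
        time_iface "$ifname"
    done
    total_end=$(date +%s)
    echo "total $((total_end - total_start))s"
}

# Run only where an ifconfig binary is available.
if command -v ifconfig >/dev/null 2>&1; then
    time_all
fi
```

A total of several seconds, with the time concentrated on the ix interfaces, would be consistent with the driver issue described here.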
Title: Re: Weird CPU useage
Post by: heyheyheyhey on November 04, 2021, 09:23:24 pm
How do I undo

opnsense-patch 5acaca4 913afdb

so that I can test?
Title: Re: Weird CPU useage
Post by: AdSchellevis on November 04, 2021, 09:36:51 pm
I usually reinstall the core package so I know I'm in the upstream state, using:

Code: [Select]
pkg install -f opnsense

But for the ifconfig command that doesn't influence the results, since ifconfig is part of the base system.
Title: Re: Weird CPU useage
Post by: heyheyheyhey on November 04, 2021, 09:40:59 pm
Thanks.

I brought it back to the upstream state, saw the CPU weirdness, and ran the ifconfig command directly in the terminal.

I have 18 interfaces in total; it took 6-7 seconds for them all to populate.
Title: Re: Weird CPU useage
Post by: AdSchellevis on November 04, 2021, 09:44:03 pm
For now I would probably just use 21.7.4 with the reverted patches; maybe there will be an update for the driver.
Title: Re: Weird CPU useage
Post by: johndchch on November 05, 2021, 12:17:10 am
Quote
I have two machines, both updated to 21.7.4, and both are showing the same symptom.  There is an ifconfig command taking up 1-3 CPU cores.  The CPU core shows as pegged in all monitoring.  Rebooting makes no difference, and trying to kill the PID returns "not found". I backed up and reinstalled the whole OS, and the issue immediately returned.

I had that. I had two 'spare' (unused) NICs in the machine; disabling them in the BIOS, so that only the assigned NICs were visible to OPNsense, fixed things. (The spare NICs were a couple of Intel i210 1GbE cards; the main interfaces are the ports on an X540-T2.)
Title: Re: Weird CPU useage
Post by: easyrhino on November 05, 2021, 06:36:09 pm
Just chiming in: I also noticed the same problem.

I have a motherboard with 6x NICs using the ix driver.
No LAGGs defined (I *used* to have some defined, but they are removed).

Accessing certain screens triggers a multi-second lag, during which 'ifconfig' shows 100% CPU utilization.  These include:
* SSH login
* GUI login
* dashboard
* some interfaces screens

ifconfig -m -v takes about half a second to run per adapter, so around 3+ seconds in total.

Applying opnsense-patch 5acaca4 913afdb makes the problem go away.
Title: Re: Weird CPU useage
Post by: HotRodNerd on November 06, 2021, 07:23:13 pm
Just adding a reply to follow this issue. I had the same issue when upgrading from 21.7.3 --> 21.7.4: a very slow web interface and noticeably increased CPU usage. Reloading 21.7.1 seems to have resolved my issue. I didn't know about the revert command.

Running bare metal on a Supermicro X10SLH-N6; E3-1270v3; 16GB, with an Intel X520-DA2 card added for the LAN side.
Title: Re: Weird CPU useage
Post by: Mondmann on November 07, 2021, 11:01:55 pm
Hello guys,
We went from 21.7.4 back to 21.7.1; up to that version everything was still OK.
We also applied the patch:
opnsense-patch 5acaca4 913afdb
That helped: the CPU load is OK again and the GUI runs smoothly again. We are now back on 21.7.4.
(Intel X540-T2 installed)
Greetings and thanks from Germany
Title: Re: Weird CPU useage
Post by: schuc on November 08, 2021, 02:33:12 am
Quote
ifconfig command taking up 1-3 cpu cores

From the image you attached, how can you tell 1-3 cpu cores were being used?  Thanks
Title: Re: Weird CPU useage
Post by: AdSchellevis on November 09, 2021, 09:05:20 am
I installed an X520 card over here with the ixgbe driver; the issue is reproducible and definitely related to the driver.

On my end ifconfig -v operates normally as soon as I insert modules into the empty slots, which seems to point to some missing detection when trying to read the i2c bus (which is only possible when there is a module inserted).

Can you try to install sfp+ modules (or cables) and check if the issue is gone when all slots are occupied?

Thanks,

Best regards,

Ad
Title: Re: Weird CPU useage
Post by: AdSchellevis on November 10, 2021, 10:43:09 am
A driver update with a workaround is posted on GitHub (https://github.com/opnsense/core/issues/5349). ixgbe seems to have long-standing issues in related areas (e.g. https://sourceforge.net/p/e1000/mailman/message/32199158/).

For future setups if possible I would prefer an Intel x700 series card (ixl) as these have been proven to be stable in our experience.

Best regards,

Ad
Title: Re: Weird CPU useage
Post by: johndchch on November 10, 2021, 05:17:39 pm
Quote
For future setups if possible I would prefer an Intel x700 series card (ixl) as these have been proven to be stable in our experience.

Given that I can buy an X520-DA2 or X540-T2 for about US$70, whereas an X710-T2 is about US$600, I don't think this is a viable 'fix'.

Is anyone running the 22.x beta able to confirm whether the issue is present on FreeBSD 13? (Update: I just saw your comment on GitHub that it is indeed better on 22/FreeBSD 13. It sounds like that is the proper 'fix'.)
Title: Re: Weird CPU useage
Post by: johndchch on November 10, 2021, 05:19:41 pm
Quote
Installed an x520 card over here with an ixgbe driver, the issue is reproducible and definitely related to the driver.

On my end ifconfig -v operates normally as soon as I insert modules into the empty slots, which seems to point to some missing detection when trying to read the i2c bus (which is only possible when there is a module inserted).

Can you try to install sfp+ modules (or cables) and check if the issue is gone when all slots are occupied?

The issue with 'slow' output from ifconfig -m -v is still present with the X540-T2 (with both ports connected and good link), so it's NOT an SFP+/i2c issue; it's a driver issue.
Title: Re: Weird CPU useage
Post by: AdSchellevis on November 10, 2021, 06:04:05 pm
Quote
the issue with 'slow' output from ifconfig -m -v is still present with the x540-t2 ( with both ports connected and good link ) - so it's NOT a sfp+/i2c issue - it's a driver issue

Same thing also happens when the eeprom isn't readable (incompatible cable for example) and the i2c routine will keep trying to read. Easy to detect by the way, ifconfig -v won't show any details about the connected DAC or module (last line usually is nd6...)
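
The detection hint above can be turned into a small check. This is a minimal sketch, assuming (as described above) that missing module details show up as an nd6 line being the last line of the "ifconfig -v" output for an interface; the function name module_info_missing is made up for illustration, and this is a heuristic rather than an official test.

```shell
#!/bin/sh
# Heuristic sketch: reads "ifconfig -v <interface>" output on stdin and
# exits 0 when the last non-blank line starts with "nd6", i.e. no
# DAC/module details were printed after it. Exits 1 otherwise.
module_info_missing() {
    awk 'NF { last = $1 } END { exit (last == "nd6" ? 0 : 1) }'
}

# Example use on a FreeBSD/OPNsense shell (ix0 is a name from this thread):
#   ifconfig -v ix0 | module_info_missing && echo "ix0: no module details"
```

Only the output of a single interface should be piped in, since the check looks at the last line only.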

Unfortunately, the Intel X500 series is known to have issues when used with external PHY (SFP+) modules; the X700 series is more reliable in these circumstances.

If you think you have more experience about why this isn't an i2c issue, please feel free to fix the driver so we can all conclude that the timeout happens for no reason (which I obviously don't expect, with all the time I have spent debugging this over the last few days).
Title: Re: Weird CPU useage
Post by: johndchch on November 10, 2021, 08:32:29 pm
Quote
If you think you have more experience about why this isn't an i2c issue, please feel free to fix the driver so we can all conclude that the timeout happens for no reason (which I obviously don't expect, with all the time I have spend on this in the last days debugging it).

You're saying it is an i2c issue in talking to the SFP+ modules. The X540 has an integrated PHY, so if the problem shows up on both platforms you're not looking at an i2c/SFP+ bug.

On RHEL 8, if you run ethtool --module-info on an X520 with a populated SFP+, the result is basically instantaneous. If you do it with no SFP+ installed, you get a pause followed by an EEPROM I/O error, and if you do the same on an X540 (with its integrated PHY) you get the same pause and the same error.

On HardenedBSD you get a pause with ifconfig -v on an X540-T2, and no different output with or without the -v option. It's as if the HardenedBSD ix driver is always querying i2c (and hitting a timeout), even when that is not appropriate for that model of card.



Title: Re: Weird CPU useage
Post by: AdSchellevis on November 10, 2021, 08:44:19 pm
Quote
how about you actually bother reading intel's tech docs - the x520 line has i2c to talk to the sfp+ modules - the x540 with it's integrated phy has NO i2c - so if the problem exhibits on both platforms you're not looking at an i2c bug

You probably don't mind if I'm personally not going to spend time on your comments. You can test our available kernel or wait for 22.1; I don't really mind, whatever suits you.

The cards I've seen so far have issues with i2c. That doesn't mean there aren't other issues within the same family. As mentioned earlier, I prefer the X700 series for stability reasons; you're free to make your own choices and live with them.


Title: Re: Weird CPU useage
Post by: franco on November 10, 2021, 09:17:30 pm
Quote
on hardenedBSD you get a pause with ifconfig -v to an x540-t2 - and no different output with/without the -v option - it's like the hBSD ix driver is always querying i2c (and hitting a timeout) even when it's not appropriate to that model card

We did check the drivers and the code is the same on Linux. There could be a bug somewhere, but the funny thing is Intel is maintaining the drivers in Linux and FreeBSD alike and the stock FreeBSD driver has the same issue.

It's best to take your theory to them directly. They are pretty decent and helpful guys judging from previous interactions.  :)


Cheers,
Franco
Title: Re: Weird CPU useage
Post by: isamudaison on January 24, 2022, 08:34:31 pm
Quote
For future setups if possible I would prefer an Intel x700 series card (ixl) as these have been proven to be stable in our experience.

given I can buy x520-da2 or x540-t2 for about us$70, whereas an x710-t2 is about us$600 I don't think this is a viable 'fix'

anyone running the 22.x beta able to confirm if the issue is present on freebsd13? ( update - just saw your comment on github that it is indeed better on 22/freebsd13 - sounds like that is the proper 'fix' )

I've run into this on 22.1RC1 ( https://forum.opnsense.org/index.php?topic=26478.0 ) and it is indeed still an issue...
Title: Re: Weird CPU useage
Post by: AdSchellevis on January 24, 2022, 09:28:08 pm
As a temporary workaround, you can always remove the -v from https://github.com/opnsense/core/blob/161d24650b6020393b57238c0a0d4e40110dc6d3/src/etc/inc/interfaces.lib.inc#L213 (will break some lag features on our end, but if you don't use them it shouldn't matter that much)

The Intel X500 series so far seems to be the only card family with locking issues when extracting detailed interface characteristics.
Ideally someone would dig into the driver code and try to fix the issue, it's not very hard to replicate on a stock FreeBSD 12/13 system as far as I know.

Best regards,

Ad
Title: Re: Weird CPU useage
Post by: AdSchellevis on January 25, 2022, 09:33:01 am
I totally forgot that I already made a workaround which we hoped wouldn't be needed in FreeBSD 13 (https://github.com/opnsense/src/commit/1382f0f64310790e0df67f4cd42d1662104a7043)

Fact remains that the upstream driver is broken and a proper fix would be practical as future driver updates might break setups, and as we don't use these cards ourselves it's not always guaranteed we will spend a lot of time tracking issues like this.

Best regards,

Ad