Random crash/freeze CPU 100% [kernel{if_io_tqg_N}]

Started by angled_whacking924, April 22, 2025, 11:26:32 PM

No, no. There should be an option in the BIOS to disable it, but if there isn't one, no problem. A module can be blocklisted, but there's no need.
The question was about being sure. If you are sure you don't have a cable plugged into one port while thinking it is in another, all is good for this exercise.

It does smell a lot like a hardware problem. Which hardware is the non-obvious part.

I've managed to capture a load of packets before and after a crash. I'm not seeing anything obvious: no multicast storms or any kind of overloading, just very much normal traffic, even after a crash.

When it does crash, devices already connected seem to be unaffected, or at least the only issue is some instability. New devices cannot connect.

I think I've narrowed it down to my WiFi ap, a Unifi AC AP PRO.

The more data flowing through it the sooner/more frequently it seems to crash.

However I can't see what is triggering the crash unless the Unifi's firmware is corrupted or something?

Well, I've given up with OPNsense. It's not the WiFi AP: I've swapped that out and it still crashes randomly. I've gone through every single device connected to it, isolating each one and trying to see what triggers the crash, and it was totally inconclusive.

I've done packet capture and analysis at the moment of a crash, yet nothing stands out: no packet floods or anything else to suggest the interface is being overwhelmed.
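For anyone wanting to repeat this kind of check, here is a minimal sketch in Python. It assumes the capture has already been exported to a plain list of packet arrival times in seconds; the function name `find_flood_seconds` and the thresholds `factor` and `min_packets` are made up for illustration and are not part of any OPNsense or capture tool. It only counts packets per second and flags seconds that sit far above the median rate, so it will not catch every kind of overload.

```python
from collections import Counter
from statistics import median


def find_flood_seconds(timestamps, factor=10.0, min_packets=100):
    """Return sorted (second, count) pairs whose packet count is far above the median.

    timestamps: iterable of packet arrival times in seconds (floats).
    factor, min_packets: arbitrary illustrative thresholds, not tuned values.
    """
    # Bucket packets into whole seconds.
    per_second = Counter(int(ts) for ts in timestamps)
    if not per_second:
        return []
    # Baseline is the median rate over seconds that saw at least one packet.
    baseline = median(per_second.values())
    threshold = max(min_packets, factor * baseline)
    return sorted((sec, n) for sec, n in per_second.items() if n > threshold)
```

If this returns an empty list for the capture window around a crash, that is consistent with the "no packet floods" observation, though it does not rule out other traffic-related triggers.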

The crashes are entirely random. It can be up for 4 days before crashing, or 2 hours; there's no rhyme or reason to it. There have been many false endings where it's been stable for days only to then crash. There are only a couple of mentions of this issue online with pfSense and OPNsense, but no one seems to have resolved it, or at least their methods have had no effect on my situation.

It could be the hardware of the router itself, but I replaced the NIC, upgrading to an Intel 350, and that also had no effect. Otherwise no other hardware has been changed. The only change I can identify between this issue existing and not is the OPNsense updates; however, I'm not rolling back through every version to test that theory. This thing had been working beautifully for 2-3 years until 6-9 months ago, and now it's totally unreliable. I'm just glad I only use this at home and not in a commercial environment!!

I cannot be bothered to invest any more time or effort into this issue, so I am moving on to something else!

I would guess that the device itself is the problem. Alas, you mostly do not find any BIOS updates to fix it on devices as old as a J1900-based box. I have one box with a J4125, which sometimes crashes as well. The newer specimens with N100/N150 work more reliably and are cheap, too.
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

My thoughts too. I would expect that if the OP puts OPNsense on other hardware, these problems won't be repeated.

Well, 3 days into running an upgraded N5105 system, it has "randomly" crashed again. Give me strength.

Mind if I ask if you still use the same power supply or any other components from the old system, like RAM or SSD?
