21.7, disk errors, possibly related to iflib/netmap and igb driver

Started by dinguz, July 29, 2021, 11:40:09 AM

I'm using a Qotom box (Q3xxG4-P, with the Intel I211 network card) for OPNsense, which worked really well until the upgrade to 21.7. As soon as there is any load on the system, I start getting disk errors (ahci CAM errors or timeouts).
So far I have tried swapping the SSD for a name-brand one, and I have reinstalled the system from the ISO. Still no luck. I have also tried disabling NCQ and TRIM, but that doesn't seem to be the cause.

Edit: now that I think of it, the errors always start after enabling Sensei. You see that debug message about the network card being put into permanently promiscuous mode, and shortly after that, the system hangs. Could that have anything to do with this?

See also https://forum.opnsense.org/index.php?topic=24122.0
In theory there is no difference between theory and practice. In practice there is.


Yes, this looks like the case. We were able to narrow the problem down to the igb(4) and ahci drivers.

Will have more updates once we have more information.

The new test kernel by Franco seems to work. I have been running OPNsense + Sensei in native netmap mode for a few hours, with no more ahci errors or timeouts. Thanks to all involved!