Periodic NIC issues (?) with Protectli Vault, Intel i226-V

Started by fornax, July 01, 2026, 02:09:53 AM

I'm working on troubleshooting an issue that's been popping up irregularly since I deployed OPNsense on a Protectli VP3210 (both new to me). The device is set up to perform all DHCP, DNS, firewall, and routing duties for the home network behind it.

Approximately every 1-7 days the network starts acting up. The symptoms aren't always consistent, but so far have tended to fall into one of three categories:

1. Something that looks like a DNS issue. Attempts to resolve an address will usually time out on the first try, but then succeed immediately a few seconds later. If I connect to the upstream router and use the same resolver, everything is normal.

2. DHCP will stop working for some/all devices.

3. An online game I play regularly has trouble connecting to the game servers.

Regardless of the symptom, the workaround that resolves it (temporarily) is the same. Go to Interfaces -> Settings, uncheck "Disable hardware checksum offload", Apply, recheck the box, Apply again. Everything immediately starts working as it should. (This is why I assume this is a NIC issue.)
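For anyone who wants to repeat that workaround from a shell instead of the GUI, here is a minimal sketch. It assumes the GUI cycle boils down to re-enabling and then clearing the rxcsum/txcsum capability flags on the interface, which has not been checked against what OPNsense actually does on Apply. toggle_csum_offload and run are hypothetical helper names. The code uses POSIX sh syntax, so run it from /bin/sh; set DRY_RUN=1 to print the commands instead of executing them.

```sh
# Sketch only. ASSUMPTION: the GUI "uncheck, Apply, recheck, Apply" cycle is
# roughly equivalent to enabling and then disabling rx/tx checksum offload.
# toggle_csum_offload and run are hypothetical helpers, not part of OPNsense.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

toggle_csum_offload() {
    ifname="$1"
    if [ -z "$ifname" ]; then
        echo "usage: toggle_csum_offload <ifname>" >&2
        return 2
    fi
    # "Uncheck" the disable box: turn hardware checksum offload back on.
    run ifconfig "$ifname" rxcsum txcsum || return 1
    # Give the driver a moment before flipping it back (skipped in dry runs).
    [ "${DRY_RUN:-0}" = "1" ] || sleep 1
    # "Recheck" the box: leave offload disabled again, as configured before.
    run ifconfig "$ifname" -rxcsum -txcsum || return 1
}
```

DRY_RUN=1 toggle_csum_offload igc1 prints the two ifconfig commands; without DRY_RUN (and as root) it applies them. If this reproduces the fix, it suggests the offload reconfiguration itself is what kicks the NIC back into shape.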

Doing some research, I see that it's not uncommon for people to have issues with the Intel i226-V NICs, something I missed when I chose the hardware. Based on what I read I've been playing with various tunables, rebooting as necessary:

dev.igc.0.fc=0
dev.igc.1.fc=0
dev.igc.0.eee_control=0
dev.igc.1.eee_control=0
net.isr.bindthreads=1
net.isr.maxthreads=-1
net.isr.dispatch=deferred
net.inet.ip.intr_queue_maxlen=3000
hw.pci.enable_aspm=0

So far nothing has made a difference. The other thing that seems to be done commonly with these NICs is to upgrade the NVM firmware, which I'll try if I have to, but that's a bit intimidating. Anyone have any other ideas before I go that route?

ASPM is what causes this on I226 devices, and I am not aware of any NIC firmware update that fixes it.

If there is an updated BIOS for the Protectli, try that first. You can make the problem go away by turning ASPM off, but AFAIK, if the BIOS does not set it selectively for your NICs, you can only disable it for the whole machine under OPNsense.

The global setting is done via the tunable hw.pci.enable_aspm=0. You should probably also set dev.igc.X.eee_control=0 with X=0,1.
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, Leox LXT-010H-D

1100 down / 450 up, Bufferbloat A+

Hi there,

While having a look at this issue, I noticed a potential bug in the iflib code that makes an automatic reset impossible in case of a TX hang. A custom kernel that resolves this has been published (though it is likely not the final version of the patch). Would you mind installing this kernel to see if it changes anything about the issue?

# opnsense-update -zk 26.1.10-iflib
The commit in question is https://github.com/opnsense/src/commit/8dd26e6351d72a53fab5d47a16d053d5f8648353.

If it's this issue, you should see "watchdog timeout" messages appearing in your dmesg/system log. After this, an automatic reset should recover connectivity. If this happens, can you share these logs?
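One quick way to look for that message is a filter that tallies watchdog timeouts per igc interface from whatever log text you feed it. This is a sketch: it assumes the kernel line contains both the interface name (igcN) and the words "watchdog timeout" (matched case-insensitively), and igc_watchdog_counts is a hypothetical helper name. POSIX sh syntax, so run it from /bin/sh.

```sh
# Sketch: count "watchdog timeout" lines per igc interface, reading stdin.
# ASSUMPTION: the message names the interface (igcN) somewhere on the line.
igc_watchdog_counts() {
    awk 'tolower($0) ~ /watchdog timeout/ {
             # Take the first igcN token on the line as the interface name.
             if (match($0, /igc[0-9]+/))
                 n[substr($0, RSTART, RLENGTH)]++
         }
         END { for (i in n) print i, n[i] }' | sort
}
```

Usage: dmesg | igc_watchdog_counts. For the persistent log, something like cat /var/log/system/*.log | igc_watchdog_counts, assuming your OPNsense release keeps its system log files there. Empty output means no matching lines were found.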

Your description of the issue sounds similar to others; however, there are still a lot of gaps to fill. Most notably: do you always need manual intervention to fix the issue, or does it recover on its own? Is it always the same igc interface? What is the auto-negotiated link state at the time of failure (# ifconfig igcX)? If there's no auto-negotiation, what link speed did you set it to?
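To capture just the negotiated media and the link status from the ifconfig output at the moment of failure, a small filter is enough. Sketch only: it assumes the usual FreeBSD ifconfig layout with separate "media:" and "status:" lines, and igc_link_state is a hypothetical helper name (POSIX sh, run from /bin/sh).

```sh
# Sketch: print only the media and status lines from `ifconfig igcX` output.
# ASSUMPTION: FreeBSD-style output with "media:" and "status:" lines.
igc_link_state() {
    awk '$1 == "media:" || $1 == "status:" {
             sub(/^[ \t]+/, "")   # drop the leading indentation
             print
         }'
}
```

Usage: ifconfig igc0 | igc_link_state, which should print the media line (including the negotiated speed and duplex) followed by the status line.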

Also, and perhaps most importantly, can you share a snapshot of

# sysctl dev.igc.X (where X is the affected interface) after the failure?
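To make grabbing that snapshot quick while the network is misbehaving, the commands can be wrapped so their output lands in a timestamped file. A sketch under assumptions: igc_snapshot and the default /tmp output directory are hypothetical choices, X is passed as the unit number (0 for igc0), and the code uses POSIX sh syntax, so run it from /bin/sh.

```sh
# Sketch: save ifconfig + sysctl state for one igc unit to a timestamped file.
# igc_snapshot and the /tmp default are hypothetical, not OPNsense tools.
igc_snapshot() {
    unit="$1"                  # unit number, e.g. 0 for igc0
    outdir="${2:-/tmp}"
    case "$unit" in
        ''|*[!0-9]*)
            echo "usage: igc_snapshot <unit> [outdir]" >&2
            return 2 ;;
    esac
    out="$outdir/igc${unit}-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "== date =="
        date
        echo "== ifconfig igc${unit} =="
        ifconfig "igc${unit}" 2>&1 || echo "(ifconfig failed)"
        echo "== sysctl dev.igc.${unit} =="
        sysctl "dev.igc.${unit}" 2>&1 || echo "(sysctl failed)"
    } > "$out" || return 1
    echo "$out"                # print the path so it is easy to attach
}
```

Usage: igc_snapshot 1 right after a failure, then attach the file whose path it prints.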

Lastly, please do these tests with all default tunables. As far as I know, dev.igc.0.eee_control=0 will *enable* EEE.

Cheers,
Stephan

And I forgot to ask: since you mention that toggling offloading fixes it, does

# ifconfig igcX down && ifconfig igcX up
also fix it?

Cheers,
Stephan

On the presumption that ASPM is contributing to the issue:

I don't know that ASPM can be reliably disabled on Protectli units, at least those running coreboot. The 'hw.pci.enable_aspm=0' sysctl has not worked for me. I spoke to them several months ago in a support ticket and was told that this is a common request on Reddit and that they are looking into adding ASPM controls in coreboot, but I was not given any timeframe. In the same conversation I was told that EEE gets a bad rep and I should not worry about it. 🤷

For the time being we're dependent on them issuing targeted fixes as they did for the VP2440.

---

Having said that, I have also seen the DNS timeout on rare occasions on my V1410 but never thought twice about it (I assumed it was a normal network glitch). I don't see any watchdog-related messages in dmesg as of now.
N5105 | 8/250GB | 4xi226-V | Community

All I know is that ASPM can be disabled globally or on a per-device basis. The latter normally needs BIOS or driver support, the former, AFAIK, does not.

I had freezing issues on my I226-V NICs on a Minisforum MS-01; the bug was fixed via a BIOS update. There was also this discussion of possible remedies: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279245

Quote from: tuto2 on July 01, 2026, 02:06:43 PM
If it's this issue, you should see "watchdog timeout" messages appearing in your dmesg/system log. After this, an automatic reset should recover connectivity. If this happens, can you share these logs?

Yeah, a lot of the stuff I've read indicates I should see interfaces flapping in the logs, but I haven't actually seen any evidence of that. There's no incidence of "watchdog" or "timeout" anywhere in dmesg or the month or so of system logs that have built up, and the only link state changes I see appear to correspond to when I uncheck/check the box and apply. So it's possible this is something else entirely.
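To line the link-state events up against the times of the manual toggles, it helps to list them together with their log timestamps. A sketch, assuming the standard FreeBSD kernel message form "igcN: link state changed to UP" (or DOWN); igc_link_events is a hypothetical helper name (POSIX sh, run from /bin/sh).

```sh
# Sketch: list igc link up/down events (whole lines, so timestamps are kept)
# from log text on stdin. Relies on the FreeBSD kernel message format
# "<ifname>: link state changed to UP|DOWN".
igc_link_events() {
    # "|| true" so that finding no events is not treated as an error.
    grep -E 'igc[0-9]+: link state changed to (UP|DOWN)' || true
}
```

Usage: cat /var/log/system/*.log | igc_link_events, assuming your release stores the system log there; any event that does not coincide with a checkbox toggle would point to a real link flap.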

Quote
Your description of the issue sounds similar to others, however, there are still a lot of gaps to fill. Most notably, do you always need manual intervention to fix the issue? or does it recover on its own? Is it always the same igc interface? What is the auto-negotiated link state at the time of failure (# ifconfig igcX)?  If there's no auto-negotiation, what link speed did you set it to?

So far the issue has never sorted itself out without my intervention, but since I can only tell it's happening from the symptoms, I can't say for sure that it doesn't happen more often and occasionally fix itself. For most of the life of this issue, speeds were on auto-negotiate (typically WAN 2500 Full, LAN 1000 Full). A few days ago I switched everything to static 1000 Full; the issue has popped up again once or twice since then.

I'll remove the tunables since they don't appear to be doing anything for me anyway, and I'll do a bit more digging next time this happens and see if I can come up with something more concrete to provide. In particular, I'm also curious whether a plain ifconfig down/up will resolve it.

Quote from: OPNenthu on July 01, 2026, 07:38:38 PM
I don't know that ASPM can be reliably disabled on Protectli units, at least those running coreboot.  The 'hw.pci.enable_aspm=0' sysctl has not worked for me.  I spoke to them several months ago in a support ticket and was told that this is a common request on Reddit and they are looking into adding ASPM controls in coreboot but was not given any timeframe.  In the same conversation I was told that EEE gets a bad rep and I should not worry about it. 🤷

I'm on coreboot, so that's interesting, thanks. And yeah, I noticed that disabling ASPM via the tunable didn't appear to have any effect on temperatures or anything, so I guess that tracks.

Quote
Having said that I have also seen the DNS timeout on rare occasion on my V1410 but never thought twice about it (assumed it was a normal network glitch).  I don't see any watchdog related messages in dmesg as of now.

When it happens for me, it's basically every DNS resolution that hasn't been made recently. It's hard to miss.

Quote from: meyergru on July 01, 2026, 09:18:08 PM
All I know is that ASPM can be disabled globally or on a per-device basis. The latter normally needs BIOS or driver support, the former, AFAIK, does not.

Yeah, unfortunately it doesn't look like the coreboot BIOS offers any configurability here, and I'm on the latest (only?) version for my hardware. I may try switching to AMI BIOS at some point, but I'm not there yet.