Severe NIC card instability after upgrading

Started by apocalypticgoat, March 05, 2025, 03:20:30 AM

March 05, 2025, 03:20:30 AM Last Edit: March 05, 2025, 03:33:28 AM by apocalypticgoat
Hello,

I upgraded my installation from 24.7.12_4 to 25.1.2, and after doing so my NIC became severely unstable. The LAN port would intermittently flap, and when it wasn't flapping, IPv4 connectivity worked fine but IPv6 connectivity did not. On the IPv6 side, it would either not obtain an IPv6 address at all, or, if it did, it would not serve IPv6 addresses on the LAN side. If I SSH'd into the box, IPv6 would resolve and respond normally.

I ended up reverting to 24.7.12_4, and the system has stabilized. I checked the changelogs for all versions between where I started and 25.1.2 and didn't see a driver change that should have affected my NIC. Any ideas what may have gone wrong?

Specs:

Processor: Core i5-7400 CPU @ 3.00GHz
RAM:      12GB
Storage:  120GB kingston-sa400s37120g SSD
NIC:      Intel X550-T2 Dual Port (Edit: I'm running this card with the link speed set to 2.5Gb.)

No problems with my X550-T1, which currently carries my 1Gb Internet link. Do you have a reasonably recent NVM/firmware on your T2?

The firmware version reported by the card is 2.11.3. The latest version is 3.70, which puts me 7 versions behind. Are there any known issues between OPNsense and the 500-series firmware?

Quote from: apocalypticgoat on March 06, 2025, 04:54:42 AM
The firmware version reported by the card is 2.11.3. The latest version is 3.70, which puts me 7 versions behind. Are there any known issues between OPNsense and the 500-series firmware?

I can't point to any offhand, but I have experienced NVM-to-driver mismatch issues with Linux, particularly with DPDK (which uses offloads). You could scan for potential issues in the release notes (they're cumulative). I've never bricked an Intel NIC with an upgrade, but naturally YMMV.

Quote from: pfry on March 06, 2025, 06:03:37 AM
I can't point to any offhand, but I have experienced NVM-to-driver mismatch issues with Linux, particularly with DPDK (which uses offloads). You could scan for potential issues in the release notes (they're cumulative). I've never bricked an Intel NIC with an upgrade, but naturally YMMV.

I spent some time looking at the changelogs for both OPNsense and FreeBSD and couldn't find any mention of the driver. Next, I tried pulling the driver version so I could compare it to the one used in 25.1. I found old posts (18.x days) where people pulled this information using:

# sysctl -a | grep -E 'dev.(igb|ix|em).*.%desc:'
When I tried that, it correctly listed my NIC but did not show a driver version number. I haven't tried it on 25.1 yet.

Lastly, I posted on the FreeBSD subreddit, and they suggested running pkg search for the driver, but that came up empty as well.

Any suggestions?

Quote from: apocalypticgoat on March 06, 2025, 04:54:42 AM
The firmware version reported by the card is 2.11.3. The latest version is 3.70, which puts me 7 versions behind. Are there any known issues between OPNsense and the 500-series firmware?

There's a well-known issue indeed: OPNsense doesn't develop Intel drivers for FreeBSD. 25.1 runs on FreeBSD 14.2; 24.7 was on FreeBSD 14.1.

Feel free to continue asking questions regarding driver development with FreeBSD/Intel people. I'm sure you'll have a great explanation for why you refuse to upgrade a card that needed 7 more firmware revisions after the one that you have installed.



Quote from: newsense on March 10, 2025, 12:30:59 AM
There's a well-known issue indeed: OPNsense doesn't develop Intel drivers for FreeBSD. 25.1 runs on FreeBSD 14.2; 24.7 was on FreeBSD 14.1.

Feel free to continue asking questions regarding driver development with FreeBSD/Intel people. I'm sure you'll have a great explanation for why you refuse to upgrade a card that needed 7 more firmware revisions after the one that you have installed.


First, your post is incredibly unhelpful and insulting. I'm well aware that OPNsense (and FreeBSD, for that matter) doesn't develop the Intel driver for these cards. I also know which FreeBSD versions they run on.

I posted here because I had no idea whether this was a card issue, an OPNsense issue, a FreeBSD issue, or a firmware issue, and there are plenty of people who use these cards and know more about them than I do. Could it be a firmware/driver mismatch? Sure. It could just as easily be the card failing, with something in the update triggering it, or perhaps a driver issue. Just because OPNsense doesn't make the driver doesn't mean I can't post here. My first step was posting here to find out from others whether it was just me.

I did reach out on the FreeBSD forum, and you know what? They wanted me to pull the driver version to determine which firmware version to flash. I can't, because in OPNsense's implementation of FreeBSD the driver version no longer shows up when using sysctl (it apparently did in the 18.x days). Further, pkg doesn't even show that the Intel driver is installed. Worse, the Intel driver isn't even available. The FreeBSD folks mentioned adding the FreeBSD_ports repo to get it but said that's likely not ideal (and I have no clue either).

As for it being "a well-known issue," I don't doubt it, but I can't find any information on it either. I did find some information on firmware/driver mismatches with these cards in general, but that brings us back to needing to know which driver version is running so I can get the appropriate firmware.

Tell me how to find the driver version number, I'll get the firmware updated to the compatible version and we can likely put this to bed.

I don't think such driver versioning exists other than going by FreeBSD release numbers. Everything else is commits to the tree that end up in the next kernel, AFAIK.

Things to look at: dmesg, pciconf -lv, and sysctl -a | grep igc, but you've probably been there already.
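One way to make that comparison concrete is to capture the same set of outputs on each version and diff them afterwards. A minimal sketch, assuming a POSIX sh and that the commands you pass actually exist on the box; snapshot_diag is a hypothetical helper name, not an OPNsense or FreeBSD tool.

```shell
# Hypothetical helper: save each diagnostic command's output to its own file
# so snapshots taken on 24.7 and on 25.1 can be compared with `diff -r`.
# Usage: snapshot_diag OUTDIR "cmd 1" "cmd 2" ...
snapshot_diag() {
  outdir=$1
  shift
  mkdir -p "$outdir" || return 1
  for cmd in "$@"; do
    # Derive a file name from the command: spaces, slashes and pipes become "_".
    name=$(printf '%s' "$cmd" | tr ' /|' '___')
    # Run through sh -c so pipelines such as "sysctl -a | grep ix" work as
    # written; stderr is captured as well, since errors are part of the picture.
    sh -c "$cmd" > "$outdir/$name.txt" 2>&1
  done
}
```

For example, run snapshot_diag /root/diag-25.1 "dmesg" "pciconf -lv" "sysctl dev.ix" on one install and the same with /root/diag-24.7 on the other, then diff -r the two directories once they are on the same machine. Expect noise from packet and queue counters under sysctl dev.ix; the interesting differences are in the static fields.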

What I would try is taking the driver from 24.7.12, copying it to 25.1.2, and seeing whether the card is stable. Using igc as an example, that's /boot/kernel/if_igc.ko; I don't know how yours is presented/named, maybe ixgbe(?).


One last thing to check on both 24.7 and 25.1 (replace igc with your driver's name):

sha256sum  /boot/kernel/if_igc.ko
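If the module file is copied off each install to one place, the two hashes can be compared directly. A minimal sketch, assuming a POSIX sh; it uses sha256sum when present and otherwise falls back to FreeBSD's base sha256 -q. The helper names are hypothetical. An identical hash means the binaries are the same; since modules are rebuilt for every release, a different hash is very likely anyway and on its own proves little about the driver logic.

```shell
# Print the SHA-256 of a file, preferring GNU-style sha256sum and falling
# back to the FreeBSD base sha256(1) utility in quiet (-q) mode.
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    sha256 -q "$1"
  fi
}

# Hypothetical helper: compare two saved copies of a driver module.
# Exit status: 0 identical, 1 different, 2 unreadable input.
compare_modules() {
  [ -r "$1" ] && [ -r "$2" ] || return 2
  a=$(file_sha256 "$1")
  b=$(file_sha256 "$2")
  if [ "$a" = "$b" ]; then
    echo "identical: $a"
  else
    echo "different: $1=$a $2=$b"
    return 1
  fi
}
```

Usage would look like compare_modules ./24.7/if_ix.ko ./25.1/if_ix.ko, where both paths are hypothetical local copies and the ix module file name is an assumption.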

Quote
# sysctl -a | grep -E 'dev.(igb|ix|em).*.%desc:'

When I tried that, it correctly listed my NIC but did not show a driver version number. I haven't tried it on 25.1 yet.

@apocalypticgoat, if you can find the device, can you see anything with dmidecode? E.g., say your NIC is dev.ix0, then:
$ sudo dmidecode /dev/ix0
That should bring up some more info.

Sorry, I misled you there. This should be helpful:
$ sudo sysctl dev.ix0
In my case, for instance, using igc for the NICs:
snip
dev.igc.0.fc: 3
dev.igc.0.debug: -1
dev.igc.0.fw_version: EEPROM V2.17-0 eTrack 0x80000303
dev.igc.0.enable_aim: 1
dev.igc.0.nvm: -1
dev.igc.0.iflib.rxq1.rxq_fl0.buf_size: 2048
dev.igc.0.iflib.rxq1.rxq_fl0.credits: 1023
dev.igc.0.iflib.rxq1.rxq_fl0.cidx: 623
dev.igc.0.iflib.rxq1.rxq_fl0.pidx: 622
dev.igc.0.iflib.rxq1.cpu: 1
snip

Quote from: cookiemonster on March 10, 2025, 03:13:30 PM
@apocalypticgoat, if you can find the device, can you see anything with dmidecode? E.g., say your NIC is dev.ix0, then:
$ sudo dmidecode /dev/ix0
That should bring up some more info.

This unfortunately did not expose the driver version.

Quote from: cookiemonster on March 10, 2025, 03:24:17 PM
Sorry, I misled you there. This should be helpful:
$ sudo sysctl dev.ix0

In my case it was dev.ix.0. This did yield a driver version of 4.0.1-k, which, from what I found, is the driver provided in the kernel using the iflib framework.

dev.ix.0.fw_version: fw 2.11.3 nvm 1.00.0 Option ROM V1-b1458-p0 eTrack 0x8000048f
dev.ix.0.iflib.driver_version: 4.0.1-k
dev.ix.0.%driver: ix
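To pull just those two fields for every ix port (handy when comparing the production box with a dev build later), the sysctl dev.ix dump can be filtered. A minimal sketch, assuming the OID names shown above (dev.ix.N.fw_version and dev.ix.N.iflib.driver_version); nic_versions is a hypothetical helper that reads the dump on stdin, so it works on live output or on a saved copy.

```shell
# Hypothetical helper: summarise driver and firmware versions per ix port from
# "name: value" lines as printed by `sysctl dev.ix`.
nic_versions() {
  awk -F': ' '
    $1 ~ /^dev\.ix\.[0-9]+\./ {
      split($1, p, "[.]")               # p[3] is the unit number
      val = substr($0, length($1) + 3)  # everything after the first ": "
      if ($1 ~ /\.iflib\.driver_version$/) drv[p[3]] = val
      else if ($1 ~ /\.fw_version$/) fw[p[3]] = val
    }
    END { for (u in fw) printf "ix%s driver=%s fw=%s\n", u, drv[u], fw[u] }
  ' | sort
}
```

On the box this would be sysctl dev.ix | nic_versions. Fed the three lines quoted above, it prints ix0 driver=4.0.1-k fw=fw 2.11.3 nvm 1.00.0 Option ROM V1-b1458-p0 eTrack 0x8000048f.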

4.0.1-k is at least 5 years old. I know this because I found a forum post from 2020 where someone mentioned that exact version number.
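Rather than dating the string from a forum post, a local clone of the FreeBSD src tree can answer this directly: git's pickaxe search (git log -S) lists the commits that changed the number of occurrences of a string in a file. A minimal sketch, assuming such a clone and the sys/dev/ixgbe/if_ix.c path where the string lives in FreeBSD's tree; string_introduced is a hypothetical helper. It does not follow renames, and the oldest hit only dates the string, not later changes to the code around it.

```shell
# Hypothetical helper: show the oldest commit that added or removed STRING in
# PATH, which is normally the commit that introduced it.
# Usage: string_introduced REPO STRING PATH
string_introduced() {
  # --reverse lists oldest first; head keeps only that first entry.
  git -C "$1" log -S"$2" --reverse --date=short --format='%h %ad %s' -- "$3" |
    head -n 1
}
```

Usage would look like string_introduced ~/src/freebsd '4.0.1-k' sys/dev/ixgbe/if_ix.c, with the clone location being whatever you chose.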

I checked this on 25.1 and it shows the same version number, which means the driver remains unchanged. So maybe it wasn't a firmware/driver mismatch after all?

After upgrading again, it seems to be stable this time (so far). I'm wondering if something got buggered in the software somewhere.

When the previous upgrade failed, I didn't have snapshots available, so I reinstalled OPNsense 24.7 from scratch and re-imported the backup I had made from 24.7.12. This time, I made snapshots available during setup.

I'm going to keep monitoring for stability issues, but I would like to update my card's firmware. I can't find much information on the 4.0.1-k driver, so I'm unsure whether the latest firmware is compatible with it. Does anyone have information on this?

Quote
I checked this on 25.1 and it shows the same version number, which means the driver remains unchanged. So maybe it wasn't a firmware/driver mismatch after all?

Maybe not. Maybe it is time to pause this path of NIC firmware updates and go back to basic diagnostics.

Quote from: cookiemonster on March 10, 2025, 11:16:02 PM
Maybe not. Maybe it is time to pause this path of NIC firmware updates and go back to basic diagnostics.

I would agree with you, except that since doing a clean install and then upgrading again, the issue hasn't returned. I can't really run diagnostics on an issue that isn't occurring. Given that a clean install seems to have resolved the issue, I think it's safe to assume something went wrong with the original upgrade.
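While monitoring, flaps can still be counted after the fact: FreeBSD logs interface transitions as kernel messages of the form "ix0: link state changed to DOWN", which show up in dmesg and the system log. A minimal sketch that counts DOWN events per interface from log text on stdin, assuming that message format; count_flaps is a hypothetical helper, and which OPNsense log file to feed it is left open.

```shell
# Hypothetical helper: count "link state changed to DOWN" events per interface.
# Reads log text on stdin, e.g. dmesg output or a saved system log.
count_flaps() {
  awk '
    /link state changed to DOWN/ {
      # The interface is the first field shaped like a name plus unit and colon, e.g. ix0:
      for (i = 1; i <= NF; i++)
        if ($i ~ /^[a-z]+[0-9]+:$/) {
          n[substr($i, 1, length($i) - 1)]++
          break
        }
    }
    END { for (ifc in n) print ifc, n[ifc] }
  ' | sort
}
```

For a quick check, dmesg | count_flaps prints one "interface count" line per interface that went down; no output means no DOWN events in that text.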

> 4.0.1-k is at least 5 years old. I know this because I found a forum post from 2020 where someone mentioned that exact version number.

I checked out the current FreeBSD main branch and...

% git show -s
commit a8d2bccb87d0738c91f7e6a080375ae276e4c7d5 (HEAD -> main, upstream/main)
Author: Cheng Cui <cc@FreeBSD.org>
Date:   Wed Mar 5 11:35:20 2025 -0500

    tcp cc: use tcp_compute_pipe() for pipe in xx_post_recovery() directly
   
    This follows up with commit 67787d200488, and obsoletes the non-default
    pipe calculation from commit 46f584823798 nearly 25 years ago.
   
    Reviewed by: rscheff
    Differential Revision: https://reviews.freebsd.org/D49247
% git grep 4.0.1-k             
sys/dev/ixgbe/if_ix.c:static const char ixgbe_driver_version[] = "4.0.1-k";
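The grep above shows the version is a constant string in if_ix.c, so it can stay at 4.0.1-k while the code around it changes. To see what actually changed in the driver between the two FreeBSD bases mentioned earlier in the thread (14.1 for 24.7, 14.2 for 25.1), one could list the commits touching the driver directory between two refs. A minimal sketch; driver_commits_between is a hypothetical helper, the tag names in the usage line are assumptions (check with git tag -l 'release/14.*'), and OPNsense kernels may carry their own patches, so upstream history need not match them exactly.

```shell
# Hypothetical helper: list commits between two refs that touch a given path.
# Usage: driver_commits_between REPO OLD_REF NEW_REF PATH
driver_commits_between() {
  # OLD..NEW selects commits reachable from NEW but not from OLD.
  git -C "$1" log --oneline "$2..$3" -- "$4"
}
```

Usage would look like driver_commits_between ~/src/freebsd release/14.1.0 release/14.2.0 sys/dev/ixgbe; an empty result would support the "unchanged driver" reading far better than a matching version string.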

I'm not sure what to make of your concerns here.


Cheers,
Franco

Quote from: franco on March 17, 2025, 03:04:20 PM
I'm not sure what to make of your concerns here.

The primary concern, the instability, has been resolved; it was the result of an upgrade attempt that failed for unknown reasons.

The secondary concern is that the firmware on my NIC is pretty old, and I was looking into updating it. But rather than just updating to the latest firmware and hoping for the best, I was hoping that either supported firmware versions were listed somewhere in the driver documentation, or someone running the same card could confirm known-good firmware/driver combinations.

I'm not having much luck getting that info, so for now the production build will remain as it is, and I'll spin up a development one. I bought another of the same NIC so I can update its firmware and test it on the dev build I plan to put together.

I'll continue to monitor this thread, though, as it will be a while before I get to the dev build. Perhaps someone will have the info I need before then and save me some work.