OPNsense Forum

Archive => 24.1, 24.4 Legacy Series => Topic started by: CJ on March 03, 2024, 09:08:06 PM

Title: ConnectX-3 stopped connecting after upgrade to 24.1
Post by: CJ on March 03, 2024, 09:08:06 PM
I just updated to 24.1 from 23.7 and everything appears to have gone perfectly fine except for my ConnectX-3 card.  After the upgrade completed the port kept flapping.  I tried rebooting, doing a complete power down, and unplugging and replugging the attached DAC with no success.

The only thing that appears to have fixed the issue was doing ifconfig down followed by ifconfig up.

Any suggestions for things to check?  I didn't have any problems on 23.7 with this card or with the ConnectX-2 that it replaced.  Once I get a chance I'm going to try rebooting to see if it comes up correctly or starts flapping again.
Title: Re: ConnectX-3 not connecting
Post by: CJ on March 04, 2024, 12:55:05 PM
I rebooted the system and now the NIC won't connect at all.  The verbose ifconfig output shows that it knows there is a DAC connected, but the status stays at no carrier.

Doing ifconfig down/up doesn't work until I unplug the DAC and reconnect it.  Then it will start flapping and I can down/up to get a solid connection.
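
For anyone trying the same workaround, the down/up cycle can be wrapped in a small helper. This is only a sketch: the name bounce_iface and the 2 second default pause are illustrative assumptions, and mlxen0 is the interface name from the ifconfig output further down the thread. It is best run from the console, since taking the LAN interface down over SSH will cut the session.

```sh
# Sketch only: bounce_iface is a hypothetical helper, not an OPNsense tool.
# Usage: bounce_iface <interface> [pause_seconds]
bounce_iface() {
    ifc=$1
    pause=${2:-2}               # assumed default pause between down and up
    ifconfig "$ifc" down || return 1
    sleep "$pause"
    ifconfig "$ifc" up
}

# Example (as root, from the console):
#   bounce_iface mlxen0
```

If the port is in the no carrier state described above, the DAC still has to be unplugged and reconnected first; the helper only covers the down/up part.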
Title: Re: ConnectX-3 stopped connecting after upgrade to 24.1
Post by: netnut on March 05, 2024, 12:25:09 AM
Quote from: CJ on March 03, 2024, 09:08:06 PM
Any suggestions for things to check?  I didn't have any problems on 23.7 with this card or with the ConnectX-2 that it replaced.  Once I get a chance I'm going to try rebooting to see if it comes up correctly or starts flapping again.

While it's a rather old NIC, did you upgrade to the latest firmware?

https://downloaders.azurewebsites.net/downloaders/connectx3en_downloader/downloader2.html

I remember there being some firmware / mode switches on Mellanox cards that might be biting you, but it was too long ago for me to give you a direct pointer. You might find something in the release notes below, or you could do the firmware upgrade anyway (even if you're already on the latest version) to reset the card to defaults.

https://network.nvidia.com/pdf/firmware/ConnectX3-FW-2_42_5000-release_notes.pdf



Title: Re: ConnectX-3 stopped connecting after upgrade to 24.1
Post by: CJ on March 05, 2024, 02:28:53 PM
Quote from: netnut on March 05, 2024, 12:25:09 AM
While it's a rather old NIC, did you upgrade to the latest firmware?

https://downloaders.azurewebsites.net/downloaders/connectx3en_downloader/downloader2.html

I remember there being some firmware / mode switches on Mellanox cards that might be biting you, but it was too long ago for me to give you a direct pointer. You might find something in the release notes below, or you could do the firmware upgrade anyway (even if you're already on the latest version) to reset the card to defaults.

https://network.nvidia.com/pdf/firmware/ConnectX3-FW-2_42_5000-release_notes.pdf

I updated it to the latest firmware right before I installed it.  It came up perfectly in 23.7 so I'm not sure what changed to cause it to have problems in 24.1.  Also, I had zero issues with the ConnectX-2 and whatever firmware it had, but that was also on 23.7.

Most of the issues I found with this card on FreeBSD came from the kernel not loading the proper modules, but those reports predate the current situation.  Everything is loaded correctly from what I can tell.

I tried setting the media but all of the options I tried gave errors.  I'm using the command listed in the docs.
ifconfig -m mce<x> media <y> mediaopt full-duplex
https://docs.nvidia.com/networking/display/freebsdv371/driver+usage+and+configuration

Here is what it shows once it's working.

ifconfig -vvm mlxen0
mlxen0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: LAN (lan)
	options=8c00a8<VLAN_MTU,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE>
	capabilities=ed07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	ether MAC
	inet IP netmask 0xffffff00 broadcast MASK
	groups: Internal
	media: Ethernet autoselect (10Gbase-CX4 <full-duplex,rxpause,txpause>)
	status: active
	supported media:
		media autoselect
		media 40Gbase-CR4 mediaopt full-duplex
		media 10Gbase-CX4 mediaopt full-duplex
		media 10Gbase-SR mediaopt full-duplex
		media 1000baseT mediaopt full-duplex
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	plugged: SFP/SFP+/SFP28 Unknown (Copper pigtail)
	vendor: CISCO-MOLEX PN: 74752-9520 SN: SERIAL DATE: 2013-04-11


When OPNsense first boots, the media line shows only Ethernet autoselect and the status is no carrier.  However, it still shows the plugged-in DAC info.
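
To compare the boot-time state with the working state without reading the whole dump, the media and status lines can be pulled out of the ifconfig output. A minimal sketch, assuming the line labels match the output above; iface_field is a hypothetical helper name, not part of ifconfig.

```sh
# Sketch only: iface_field is a hypothetical helper.
# Reads ifconfig output on stdin and prints the value of the first line
# whose first word is "<key>:" (for example "status:" or "media:").
# Runs of whitespace in the value are collapsed to single spaces.
iface_field() {
    awk -v key="$1:" '$1 == key { $1 = ""; sub(/^ +/, ""); print; exit }'
}

# Example:
#   ifconfig mlxen0 | iface_field status
#   ifconfig mlxen0 | iface_field media
```

Running both calls right after boot and again once the link is solid gives a two-line before/after comparison.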
Title: Re: ConnectX-3 stopped connecting after upgrade to 24.1
Post by: CJ on March 30, 2024, 03:06:21 PM
Not sure what changed, but this appears to be fixed in 24.1.4.  I'm no longer having any problems after updating.
Title: Re: ConnectX-3 stopped connecting after upgrade to 24.1
Post by: CJ on April 21, 2024, 08:04:40 PM
And now it's broken again in 24.1.6.  The card was flapping so badly that I couldn't even see what I was typing until I pulled the DAC.
Title: Re: ConnectX-3 stopped connecting after upgrade to 24.1
Post by: edsai on May 06, 2024, 09:26:33 PM
Quote from: CJ on April 21, 2024, 08:04:40 PM
And now it's broken again in 24.1.6.  The card was flapping so badly that I couldn't even see what I was typing until I pulled the DAC.

Any resolution to this? I'm trying to decide whether to go with a ConnectX-3 or an Intel card.
Title: Re: ConnectX-3 stopped connecting after upgrade to 24.1
Post by: CJ on June 07, 2024, 07:34:49 PM
Quote from: edsai on May 06, 2024, 09:26:33 PM
Quote from: CJ on April 21, 2024, 08:04:40 PM
And now it's broken again in 24.1.6.  The card was flapping so badly that I couldn't even see what I was typing until I pulled the DAC.

Any resolution to this? I'm trying to decide whether to go with a ConnectX-3 or an Intel card.

It seems to come and go.  Also, it doesn't seem to happen if I cold boot instead of restarting.  Once the link is up and connected, though, it works fine, so it's really only an issue when I do updates.
Title: Re: ConnectX-3 stopped connecting after upgrade to 24.1
Post by: CJ on September 02, 2024, 03:53:43 PM
I'm still seeing this issue in 24.7 but it seems that I no longer have to physically unplug the DAC.  Fiddling with the ifconfig up and down commands is enough to eventually get the port to stop flapping.

Oddly, I've enabled the second port as a trunk to a managed switch and I haven't seen any flapping on it.  Comparing the ifconfig output for the two ports, the only difference I see is in the vendor line: the working port shows OEM while the problem port shows CISCO-MOLEX.  The cable reporting OEM is a 10GTek DAC.
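
The vendor comparison can be done without reading the full output for each port. This is a rough sketch that assumes the vendor line has the same "vendor: NAME PN: ..." layout as the output posted earlier in the thread; dac_vendor is a hypothetical helper, and mlxen1 is only a guess at the second port's name.

```sh
# Sketch only: dac_vendor is a hypothetical helper.
# Reads verbose ifconfig output on stdin and prints the vendor name,
# i.e. the text between "vendor:" and " PN:" on the vendor line.
dac_vendor() {
    awk '$1 == "vendor:" {
        sub(/^[ \t]*vendor:[ \t]*/, "")   # drop the label
        sub(/[ \t]+PN:.*$/, "")           # drop part number and the rest
        print
        exit
    }'
}

# Example (mlxen1 is an assumed name for the second port):
#   for ifc in mlxen0 mlxen1; do
#       printf '%s: %s\n' "$ifc" "$(ifconfig -v "$ifc" | dac_vendor)"
#   done
```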

When I have some time I'll try swapping the Cisco DAC for a 10GTek.