mlx4en failing to load after upgrade to 24.7

Started by CJ, August 19, 2024, 05:11:54 PM

I've been running ConnectX cards for a while now, and they've worked well once you add the mlx4en load line to the boot config.

https://www.routerperformance.net/opnsense/mellanox-connecx-management-in-opnsense/

After updating to 24.7, this doesn't seem to work anymore. I have to manually log into the box, load mlx4en, and then force an interface reload before the card starts working. This lasts until I reboot, at which point I have to repeat the process.

I've checked, and the load line is still set to YES, but the card isn't being brought up. Any suggestions for what to check?


All I can say is it works fine for me on 24.7.1.

I use an R86S with a Mellanox card.

Works fine here (24.7.1; Mellanox ConnectX-3; Supermicro M11SDV-8CT-LN4F).
HW: Supermicro X11SCL-IF, i3-9100F, 32 GB ECC RAM, 250 GB SSD, Mellanox ConnectX-3, 10 GBit Internet

August 21, 2024, 01:24:22 PM #3 Last Edit: August 21, 2024, 01:56:40 PM by mic
Hello,

I have an AOC-MCX312C-XCCT (2 x 10 Gb SFP+) installed in a Supermicro AS-5019D-FTN4, and after the update to 24.7 the ports no longer work. I tried the following:


  • Removed the only line, mlx4en_load="YES", from /boot/loader.conf.local
  • Rebooted
  • Loaded mlx4en with kldload mlx4en
  • Reloaded all interfaces with configctl interface reconfigure <interface_name>

My interfaces are lagg0 and three VLANs, so to reload them I ran the following commands:

  • configctl interface reconfigure lagg0
  • configctl interface reconfigure lagg0_vlan20
  • configctl interface reconfigure lagg0_vlan3
  • configctl interface reconfigure lagg0_vlan9
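The per-interface reloads above can be scripted as a loop. This is a minimal POSIX sh sketch, assuming the configctl interface reconfigure <interface_name> syntax used in this post; the function name reconfigure_all and the CONFIGCTL override (for a dry run with CONFIGCTL=echo) are hypothetical conveniences, not part of OPNsense.

```shell
#!/bin/sh
# Hypothetical override: set CONFIGCTL=echo to print the commands
# instead of running them. Defaults to the real configctl.
CONFIGCTL=${CONFIGCTL:-configctl}

# reconfigure_all (hypothetical helper): run
# "configctl interface reconfigure <name>" for each argument, in order,
# and return non-zero if any of them failed.
reconfigure_all() {
    rc=0
    for ifname in "$@"; do
        if ! $CONFIGCTL interface reconfigure "$ifname"; then
            echo "reconfigure failed: $ifname" >&2
            rc=1
        fi
    done
    return $rc
}
```

For the setup described here this would be called as reconfigure_all lagg0 lagg0_vlan20 lagg0_vlan3 lagg0_vlan9, which reloads lagg0 before its VLANs, the same order as the commands above.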

After all these steps, the interfaces still do not work.

These are the messages at boot time (before I load the module):

mlx4_core0: <mlx4_core> mem 0xef800000-0xef8fffff,0x1fff8000000-0x1ffffffffff irq 54 at device 0.0 on pci4
mlx4_core: Mellanox ConnectX core driver v3.7.1 (November 2021)
mlx4_core: Initializing 0000:04:00.0
mlx4_core0: Unable to determine PCI device chain minimum BW
intsmb0: <AMD FCH SMBus Controller> at device 20.0 on pci0
smbus0: <System Management Bus> on intsmb0
ig4iic0: <Designware I2C Controller> iomem 0xfedc2000-0xfedc2fff irq 10 on acpi0
iicbus0: <Philips I2C bus (ACPI-hinted)> on ig4iic0
ig4iic1: <Designware I2C Controller> iomem 0xfedc3000-0xfedc3fff irq 11 on acpi0
iicbus1: <Philips I2C bus (ACPI-hinted)> on ig4iic1
ig4iic2: <Designware I2C Controller> iomem 0xfedc4000-0xfedc4fff irq 12 on acpi0
iicbus2: <Philips I2C bus (ACPI-hinted)> on ig4iic2
ig4iic3: <Designware I2C Controller> iomem 0xfedc5000-0xfedc5fff irq 13 on acpi0
iicbus3: <Philips I2C bus (ACPI-hinted)> on ig4iic3
ig4iic4: <Designware I2C Controller> iomem 0xfedc6000-0xfedc6fff irq 14 on acpi0
iicbus4: <Philips I2C bus (ACPI-hinted)> on ig4iic4
ig4iic5: <Designware I2C Controller> iomem 0xfedcb000-0xfedcbfff irq 15 on acpi0
iicbus5: <Philips I2C bus (ACPI-hinted)> on ig4iic5
driver bug: Unable to set devclass (class: ppc devname: (unknown))

Could you help me, please?

Thank you

Hello,

After some attempts I found a workaround. The problem is that OPNsense does not load the mlx4en module at startup, even though the line mlx4en_load="YES" is present in /boot/loader.conf.local. The workaround is to create the file

/usr/local/etc/rc.syshook.d/early/16-mlx4en-load

with the following content:

#!/bin/sh
# load the mlx4en driver early at boot; -n skips it if it is already loaded
kldload -n mlx4en


Then make the file executable:
chmod +x /usr/local/etc/rc.syshook.d/early/16-mlx4en-load

The last step is to reboot the system.
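The file creation and chmod steps above can be combined into one helper. This is a minimal sketch, assuming the hook directory and file name given in this post; install_mlx4en_hook and its optional directory argument (useful for trying it in a scratch directory) are hypothetical. It writes kldload -n, where -n tells kldload to skip a module that is already loaded, so the hook stays harmless if the loader did load mlx4en. As the following replies point out, the Tunables page is the supported way to load a module; this hook remains a local modification that is not saved in config.xml.

```shell
#!/bin/sh
# install_mlx4en_hook (hypothetical helper): write the early boot hook
# from the post above and make it executable.
# Optional $1 overrides the hook directory (e.g. a scratch dir for testing).
install_mlx4en_hook() {
    hook_dir=${1:-/usr/local/etc/rc.syshook.d/early}
    hook_file="$hook_dir/16-mlx4en-load"
    mkdir -p "$hook_dir" || return 1
    # Quoted 'EOF': the hook body is written literally, without expansion.
    cat > "$hook_file" <<'EOF' || return 1
#!/bin/sh
# Load the mlx4en driver early at boot; -n skips it if already loaded.
kldload -n mlx4en
EOF
    # Equivalent to: chmod +x 16-mlx4en-load
    chmod +x "$hook_file"
}
```

On the firewall, run install_mlx4en_hook as root and then reboot.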

I hope this workaround can help someone


Wouldn't it be a whole lot easier to use the documented way (System -> Settings -> Tunables)? It works just fine here; I load an AMD watchdog driver this way. I would say /boot/loader.conf.local is not used for anything.


# grep amd /boot/loader.conf
amdsbwd_load="YES"

# kldstat -n amdsbwd
Id Refs Address                Size Name
3    1 0xffffffff82734000     4b40 amdsbwd.ko
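The grep check above can be wrapped in a small helper that works for any module. This is a minimal sketch, assuming the <module>_load="YES" line format shown in the output above; module_load_configured is a hypothetical name, and accepting any case of YES, surrounding whitespace, and missing quotes is a convenience of this sketch, not a statement of exactly what the loader accepts. Commented-out lines are not counted.

```shell
#!/bin/sh
# module_load_configured (hypothetical helper): succeed if the file
# (default /boot/loader.conf) contains an active line <module>_load="YES".
#   $1 = module name, e.g. mlx4en
#   $2 = optional path to a loader.conf-style file
module_load_configured() {
    module=$1
    conf=${2:-/boot/loader.conf}
    # Optional leading whitespace, the key, '=', optional quotes around YES/yes.
    grep -Eq "^[[:space:]]*${module}_load[[:space:]]*=[[:space:]]*\"?[Yy][Ee][Ss]\"?[[:space:]]*\$" "$conf"
}
```

For example, module_load_configured mlx4en && echo configured. On the firewall itself, follow it with kldstat -n mlx4en, as in the output above, to confirm the module was actually loaded after a reboot.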

Hi,

The solution proposed by doktornotor works!

Thank you

Nice. I was busy with other things so I did not check, but it appears that the .local file, often suggested on the other project as a dumping ground for anything unsupported, is (no longer?) being used here. IMO that is a good thing: modifications made the supported way are saved to config.xml and applied again if you need to reinstall and restore. Locally modified files, not so much.


September 02, 2024, 03:30:23 PM #9 Last Edit: September 02, 2024, 03:43:57 PM by CJ
Quote from: franco on August 24, 2024, 10:11:27 PM
Well, this is sort of self-documenting in /boot/loader.conf:

https://github.com/opnsense/core/blob/0adece8d3e165acc0ba3bb2e1d8f0e6593dd8c41/src/etc/rc.loader.d/00-banner#L1-L6

Cheers,
Franco

I do have the appropriate line in /boot/loader.conf.local, and up until 24.7 it always worked. When I look at /boot/loader.conf, I don't see the mlx4en_load line.

I went into System -> Settings -> Tunables and added a new tunable named mlx4en_load with a value of YES. After saving and applying the changes, the mlx4en_load line showed up in the tunables section of /boot/loader.conf. After a reboot, the module was loaded and my interfaces came up correctly.

It appears that the mechanism for processing /boot/loader.conf.local broke with the changes from 24.1 to 24.7. My guess is that the step importing it into the main file was accidentally removed or commented out, but until I have a chance to dig through the code, I can't know for sure.

Name: modulename_load
Value: YES
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: Patrick M. Hausen on September 02, 2024, 03:40:34 PM
Name: modulename_load
Value: YES

You replied while I was busy updating my post with the results of testing exactly that. :D