Kernel panic loading mlx4en module for ConnectX-3 with PCIe passthrough on Proxmox

Started by Ritzy1506, Today at 12:03:41 AM

Hi,

I couldn't find out whether running OPNsense under Proxmox is a supported configuration, so apologies if this isn't allowed.

Software versions:
proxmox-ve: 9.1.0 (running kernel: 6.17.2-1-pve)
OPNsense 26.1.8_5

When I load mlx4_core and mlx4_en on the Proxmox host, the interface comes up correctly. I've passed the card through to the OPNsense VM:
root@proxmox-1:~# lspci -nnk | grep Mellanox -A3
01:00.0 Ethernet controller [0200]: Mellanox Technologies MT27520 Family [ConnectX-3 Pro] [15b3:1007]
        Subsystem: Mellanox Technologies ConnectX-3 Pro 10 GbE Dual Port SFP+ Adapter [15b3:0080]
        Kernel driver in use: vfio-pci
        Kernel modules: mlx4_core
root@proxmox-1:~# lsmod | grep -e mlx -e vfio
vfio_pci               20480  1
vfio_pci_core          86016  1 vfio_pci
irqbypass              16384  2 vfio_pci_core,kvm
vfio_iommu_type1       49152  1
vfio                   65536  8 vfio_pci_core,vfio_iommu_type1,vfio_pci
iommufd               126976  1 vfio
root@proxmox-1:~# qm showcmd 101 | tr ' -' '\n-' | grep 01:00.0 -B1
-device
'vfio-pci,host=0000:01:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0'

In OPNsense, mlx4.ko is already loaded. When I run kldload mlx4en, I get the following panic:
[15] mlx4_core0: <mlx4_core> mem 0x82000000-0x820fffff,0xc000000000-0xc0007fffff irq 16 at device 0.0 on pci1
[15] <6>mlx4_core: Mellanox ConnectX core driver v3.7.1 (November 2021)
[15] mlx4_core: Initializing 0000:01:00.0
[21] mlx4_core0: Unable to determine PCI device chain minimum BW
[21] vtcon0: <VirtIO Console Adapter> on virtio_pci1
[21] ichsmb0: <Intel 82801I (ICH9) SMBus controller> port 0x8000-0x803f irq 16 at device 31.3 on pci0
[21] smbus0: <System Management Bus> on ichsmb0
[22] uhid0 on uhub1
[22] uhid0: <QEMU QEMU USB Tablet, class 0/0, rev 2.00/0.00, addr 2> on usbus7
[23] lo0: link state changed to UP
[25] vtnet0: link state changed to UP
[26] arp: 10.17.0.42 moved from 98:b7:85:20:58:c7 to ee:cf:26:d6:53:34 on vtnet0
[103] mlx4_en mlx4_core0: Activating port:1
[103] mlxen0: link state changed to DOWN
[103] mlxen0: Ethernet address: 50:6b:4b:5d:aa:a0
[103] <4>mlx4_en: mlx4_core0: Port 1: Using 2 TX rings
[103] <4>mlx4_en: mlx4_core0: Port 1: Using 4 RX rings
[103] <4>mlx4_en: mlxen0: Using 2 TX rings
[103] <4>mlx4_en: mlxen0: Using 4 RX rings
[103] <4>mlx4_en: mlxen0: Initializing port
[103] mlx4_en mlx4_core0: Activating port:2
[103]
[103]
[103] Fatal trap 12: page fault while in kernel mode
[103] cpuid = 0; apic id = 00
[103] fault virtual address = 0x0
[103] fault code = supervisor read instruction, page not present
[103] instruction pointer = 0x20:0x0
[103] stack pointer         = 0x28:0xfffffe0010784c18
[103] frame pointer         = 0x28:0xfffffe0010784c40
[103] code segment = base 0x0, limit 0xfffff, type 0x1b
[103] = DPL 0, pres 1, long 1, def32 0, gran 1
[103] processor eflags = interrupt enabled, resume, IOPL = 0
[103] current process = 12 (swi6: task queue)
[103] rdi: fffff80070fc3000 rsi: fffffe0010784c90 rdx: fffffe00a5fd3ac8
[103] rcx: 00000000c0306938  r8: 0000000000000000  r9: 0000000000000000
[103] rax: 0000000000000000 rbx: fffffe0010784c90 rbp: fffffe0010784c40
[103] r10: fffff80070a15000 r11: fffff800015aa000 r12: 0000000000008802
[103] r13: 0000000000000010 r14: fffffe00a5fd3ac8 r15: fffff80070a15000
[103] trap number = 12
[103] panic: page fault
[103] cpuid = 0
[103] time = 1778881147
[103] KDB: stack backtrace:
[103] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0010784960
[103] vpanic() at vpanic+0x161/frame 0xfffffe0010784a90
[103] panic() at panic+0x43/frame 0xfffffe0010784af0
[103] trap_pfault() at trap_pfault+0x3da/frame 0xfffffe0010784b40
[103] calltrap() at calltrap+0x8/frame 0xfffffe0010784b40
[103] --- trap 0xc, rip = 0, rsp = 0xfffffe0010784c18, rbp = 0xfffffe0010784c40 ---
[103] ??() at 0/frame 0xfffffe0010784c40
[103] dump_iface() at dump_iface+0x145/frame 0xfffffe0010784cf0
[103] rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe0010784d70
[103] do_link_state_change() at do_link_state_change+0x44/frame 0xfffffe0010784dc0
[103] taskqueue_run_locked() at taskqueue_run_locked+0x182/frame 0xfffffe0010784e40
[103] taskqueue_run() at taskqueue_run+0x68/frame 0xfffffe0010784e60
[103] ithread_loop() at ithread_loop+0x239/frame 0xfffffe0010784ef0
[103] fork_exit() at fork_exit+0x81/frame 0xfffffe0010784f30
[103] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0010784f30
[103] --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
[103] KDB: enter: panic

Does anyone have any ideas? Thanks.