Kernel panic loading mlx4en module for ConnectX-3 with PCIe passthrough Proxmox

Started by Ritzy1506, Today at 12:03:41 AM

Hi,

I couldn't find whether running OPNsense in Proxmox is a supported configuration, so apologies if this isn't allowed.

Software versions:
proxmox-ve: 9.1.0 (running kernel: 6.17.2-1-pve)
OPNsense 26.1.8_5

When I load mlx4_core and mlx4_en in Proxmox, the interface comes up correctly. I've passed it through to the OPNsense VM:
root@proxmox-1:~# lspci -nnk | grep Mellanox -A3
01:00.0 Ethernet controller [0200]: Mellanox Technologies MT27520 Family [ConnectX-3 Pro] [15b3:1007]
        Subsystem: Mellanox Technologies ConnectX-3 Pro 10 GbE Dual Port SFP+ Adapter [15b3:0080]
        Kernel driver in use: vfio-pci
        Kernel modules: mlx4_core
root@proxmox-1:~# lsmod | grep -e mlx -e vfio
vfio_pci               20480  1
vfio_pci_core          86016  1 vfio_pci
irqbypass              16384  2 vfio_pci_core,kvm
vfio_iommu_type1       49152  1
vfio                   65536  8 vfio_pci_core,vfio_iommu_type1,vfio_pci
iommufd               126976  1 vfio
root@proxmox-1:~# qm showcmd 101 | tr ' -' '\n-' | grep 01:00.0 -B1
-device
'vfio-pci,host=0000:01:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0'
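For reference, the host-side binding to vfio-pci shown above is typically done with a modprobe options file. A minimal sketch using the vendor:device ID 15b3:1007 from the lspci output (file paths and the initramfs step are the usual Debian/Proxmox conventions; adjust for your setup):

```shell
# /etc/modprobe.d/vfio.conf -- claim the ConnectX-3 Pro for vfio-pci at boot
# (15b3:1007 is the vendor:device ID reported by lspci above)
options vfio-pci ids=15b3:1007

# Optionally make sure vfio-pci loads before the host mlx4 driver:
# softdep mlx4_core pre: vfio-pci

# After editing, regenerate the initramfs and reboot:
#   update-initramfs -u -k all
```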

In OPNsense, mlx4.ko is already loaded. When I run kldload mlx4en, I get the following panic:
[15] mlx4_core0: <mlx4_core> mem 0x82000000-0x820fffff,0xc000000000-0xc0007fffff irq 16 at device 0.0 on pci1
[15] <6>mlx4_core: Mellanox ConnectX core driver v3.7.1 (November 2021)
[15] mlx4_core: Initializing 0000:01:00.0
[21] mlx4_core0: Unable to determine PCI device chain minimum BW
[21] vtcon0: <VirtIO Console Adapter> on virtio_pci1
[21] ichsmb0: <Intel 82801I (ICH9) SMBus controller> port 0x8000-0x803f irq 16 at device 31.3 on pci0
[21] smbus0: <System Management Bus> on ichsmb0
[22] uhid0 on uhub1
[22] uhid0: <QEMU QEMU USB Tablet, class 0/0, rev 2.00/0.00, addr 2> on usbus7
[23] lo0: link state changed to UP
[25] vtnet0: link state changed to UP
[26] arp: 10.17.0.42 moved from 98:b7:85:20:58:c7 to ee:cf:26:d6:53:34 on vtnet0
[103] mlx4_en mlx4_core0: Activating port:1
[103] mlxen0: link state changed to DOWN
[103] mlxen0: Ethernet address: 50:6b:4b:5d:aa:a0
[103] <4>mlx4_en: mlx4_core0: Port 1: Using 2 TX rings
[103] <4>mlx4_en: mlx4_core0: Port 1: Using 4 RX rings
[103] <4>mlx4_en: mlxen0: Using 2 TX rings
[103] <4>mlx4_en: mlxen0: Using 4 RX rings
[103] <4>mlx4_en: mlxen0: Initializing port
[103] mlx4_en mlx4_core0: Activating port:2
[103]
[103]
[103] Fatal trap 12: page fault while in kernel mode
[103] cpuid = 0; apic id = 00
[103] fault virtual address = 0x0
[103] fault code = supervisor read instruction, page not present
[103] instruction pointer = 0x20:0x0
[103] stack pointer         = 0x28:0xfffffe0010784c18
[103] frame pointer         = 0x28:0xfffffe0010784c40
[103] code segment = base 0x0, limit 0xfffff, type 0x1b
[103] = DPL 0, pres 1, long 1, def32 0, gran 1
[103] processor eflags = interrupt enabled, resume, IOPL = 0
[103] current process = 12 (swi6: task queue)
[103] rdi: fffff80070fc3000 rsi: fffffe0010784c90 rdx: fffffe00a5fd3ac8
[103] rcx: 00000000c0306938  r8: 0000000000000000  r9: 0000000000000000
[103] rax: 0000000000000000 rbx: fffffe0010784c90 rbp: fffffe0010784c40
[103] r10: fffff80070a15000 r11: fffff800015aa000 r12: 0000000000008802
[103] r13: 0000000000000010 r14: fffffe00a5fd3ac8 r15: fffff80070a15000
[103] trap number = 12
[103] panic: page fault
[103] cpuid = 0
[103] time = 1778881147
[103] KDB: stack backtrace:
[103] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0010784960
[103] vpanic() at vpanic+0x161/frame 0xfffffe0010784a90
[103] panic() at panic+0x43/frame 0xfffffe0010784af0
[103] trap_pfault() at trap_pfault+0x3da/frame 0xfffffe0010784b40
[103] calltrap() at calltrap+0x8/frame 0xfffffe0010784b40
[103] --- trap 0xc, rip = 0, rsp = 0xfffffe0010784c18, rbp = 0xfffffe0010784c40 ---
[103] ??() at 0/frame 0xfffffe0010784c40
[103] dump_iface() at dump_iface+0x145/frame 0xfffffe0010784cf0
[103] rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe0010784d70
[103] do_link_state_change() at do_link_state_change+0x44/frame 0xfffffe0010784dc0
[103] taskqueue_run_locked() at taskqueue_run_locked+0x182/frame 0xfffffe0010784e40
[103] taskqueue_run() at taskqueue_run+0x68/frame 0xfffffe0010784e60
[103] ithread_loop() at ithread_loop+0x239/frame 0xfffffe0010784ef0
[103] fork_exit() at fork_exit+0x81/frame 0xfffffe0010784f30
[103] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0010784f30
[103] --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
[103] KDB: enter: panic

Does anyone have any ideas? Thanks


Is there a specific reason why you want to pass the adapter through? That puts the burden of driving it on FreeBSD, which traditionally is not particularly good at handling "exotic" hardware, and on top of a virtualisation layer at that.

More often than not, people use Proxmox precisely because they want Linux to handle the hardware that OPNsense is known to have problems with. To do that, though, you would use virtio, not passthrough; see https://forum.opnsense.org/index.php?topic=44159.0
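As a sketch of what that switch would look like on the Proxmox side (using VM ID 101 from your `qm showcmd` output; the bridge name vmbr0 is an assumption, adjust to your network config):

```shell
# Drop the PCIe passthrough device from VM 101
qm set 101 --delete hostpci0

# Add a paravirtualised NIC instead, bridged on the host
# (vmbr0 is the common default bridge name; adjust as needed)
qm set 101 --net0 virtio,bridge=vmbr0
```

Inside OPNsense the NIC then shows up as a vtnet(4) interface (you already have vtnet0 in your boot log), while the mlx4 driver on the Linux side keeps driving the actual card.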
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 450 up, Bufferbloat A+