Broadcom P225P/BCM57414 SR-IOV/VF inconsistent behavior/no longer loading

Started by Railgun, March 09, 2022, 07:16:36 PM

Hi all,

I've been using OPNsense in a home lab environment for some time now and use the other version in a professional capacity, but the setup I'm building now is new to me and is doing my head in.

I'll try to be brief: I'm building a new server, an EPYC 7282 on a Supermicro H12SSL-i board running ESXi 7.

I have a Broadcom P225P/BCM57414 NIC with SR-IOV enabled. 

The initial deployment of a new OPNsense VM ran into driver issues with the card's virtual functions (VFs).  They showed up with no driver attached:

none0@pci0:11:0:0: class=0x020000 rev=0x00 hdr=0x00 vendor=0x14e4 device=0x16dc subvendor=0x14e4 subdevice=0x16d7
    vendor     = 'Broadcom Inc. and subsidiaries'
    device     = 'NetXtreme-E Ethernet Virtual Function'
    class      = network
    subclass   = ethernet

none1@pci0:19:0:0: class=0x020000 rev=0x00 hdr=0x00 vendor=0x14e4 device=0x16dc subvendor=0x14e4 subdevice=0x16d7
    vendor     = 'Broadcom Inc. and subsidiaries'
    device     = 'NetXtreme-E Ethernet Virtual Function'
    class      = network
    subclass   = ethernet

none2@pci0:27:0:0: class=0x020000 rev=0x00 hdr=0x00 vendor=0x14e4 device=0x16dc subvendor=0x14e4 subdevice=0x16d7
    vendor     = 'Broadcom Inc. and subsidiaries'
    device     = 'NetXtreme-E Ethernet Virtual Function'
    class      = network
    subclass   = ethernet



...similar to another thread I'd seen.  I was able to load the driver manually with "kldload if_bnxt", and after I added it to loader.conf.local it also loaded successfully on the next reboot.
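For anyone wanting to check this quickly, here is a minimal Python sketch that picks out the devices with no driver attached (the "noneN@" entries) from output shaped like the listing above.  The function name and the regex are my own for illustration, and it assumes the selector-line format shown in this post.

```python
import re

# Selector line as shown above, e.g.
# "none0@pci0:11:0:0: class=0x020000 ... vendor=0x14e4 device=0x16dc ..."
SELECTOR = re.compile(
    r"^(?P<name>[A-Za-z_]+\d+)@(?P<addr>pci\d+:\d+:\d+:\d+):\s+(?P<attrs>.*)$"
)

def unattached_devices(pciconf_output):
    """Return (name, pci_address, vendor_id, device_id) for 'noneN' devices."""
    found = []
    for line in pciconf_output.splitlines():
        m = SELECTOR.match(line)
        # Indented detail lines do not match; attached devices are skipped.
        if not m or not m.group("name").startswith("none"):
            continue
        attrs = dict(kv.split("=", 1) for kv in m.group("attrs").split() if "=" in kv)
        found.append((m.group("name"), m.group("addr"),
                      attrs.get("vendor"), attrs.get("device")))
    return found

# Sample taken from the output in this post.
sample = """\
none0@pci0:11:0:0: class=0x020000 rev=0x00 hdr=0x00 vendor=0x14e4 device=0x16dc subvendor=0x14e4 subdevice=0x16d7
    vendor     = 'Broadcom Inc. and subsidiaries'
    device     = 'NetXtreme-E Ethernet Virtual Function'
"""

for name, addr, vendor, device in unattached_devices(sample):
    print(f"{name} at {addr}: vendor={vendor} device={device} has no driver attached")
```

Once the driver attaches, the same device would be expected to appear under a driver name such as bnxt0 instead of none0, and the function would return an empty list for it.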

However, upon rebooting the host, it all went to pot.

I had already started configuring the new interfaces within the UI, and after the reboot all the newly created interfaces had disappeared.  I was met with the same "none0@" output as above.  However, when I checked whether the driver had simply failed to load:

kldload if_bnxt
kldload: can't load if_bnxt: module already loaded or in kernel


I rebooted the host again, only for the VM to not start up at all, as the NIC in question seemed to have dropped out of having SR-IOV enabled.  I rebooted once more, and one port was enabled and the other disabled.  Both times ESXi indicated that SR-IOV WAS enabled, but required a reboot.

I did that reboot, but the interfaces still kept dropping off.  I deleted the loader.conf.local file to prevent the driver from being loaded at boot, rebooted, and tried to load the driver manually again, but kldload reported that it was already loaded.

The only thing I could see in dmesg was:

bnxt0: <Broadcom NetXtreme-E Ethernet Virtual Function> mem 0xffa04000-0xffa07fff,0xff900000-0xff9fffff,0xffa00000-0xffa03fff at device 0.0 on pci5
bnxt0: Timeout sending HWRM_VER_GET: (timeout: 1000) seq: 0
bnxt0: attach: hwrm ver get failed
bnxt0: IFDI_ATTACH_PRE failed 60
device_attach: bnxt0 attach returned 60



On other boot attempts, I saw:

bnxt0: Timeout sending HWRM_RING_ALLOC: (timeout: 2000) seq: 225
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 226
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 227
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 228
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 229
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 230
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 231
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 232
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 233
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 234
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 235
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 236
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 237
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 238
bnxt0: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 239
bnxt0: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 240
bnxt0: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 241
bnxt1: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 15
bnxt1: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 16
bnxt1: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 17
bnxt2: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 15
bnxt2: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 16
bnxt2: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 17
bnxt0: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 242
...



Which seemed to repeat endlessly. 
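Since the log scrolls by quickly, a small script can make the pattern easier to read.  This is only a rough Python sketch that counts the "Timeout sending HWRM_..." lines per device and command; the function name is made up, and the regex assumes the exact line format pasted above.

```python
import re
from collections import Counter

# Matches lines such as:
# "bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 226"
TIMEOUT = re.compile(
    r"^(?P<dev>\w+): Timeout sending (?P<cmd>HWRM_\w+): \(timeout: \d+\) seq: \d+"
)

def summarize_timeouts(dmesg_text):
    """Count HWRM timeout messages per (device, command) pair."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = TIMEOUT.match(line.strip())
        if m:
            counts[(m.group("dev"), m.group("cmd"))] += 1
    return counts

# A few lines copied from the log in this post.
sample_log = """\
bnxt0: Timeout sending HWRM_RING_ALLOC: (timeout: 2000) seq: 225
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 226
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 227
bnxt1: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 15
"""

for (dev, cmd), n in sorted(summarize_timeouts(sample_log).items()):
    print(f"{dev} {cmd}: {n} timeout(s)")
```

Fed the full dmesg output, this would show at a glance which VFs are timing out and on which firmware commands.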

I now have no idea why or how that occurred and am at a loss here.  I'm guessing this is a guest issue.  I'm spinning up various other VMs at the moment, but this one was the first.  I'll see whether the other VMs show any odd behavior.

This particular thread can be closed. 

Some odd sequence of events had somehow got this setup into an odd state.