OPNsense crash with bnxt driver

Started by morphxyz, August 08, 2024, 09:30:07 AM

Dear Community

Two days ago our OPNsense suddenly stopped working (pic1).
It had been working great for months, and we had rebooted it many times before without issues.
But this time, when we booted, the bnxt driver failed to load (pic2).
Yes, we do have a custom tunable in place, but only this one (pic3).

After another reboot the OPNsense is running smoothly again.
Has anybody ever had a similar event with the bnxt driver?
Is there a way to take a deeper look at what actually happened, other than the logs in the web GUI?
Since then we have been a bit afraid to touch it at all.

Would you suggest switching to a natively supported network card?
It's an idea we've had for a while.
Is the error shown in pic1 even the cause of the crash?

Thank you for your thoughts and ideas :-)

Try 24.7 first. It most likely includes bnxt fixes.


Cheers,
Franco

Thank you for the swift response, franco.

Will do tonight!

We updated to 24.7.1.
No issues during the update except with CrowdSec, but we solved that and everything is up to date.

We had another crash of said firewall (or at least of bnxt0 and bnxt1) yesterday, though. It only came back after several reboots.

We can't pin the issue down. The first error messages we can find are these:

2024-08-12T18:45:33   Notice   kernel   bnxt0: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 44051   
2024-08-12T18:45:33   Notice   kernel   bnxt0: Timeout sending HWRM_PORT_QSTATS: (timeout: 2000) seq: 44050   
2024-08-12T18:45:33   Notice   kernel   bnxt1: Timeout sending HWRM_PORT_QSTATS: (timeout: 2000) seq: 24278   
2024-08-12T18:36:00   Error   configctl   error in configd communication Traceback (most recent call last): File "/usr/local/sbin/configctl", line 65, in exec_config_cmd line = sock.recv(65536).decode() ^^^^^^^^^^^^^^^^ TimeoutError: timed out   
2024-08-12T18:26:00   Error   configctl   error in configd communication Traceback (most recent call last): File "/usr/local/sbin/configctl", line 65, in exec_config_cmd line = sock.recv(65536).decode() ^^^^^^^^^^^^^^^^ TimeoutError: timed out   
2024-08-12T18:16:00   Error   configctl   error in configd communication Traceback (most recent call last): File "/usr/local/sbin/configctl", line 65, in exec_config_cmd line = sock.recv(65536).decode() ^^^^^^^^^^^^^^^^ TimeoutError: timed out

followed by many more kernel notices about bnxt0 and bnxt1.

Seems to be similar to this post but not identical: https://forum.opnsense.org/index.php?topic=38434.0

We wonder if it's a hardware issue, or maybe Suricata/Sensei?

Any ideas anyone?
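
To see whether the timeouts cluster on one port or on one HWRM command, the kernel notices quoted above could be tallied with a short script. This is only a sketch: it assumes the log lines have been copied out of the web GUI in exactly the layout pasted above (timestamp, severity, process, message). Log files on disk may use a different layout, in which case the regular expression would need adjusting. The function name summarize_hwrm_timeouts and the SAMPLE list are made up for illustration.

```python
import re
from collections import Counter

# Matches kernel notices in the layout pasted above, for example:
# 2024-08-12T18:45:33   Notice   kernel   bnxt0: Timeout sending HWRM_PORT_QSTATS: (timeout: 2000) seq: 44050
HWRM_TIMEOUT = re.compile(
    r"^(?P<ts>\S+)\s+\S+\s+kernel\s+(?P<nic>bnxt\d+): "
    r"Timeout sending (?P<cmd>HWRM_\w+): \(timeout: (?P<ms>\d+)\) seq: (?P<seq>\d+)"
)


def summarize_hwrm_timeouts(lines):
    """Count HWRM timeouts per (interface, command) and record the earliest timestamp."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = HWRM_TIMEOUT.match(line.strip())
        if match is None:
            continue  # not a bnxt HWRM timeout notice (e.g. a configctl error)
        key = (match["nic"], match["cmd"])
        counts[key] += 1
        ts = match["ts"]
        # ISO 8601 timestamps in the same format sort correctly as plain strings.
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts
    return counts, first_seen


# Hypothetical input: lines copied from the post above.
SAMPLE = [
    "2024-08-12T18:45:33   Notice   kernel   bnxt0: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 44051",
    "2024-08-12T18:45:33   Notice   kernel   bnxt0: Timeout sending HWRM_PORT_QSTATS: (timeout: 2000) seq: 44050",
    "2024-08-12T18:45:33   Notice   kernel   bnxt1: Timeout sending HWRM_PORT_QSTATS: (timeout: 2000) seq: 24278",
    "2024-08-12T18:36:00   Error   configctl   error in configd communication",
]

if __name__ == "__main__":
    counts, first_seen = summarize_hwrm_timeouts(SAMPLE)
    for (nic, cmd), n in sorted(counts.items()):
        print(f"{nic} {cmd}: {n} timeout(s), first at {first_seen[(nic, cmd)]}")
```

Fed with the full set of notices, a summary like this would show whether both ports start timing out at the same moment, which may help when comparing against other reports.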


Great! At least we know what to do now.
Let's hope reassigning the interfaces is all it takes after changing the NIC.
Otherwise it's going to be a long night.

We might also just end up running OPNsense inside Proxmox. We've had no issues with those NICs there (virtualized, NOT passed through).

Thank you for the assistance and information, franco!

I'm willing to take upstream fixes into our kernel. There seems to be a lot of movement in bnxt, but sadly most of it was missed for FreeBSD 14.1.

% git diff --stat upstream/stable/14 sys/dev/bnxt
sys/dev/bnxt/bnxt.h                         |    803 +
sys/dev/bnxt/bnxt_en/bnxt.h                 |   1314 -
sys/dev/bnxt/bnxt_en/bnxt_auxbus_compat.c   |    194 -
sys/dev/bnxt/bnxt_en/bnxt_auxbus_compat.h   |     75 -
sys/dev/bnxt/bnxt_en/bnxt_dcb.c             |    861 -
sys/dev/bnxt/bnxt_en/bnxt_dcb.h             |    127 -
sys/dev/bnxt/bnxt_en/bnxt_ulp.c             |    526 -
sys/dev/bnxt/bnxt_en/bnxt_ulp.h             |    161 -
sys/dev/bnxt/{bnxt_en => }/bnxt_hwrm.c      |   1306 +-
sys/dev/bnxt/{bnxt_en => }/bnxt_hwrm.h      |     24 +-
sys/dev/bnxt/{bnxt_en => }/bnxt_ioctl.h     |      0
sys/dev/bnxt/{bnxt_en => }/bnxt_mgmt.c      |     69 +-
sys/dev/bnxt/{bnxt_en => }/bnxt_mgmt.h      |     31 +-
sys/dev/bnxt/bnxt_re/bnxt_re-abi.h          |    177 -
sys/dev/bnxt/bnxt_re/bnxt_re.h              |   1077 -
sys/dev/bnxt/bnxt_re/ib_verbs.c             |   5498 --
sys/dev/bnxt/bnxt_re/ib_verbs.h             |    632 -
sys/dev/bnxt/bnxt_re/main.c                 |   4467 -
sys/dev/bnxt/bnxt_re/qplib_fp.c             |   3544 -
sys/dev/bnxt/bnxt_re/qplib_fp.h             |    638 -
sys/dev/bnxt/bnxt_re/qplib_rcfw.c           |   1338 -
sys/dev/bnxt/bnxt_re/qplib_rcfw.h           |    354 -
sys/dev/bnxt/bnxt_re/qplib_res.c            |   1226 -
sys/dev/bnxt/bnxt_re/qplib_res.h            |    840 -
sys/dev/bnxt/bnxt_re/qplib_sp.c             |   1234 -
sys/dev/bnxt/bnxt_re/qplib_sp.h             |    432 -
sys/dev/bnxt/bnxt_re/qplib_tlv.h            |    187 -
sys/dev/bnxt/bnxt_re/stats.c                |    773 -
sys/dev/bnxt/bnxt_re/stats.h                |    353 -
sys/dev/bnxt/{bnxt_en => }/bnxt_sysctl.c    |   1097 +-
sys/dev/bnxt/{bnxt_en => }/bnxt_sysctl.h    |      2 -
sys/dev/bnxt/{bnxt_en => }/bnxt_txrx.c      |      0
sys/dev/bnxt/{bnxt_en => }/convert_hsi.pl   |      0
sys/dev/bnxt/{bnxt_en => }/hsi_struct_def.h | 116136 ++++++++++---------------
sys/dev/bnxt/{bnxt_en => }/if_bnxt.c        |   1789 +-
35 files changed, 45919 insertions(+), 101366 deletions(-)

Whether any of this fixes the issue you're currently seeing, I don't know, though.


Cheers,
Franco

That unfriendly remark on the forum aside, bnxt and Broadcom in general are tough territory for the FreeBSD project, and it looks like Broadcom doesn't really care to support anything but Windows and Linux. IMHO FreeBSD is not really to blame in this particular case.

See, for example, this discussion:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269133

Although a Broadcom employee, Chandrakanth Patil, tried to help and promised a fix, it's still not solved.

I would just stay away from their gear.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Well, Chandrakanth Patil did author a lot of these changes currently queued up for 14.2.


Cheers,
Franco

*fingers crossed*