Kernel Parameters on Tunables not working.

Started by rafaelbs, October 24, 2025, 05:38:26 AM

Quote from: meyergru on October 25, 2025, 08:59:38 PMYou have 64 threads? Oh, yes. Judging from this, you even have 128. IDK if QAT changes anything in that.

Maybe you should try with a lower number, IDK if FreeBSD has problems with such high numbers.

Yep, we have 64 cores and 128 threads.

And we already tried with a lower number, without success.

Quote from: pfry on October 25, 2025, 09:15:25 PM
Quote from: rafaelbs on October 25, 2025, 08:15:53 PM[...]
My values are pretty much the same as yours, except rss.bits I'm using 6.

How many cores do you have? 6 bits suggests 64 (ah - just read the latest...). But the value doesn't have to be based on cores - it's apparently more of an entropy setting.

How about "netstat -Q"? I don't know that you'd need to post it; I truncated it above.

Here is my netstat -Q

# netstat -Q
Configuration:
Setting                        Current        Limit
Thread count                         1            1
Default queue limit                256        10240
Dispatch policy               deferred          n/a
Threads bound to CPUs         disabled          n/a

Protocols:
Name   Proto QLimit Policy Dispatch Flags
ip         1   1000   flow  default   ---
igmp       2    256 source  default   ---
rtsock     3    256 source  default   ---
arp        4    256 source  default   ---
ether      5    256 source   direct   ---
ip6        6   1000   flow  default   ---

Workstreams:
WSID CPU   Name     Len WMark   Disp'd  HDisp'd   QDrops   Queued  Handled
   0   0   ip         0    24        0        0        0   518021   518021
   0   0   igmp       0     0        0        0        0        0        0
   0   0   rtsock     0     1        0        0        0       17       17
   0   0   arp        0    56        0        0        0   489663   489663
   0   0   ether      0     0   522772        0        0        0   522772
   0   0   ip6        0     3        0        0        0     1534     1534

After a couple of tests, I figured out that the Tunables I set in the GUI (System / Settings / Tunables) are correctly appearing in /boot/loader.conf, but they don't take any effect.
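
A quick way to double-check this from the shell (a sketch, using the RSS/netisr tunable names discussed in this thread):

# kenv | grep -E 'rss|isr'
# sysctl net.inet.rss.enabled net.inet.rss.bits net.inet.rss.buckets
# sysctl net.isr.maxthreads net.isr.numthreads net.isr.bindthreads net.isr.dispatch

kenv shows what the loader handed to the kernel from /boot/loader.conf; if the values appear there but the sysctls still read back unchanged after a reboot, the kernel/driver is ignoring them rather than the Tunables GUI.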

Here are the messages from dmesg related to my interface:

[2] mlx5_core0: <mlx5_core> mem 0x4007e000000-0x4007fffffff at device 0.0 on pci1
[2] mlx5: Mellanox Core driver 3.7.1 (November 2021)
ahciem0: Unsupported enclosure interface
...

[18] mlx5_core3: <mlx5_core> mem 0x3007c000000-0x3007dffffff at device 0.1 on pci6
[18] mlx5_core3: INFO: mlx5_port_module_event:709:(pid 12): Module 1, status: plugged and enabled
[18] mlx5_core3: INFO: health_watchdog:577:(pid 0): PCIe slot advertised sufficient power (75W).
[23] mlx5_core3: INFO: init_one:1713:(pid 0): cannot find SR-IOV PCIe cap
[23] mlx5_core: INFO: (mlx5_core3): E-Switch: Total vports 1, l2 table size(65536), per vport: max uc(128) max mc(2048)
[23] mlx5_core3: Failed to initialize SR-IOV support, error 2

...

[213] mce3: ERR: mlx5e_ioctl:3608:(pid 19542): tso4 disabled due to -txcsum.
[213] mce3: ERR: mlx5e_ioctl:3621:(pid 19776): tso6 disabled due to -txcsum6.

....

[214] mce3: INFO: mlx5e_open_locked:3297:(pid 0): NOTE: There are more RSS buckets(64) than channels(61) available


Any idea whether those messages are related to getting net.inet.rss.enabled to take effect?

Thanks.
Rafael

What all network interfaces do you have installed? ("8 Nvidias" could be a number of things; I didn't see anything other than the Mellanox mentioned specifically.) I don't suppose you have a different 100Gb card flying around...? Or anything else you can plug up for testing (40Gb QSFP, 10/25Gb SFP, etc., if you have the interconnect).

The messages don't appear to be significant, unless the device/driver requires SR-IOV or TSO/CRC. You could toggle 'em (BIOS and "Interfaces: Settings", respectively). (I wouldn't expect either to have any effect.)
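
If you want to peek from the shell first (mce3 taken from your dmesg; OPNsense manages these flags from the GUI, so treat any manual change as a temporary test):

# ifconfig mce3 | grep -i options
# ifconfig mce3 txcsum txcsum6 tso4 tso6      (re-enable for a test; prefix each with "-" to turn off again)

No idea whether it changes anything on that driver.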

Quick edit: Oh, and reduce your net.inet.rss.buckets?
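
For example (a sketch - the bucket count follows from net.inet.rss.bits as a power of two, so with 61 channels reported for mce3, 5 bits / 32 buckets would fit):

net.inet.rss.enabled = 1
net.inet.rss.bits = 5

set via System / Settings / Tunables, then reboot and read the values back with sysctl.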

We have 4 MCX516A-CCAT cards, each with 2 100Gbps ports, plus 2 on-board gigabit Ethernet ports.

Of the 8 100Gbps ports, we are using 2: one for LAN and the other for WAN.

Of the 2 1Gbps ports, we are using 1 for a backup WAN link.

Since the beginning, all NICs have been working with default settings and the firewall is working fine. The challenge here is increasing performance so we can run Suricata in IPS mode.

Quote from: rafaelbs on October 26, 2025, 06:03:58 PMWe have 4 cards MCX516A-CCAT, and each has 2 100Gbps ports, and also 2 on-board gigabit ethernet.
[...]

Are the two 1GbEs 1000BASE-T, perhaps i210s, with one attached to a BMC? Probably not an interface/driver issue, then. I'd give the buckets a shot - it seems the Mellanox has an RSS mapping limit (not surprising). I don't know if this would disable RSS, but it can't hurt to poke it. You might check for complaints from the 1GbEs, too, and pick a safe start value.

October 26, 2025, 06:28:56 PM #21 Last Edit: October 26, 2025, 06:37:24 PM by meyergru
SR-IOV should not be any problem, but I think this message leads to something:

Quote from: rafaelbs on October 26, 2025, 05:50:13 PM[214] mce3: INFO: mlx5e_open_locked:3297:(pid 0): NOTE: There are more RSS buckets(64) than channels(61) available

The netstat output actually seems to indicate that only one queue is being used. Maybe the card or the FreeBSD driver does not support RSS.
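
That said, the "Thread count 1" and "Threads bound to CPUs: disabled" lines come from the netisr subsystem rather than from the NIC, and they are governed by the net.isr.maxthreads / net.isr.bindthreads loader tunables. A quick check of both sides (just the stock sysctls, nothing card-specific):

# sysctl net.isr.maxthreads net.isr.numthreads net.isr.bindthreads net.isr.dispatch
# sysctl net.inet.rss.enabled net.inet.rss.bits

If those still read back unchanged after a reboot, the tunables are not being applied at all, independent of any Mellanox RSS limitation.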

Older Mellanox cards had that problem (under Linux):

https://forums.developer.nvidia.com/t/rss-not-working-on-mellanox-connectx-3-nic/206980

And even the Connect-X5 is reported to have problems in that field:

https://forums.developer.nvidia.com/t/connectx-5-rx-rss-has-no-impact-even-allocating-more-rx-queues/206157

Those are very specialized cards with drivers that seem mainly supported for Linux, so maybe the FreeBSD implementation is not the best. Even under Linux, for high-performance applications, the vendor drivers are recommended:

https://forums.developer.nvidia.com/t/connectx-5-support-for-inner-outer-rss/253706

Your whole setup is leaning on the high-performance side - if you have a business subscription, you could ask Deciso for help.

Quote from: pfry on October 26, 2025, 06:16:10 PM
Quote from: rafaelbs on October 26, 2025, 06:03:58 PMWe have 4 cards MCX516A-CCAT, and each has 2 100Gbps ports, and also 2 on-board gigabit ethernet.
[...]

Are the two 1GbEs 1000BASE-T, perhaps i210s, with one attached to a BMC? Probably not an interface/driver issue, then. I'd give the buckets a shot - it seems the Mellanox has an RSS mapping limit (not surprising). I don't know if this would disable RSS, but it can't hurt to poke it. You might check for complaints from the 1GbEs, too, and pick a safe start value.

Only one 1GbE is connected, but not enabled. There is a dedicated IPMI port.

Here are logs from the 1GbEs:

[1] igb0: <Intel(R) I350 (Copper)> port 0xd000-0xd01f mem 0x9a620000-0x9a63ffff,0x9a644000-0x9a647fff at device 0.0 on pci21
[1] igb0: EEPROM V1.63-0 eTrack 0x800009fa
[1] igb0: Using 1024 TX descriptors and 1024 RX descriptors
[1] igb0: Using 8 RX queues 8 TX queues
[1] igb0: Using MSI-X interrupts with 9 vectors
[1] igb0: netmap queues/slots: TX 8/1024, RX 8/1024
[1] igb1: <Intel(R) I350 (Copper)> port 0xd020-0xd03f mem 0x9a600000-0x9a61ffff,0x9a640000-0x9a643fff at device 0.1 on pci21
[1] igb1: EEPROM V1.63-0 eTrack 0x800009fa
[1] igb1: Using 1024 TX descriptors and 1024 RX descriptors
[1] igb1: Using 8 RX queues 8 TX queues
[1] igb1: Using MSI-X interrupts with 9 vectors
[1] igb1: netmap queues/slots: TX 8/1024, RX 8/1024
[199] igb0: link state changed to UP

Quote from: pfry on October 26, 2025, 05:56:29 PMQuick edit: Oh, and reduce your net.inet.rss.buckets?
It might be that the driver just can't live with this mismatch. IDK what this number depends on, so maybe setting it directly doesn't have much effect.

I had success bringing a measly onboard RTL8168B up to speed while following this guide, which basically confirms jonny5's statement: https://binaryimpulse.com/2022/11/opnsense-performance-tuning-for-multi-gigabit-internet/
That device originally had a stability issue when MSI/MSI-X was enabled (a "BIOS" (EFI) update fixed that), but it was still slow and couldn't handle --bidir, so I switched to the proprietary drivers and then started tinkering with that guide. The sections after the "dispatch" setting occasionally gave incremental improvements, but the big boost came from the settings up to and including "dispatch".
I had an iperf3 -P4 --bidir -t3600 running against a server on the OPNsense box itself during the tweaking, and indeed only setting "net.isr.dispatch" to "deferred" had a notable effect. I set and applied each setting sequentially to see whether anything changed in the iperf3 numbers - maybe that matters? I'm too new to BSD tuning to judge this reliably, so all I can do is share my observations on a really low-end device.
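
For reference, the block from that guide that made the difference for me looked roughly like this (from memory, so treat it as a sketch - and the bits value was sized for my low-end box, not for a 128-thread machine):

net.isr.maxthreads = -1
net.isr.bindthreads = 1
net.isr.dispatch = deferred
net.inet.rss.enabled = 1
net.inet.rss.bits = 2

all entered under System / Settings / Tunables and applied with a reboot.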

Let's start with:

What do those settings do?
What are the expectations when setting them (i.e. what issue is being fixed)?

https://github.com/opnsense/core/issues/5415

October 27, 2025, 09:54:46 AM #25 Last Edit: October 27, 2025, 09:58:00 AM by meyergru
Adding to what I wrote about specialised drivers above, see this document.

Also, try "sysctl -a | fgrep mlx". There seem to be some undocumented parameters there. I do not have that card, so maybe there are more parameters when the card is actually installed. From the Mellanox docs, it seems that RSS parameters are set in the driver rather than in the OS.

Probably, you could even use the vendor driver instead of the OS-provided one, which may offer more parameters and/or features.