x710-da2 - iperf3 - High packet Queue Drop Issue - Need Advice?

Started by Mark_the_Red, February 11, 2025, 05:09:59 PM


Wondering if some of you could share some insight on this.

Running an N100 router with an X710-DA2 network card (latest firmware), an SSD, 8 GB of DDR5 RAM, and OPNsense 25.1 (everything updated to the latest).

Everything is running relatively smoothly, but I was trying to resolve a performance issue I noticed with iperf3. I basically get 10 Gbit/s transfer speeds for the first 6-7 tests, and then it bogs down to around 3 Gbit/s and below through the last 3 tests.

Upon investigating using this helpful guide: https://docs.netgate.com/pfsense/en/latest/hardware/tune.html I noticed I am getting lots of queue packet drops, which correlates with the results I am seeing (i.e., packets backing up at high transfer speeds).

sysctl net.inet.ip.intr_queue_drops
gives me this output:

net.inet.ip.intr_queue_drops: 73008
This appears to be the problem; per the guide, this should be zero.

Raising the maximum queue length (1000, 2000, 3000000000, etc.) per the guide does not seem to fix anything:

sysctl net.inet.ip.intr_queue_maxlen=1000
My understanding is that the X710 should be able to handle this test easy peasy, so I am wondering if there is another hidden setting somewhere to resolve this queue drop issue on the card itself. I've tried a lot of tunable settings but still haven't gotten anything resolved.
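
One caveat on reading that counter: as far as I understand it, net.inet.ip.intr_queue_drops is cumulative since boot, so a large number alone does not show that a given iperf3 run is dropping packets. Below is a minimal sketch for sampling it before and after a run. The helper names (read_drops, drops_delta, measure_drops) are made up for illustration, and the server address in the usage comment is a placeholder, not something from my setup.

```shell
#!/bin/sh
# Sketch: count queue drops caused by one test run, not the total since boot.

# Print the current value of the cumulative drop counter (FreeBSD sysctl).
read_drops() {
    sysctl -n net.inet.ip.intr_queue_drops
}

# Print how many drops happened between two counter readings.
# $1 = reading before the run, $2 = reading after the run.
drops_delta() {
    echo $(( $2 - $1 ))
}

# Hypothetical helper: run any command and report the drops seen meanwhile.
# Usage (placeholder address): measure_drops iperf3 -c 192.0.2.10 -t 30
measure_drops() {
    before=$(read_drops) || return 1
    "$@"
    after=$(read_drops) || return 1
    echo "queue drops during run: $(drops_delta "$before" "$after")"
}
```

If a run reports zero, the drops were probably counted earlier and the slowdown likely has another cause.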

I'm kind of new to FreeBSD / OPNsense, so your insight is most appreciated.

Quote from: Mark_the_Red on February 11, 2025, 05:09:59 PM[...]
Running a n100 router [...]

What N100 hardware, and what slot for your x710-da2? The N100 does have 9 PCI-e version 3 lanes; you'd want at least 2 v3 lanes for every 10Gb port, but 4+ would be best. I haven't seen... well, any N100 systems, but slot size may be misleading. (Note that I still haven't tested my equipment at 10Gb. But it's all a wee bit bigger than an N100.)

Well, I didn't want to turn this into a hardware shopping thread. Here is my kit:

https://forums.servethehome.com/index.php?threads/new-sfp-n100-n305-router.44776/post-454348

The PCIe slot is getting 4 PCIe lanes, which should be plenty for a two-port SFP+ card.
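
Rough arithmetic behind that, as a sketch: PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, so a bit under 8 Gbit/s per lane in each direction. The function names below are made up for illustration, and packet/protocol overhead is ignored, so real throughput will be somewhat lower.

```shell
#!/bin/sh
# Rough PCIe 3.0 bandwidth check using integer math in Mb/s.
# Assumes 8 GT/s per lane and 128b/130b encoding; protocol overhead is ignored.

# Print the raw line rate in Mb/s (per direction) for $1 PCIe 3.0 lanes.
pcie3_mbps() {
    echo $(( $1 * 8000 * 128 / 130 ))
}

# Print the total Mb/s needed by $1 ports of 10 Gb/s each.
ports_mbps() {
    echo $(( $1 * 10000 ))
}

lanes=4
ports=2
echo "x${lanes} link: $(pcie3_mbps "$lanes") Mb/s, needed: $(ports_mbps "$ports") Mb/s"
```

With 4 lanes this leaves headroom for both ports at line rate, which matches the "2 lanes per 10Gb port" rule of thumb mentioned above.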

Do you have any insight on the massive queue drop numbers I am getting?
 

Quote from: Mark_the_Red on February 11, 2025, 11:20:17 PM[...]
The PCIe slot is getting 4 PCIe lanes, which should be plenty for a two-port SFP+ card.

Making sure.

Quote
Do you have any insight on the massive queue drop numbers I am getting?

Offhand, no, but it looks like interrupt overload. Your drop counter looks like the direct queue - do you have RSS configured? (I did not configure "net.isr.dispatch" on my firewall or servers.) "netstat -Q" (for RSS) and "netstat -m" (buffer stats) may offer some insight. I suppose if all else fails you could crank the queue depth, but I think that would be of limited help with continuous traffic.
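
For reference, a small sketch that prints the netisr/RSS settings in one go. The sysctl names are the ones I remember from the OPNsense RSS documentation, so treat them as assumptions and check them on your box; show_tunables is just a made-up helper name. Anything the kernel doesn't expose is reported as not available instead of aborting.

```shell
#!/bin/sh
# Print the netisr/RSS sysctls relevant to queue drops.
# The names below are assumptions; verify them on the target system.
show_tunables() {
    for name in net.isr.dispatch net.isr.maxthreads net.isr.bindthreads \
                net.inet.rss.enabled net.inet.rss.bits \
                net.inet.ip.intr_queue_maxlen; do
        # Fall back to a marker if this kernel does not expose the sysctl.
        value=$(sysctl -n "$name" 2>/dev/null) || value="(not available)"
        echo "$name = $value"
    done
}

show_tunables
```

After that, "netstat -Q" and "netstat -m" are the next things to look at, ideally while the iperf3 test is running.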

I wouldn't expect the hardware VLAN offload to make a positive difference in FreeBSD. (I have a bit more CPU than the N100, so YMMV.) In fact I would tend to trust it in DPDK applications only, but that might be a bit paranoid. You said you updated firmware - did you update the x710? It's been around since 2014, and the old firmwares are pretty bad. (If you do acquire a wild hare - so to speak - and decide to play with DPDK, e.g. DANOS/Vyatta or VPP, you'll need a recent NVM version.)

Appreciate the response.   

Regarding the x710-da2 - it is a Dell card with the firmware 22.5.7 which was from 01.2024. I flashed it myself and it is the latest. 

I enabled all the RSS settings, as well as the tweaks from the opnsense / star wars walker webpage I linked earlier.

I guess I will just have to live with the status quo. The crazy thing is that out of the box, when I first installed this, I didn't have this queue drop problem, but that was on version 24.12.4. I think I changed everything in the tunables and didn't properly isolate each variable, so I might have to start from scratch with factory settings.

Are you running OPNsense vanilla / factory with the stock tunables (i.e., hardware offloading, VLAN hardware, etc.)? Do you think running Suricata / DPDK will make the X710 "kick in"? I don't have strong opinions on using Suricata, but I just want the best possible performance out of this, and I am very new to OPNsense. I came from a Ubiquiti EdgeRouter before this, and there were significantly fewer "tunables" than in OPNsense.



 

Quote from: Mark_the_Red on February 12, 2025, 04:14:49 PM[...]
Are you running OPNsense vanilla / factory with the stock tunables (i.e., hardware offloading, VLAN hardware, etc.)? Do you think running Suricata / DPDK will make the X710 "kick in"? [...]

I am not. I've set a bunch of sysctls similar to the linked page. Many are irrelevant to the purpose of the machine, but they shouldn't hurt. I'm also running a Ryzen 7700X with basic filters - a poor comparison.

An IPS is an add-on, so it will not improve performance - certainly not on a smaller machine.

Appreciate the help. I just installed the latest 25.1 updates today and enabled full hardware offload on the four main settings (pic).

Now it's full 10 gig results without missing a beat. Ironically, these were the settings I started with. Adding --bidir bogged it down to around 6-7 Gb/s, but that was to be expected. Not sure what changed with the latest kernel (if anything), but this seems to be resolved.

Didn't mean to waste your time, but something strange was up here.




Quote from: Mark_the_Red on February 12, 2025, 05:48:21 PM[...]
Didn't mean to waste your time, but something strange was up here.

Hardly a waste. Your experience is unusual - it could be useful at some point.

Not to necro an old thread.

I scratched my irrational impulse to upgrade to the latest OPNsense version (25.1 -> 25.1.3) and this packet drop issue resurfaced. Consistent loss of around 30% in iperf3 tests.

sysctl net.inet.ip.intr_queue_drops = 34112

I didn't set a backup point and realized I cannot roll back now.  Might have to fresh install again.

Something in the kernel changed between 25.1 and 25.1.3. I'm not smart enough to figure out what it is.