HELP NEEDED: Performance issue on DEC850 after upgrade to 24.7

Started by svengru, July 25, 2024, 08:59:14 PM

I see a noticeable performance impact on 10G routing after upgrading my DEC850 to 24.7. Going back to 24.1 solves the issue.

Setup:
I am on a 10G internet connection and use a DEC850. AX0 is the WAN port and AX1 is used as LAN. AX1 carries several VLANs. No other ports are in use.

The recommended tuning parameters for 10G performance are set:
dev.ax.0.iflib.override_nrxds = 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048
dev.ax.0.iflib.override_ntxds = 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048
dev.ax.0.rss_enabled = 1

Same settings for ax.1 as well.

Issue:
On prior releases (including 24.1) I was able to get line-rate performance with relatively low CPU load across the 8 cores the DEC850 has. Since upgrading to 24.7, network performance is inconsistent and no longer reaches line rate. CPU cores spike to 100% during transfer tests.

Test to reproduce:
Using a device connected to the LAN, run a moderate network load against a WAN target that can deliver consistent performance close to 10G line speed. For my example I used "iperf3 -c speedtest.sea11.us.leaseweb.net -p 5201-5210 -P10", which normally results in about 9.3 to 9.4 Gbits/sec and a well-distributed load across CPU cores. Screenshot attached (24.1.jpg).

Running the same test on 24.7 results in only 8.8 Gbits/sec down and ~5 Gbits/sec up while maxing out most CPU cores. Screenshot attached (24.7.jpg).

I have two DEC850 units, so I was able to do a clean install on both for the test and apply the exact same configuration.

Is anybody else seeing this with 10G or higher speed WAN connections?

I see similar numbers. The speed dropped from 9.6 to a maximum of 5.4 after the upgrade to 24.7.

I debugged this some more. There seems to be a change in how FreeBSD handles network load scheduling across cores. Can you try to set "net.inet.rss.enabled" to "1" in tunables and test again? I saw a noticeable improvement (back to line speed) after setting it.
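For anyone who wants to confirm the setting actually took effect, here is a minimal sketch for checking it from a shell. It assumes a FreeBSD-based system where `sysctl` exposes net.inet.rss.enabled; the function name rss_status is made up for illustration. As far as I know the value is read at boot, so a change made in tunables would need a reboot before it shows up here.

```shell
# Hypothetical helper: report whether kernel RSS is switched on.
# Assumes the sysctl net.inet.rss.enabled exists on this system.
rss_status() {
    # -n prints only the value; errors are hidden so a missing
    # sysctl is reported as "disabled" instead of an error message.
    if [ "$(sysctl -n net.inet.rss.enabled 2>/dev/null)" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

Run `rss_status` before and after the test so the iperf3 numbers can be matched to the state that was actually active.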

September 16, 2024, 10:42:23 PM #3 Last Edit: September 17, 2024, 01:30:23 AM by svengru
I just went from 24.4 to 24.7 again to test performance in a real production environment, since I was no longer seeing performance issues on 24.7 with RSS enabled in my lab.

Unfortunately, it took less than 24 hours for performance to drop from 9.4 Gbit/s (expected and normal on 24.4) to less than 8 Gbit/s down and 6 Gbit/s up.

I really cannot put my finger on what is causing the issue.
With the exact same config file on 24.4 there are no performance issues. A reboot of 24.7 restores the performance for a few hours before it degrades again.
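Since the drop only appears some hours after a reboot, it may help to record throughput at regular intervals and see when it starts. The following is a rough sketch, not a tested monitoring setup: the function name probe_once and the LOG_DIR location are placeholders, and it assumes iperf3 is installed on a LAN client with a test server you are allowed to use.

```shell
# Hypothetical helper: save one timestamped measurement per direction.
LOG_DIR="${LOG_DIR:-/tmp/iperf-log}"

probe_once() {
    # $1 = iperf3 server host, $2 = server port
    ts="$(date +%Y%m%dT%H%M%S)"
    mkdir -p "$LOG_DIR"
    # Upload: the client sends (iperf3 default). -J writes JSON output.
    iperf3 -c "$1" -p "$2" -P 10 -t 10 -J > "$LOG_DIR/$ts-up.json"
    # Download: -R reverses the direction so the server sends.
    iperf3 -c "$1" -p "$2" -P 10 -t 10 -J -R > "$LOG_DIR/$ts-down.json"
    # Keep the client's uptime output with each sample as a time reference.
    uptime > "$LOG_DIR/$ts-uptime.txt"
}
```

Calling `probe_once <server> <port>` from cron, for example once an hour, would give a series of results to compare against the time of the last firewall reboot.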

I understand that there are likely very few people in this forum who run 24.7 in a 10gig WAN environment on a DEC850 or any other A20-based system (axgbe NIC), but I would like to hear if somebody has solved this or sees no performance issues.

Let's figure this out before a potential bug makes it into the business edition. This is likely a FreeBSD-related issue, but I hope that a few of you are willing to work with me on figuring this out.

EDIT: A few more observations and relevant details:
1. My 10Gig WAN has a fixed IPv4 address that is used with NAT and a fixed /56 IPv6 subnet that is not used with NAT. Essentially a direct BIDI connection that goes straight into my ISP's back-end router.
2. The issue happens with both IPv4 and IPv6 traffic, which rules out NAT as the cause.

Thanks,
Sven

I've noticed a bizarre behavior (DEC3840 here)...

All HW offload is ON but VLAN.

By bringing down and then back up ax0 via ifconfig, I achieve clean 9.4G UP/DOWN

If I reboot, I need to run that command through shell again otherwise I'm capped to 4.5G UP/DOWN


Quote from: NW4FUN on November 07, 2024, 03:59:40 PM
I've noticed a bizarre behavior (DEC3840 here)...

All HW offload is ON but VLAN.

By bringing down and then back up ax0 via ifconfig, I achieve clean 9.4G UP/DOWN

If I reboot, I need to run that command through shell again otherwise I'm capped to 4.5G UP/DOWN

Just upgraded my Qotom Q20332G9-S10 (Intel Atom C3758) running Proxmox 7.4 from 24.1 to 24.7 and noticed I wasn't getting 2.5G line speed as before.

Also had 100% CPU saturation during iperf and speedtests.

Tried `ifconfig <ix> down && ifconfig <ix> up` on the LAN and WAN interfaces and I now get (mostly) the expected line speed. It seemed to be stuck at 1G speeds beforehand, even though ifconfig reported 10Gbase was autonegotiated.
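The down/up workaround described in the last two posts can be wrapped in a small function so it is easy to repeat after each reboot. This is only a sketch: bounce_if is a made-up name, the interface name is whatever your system uses (ax0, ix0, ...), and the 2 second pause is an arbitrary assumption, not something reported in this thread. Note that bouncing the interface you are connected through will interrupt your session.

```shell
# Hypothetical helper: take an interface down and bring it back up.
bounce_if() {
    # $1 = interface name, e.g. ax0 or ix0
    # The && chain stops if the "down" step fails.
    ifconfig "$1" down && sleep 2 && ifconfig "$1" up
}
```

Example: `bounce_if ax0`, followed by `ifconfig ax0` to check the reported media and status before rerunning the speed test.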

There must be something worth investigating in...
I look forward to hearing from the likes of @franco