10GbE massive performance issues *SOLVED*

Started by neeeezy, February 23, 2023, 12:18:25 PM

Previous topic - Next topic
February 23, 2023, 12:18:25 PM Last Edit: March 14, 2023, 12:07:37 PM by neeeezy
Hi everyone,

so before I go into detail about the issues I'm dealing with, here is what I have in use:
OPNsense    22.7.11-amd64
FreeBSD      13.1-RELEASE-p5
OpenSSL     1.1.1s 1 Nov 2022
CPU type     AMD EPYC 7232P 8-Core Processor (8 cores, 16 threads)
Mainboard   H12SSW-NT
NIC             2x 10GBase-T LAN Ports via Broadcom BCM57416

So the company I work for had an older Watchguard FW in use and I replaced it with a physical appliance running OPNsense. So far so good, everything is working as before. Now the actual purpose of replacing the FW was to switch to 10GbE (the older Watchguard model was not 10GbE capable), and this is where the issues begin.
The hosting centre supplied us with an additional cable for a 10GbE connection.
With the 1GbE connection that we are using right now I get connection speeds of ~900Mbps UP and DOWN, which is about what I would expect.
After switching to the 10GbE connection the downstream was HORRIFIC! The test was executed from the same VM that I got the ~900Mbps with before (and it is running on a 10GbE-capable hypervisor).
So I would expect at least MORE than with 1GbE, but somehow I am getting a whopping 30Mbps DOWN and about the same as before UP. (I tried with multiple servers/devices and the speed was always the same: extraordinarily slow DOWN and about 1GbE UP.)

In the interface overview it shows that the interface is 10Gbase-T full duplex. The speed for the interface is set to autoselect, since the hosting centre's side is set to auto-negotiate as well. Next I would try setting it to 10GbE fixed on our side and coordinate with the hosting centre support so that they also set it to a fixed speed, but I have trouble believing this to be the cause of the issue since the interface seems to get it right with autoselect/autonegotiation (see attachment).

Have any of you encountered similar issues already? I would highly appreciate any input on this.

Thanks in advance
Nicolas


February 23, 2023, 07:00:00 PM #1 Last Edit: February 23, 2023, 07:01:58 PM by meyergru
At 10 GbE speeds, there are a lot of pitfalls. What I do not get from your post:


  • Is OpnSense running virtualised or on hardware? If virtualised: as a pass-thru or full virtual?
  • How do you measure speed and from where? (running iperf on the OpnSense itself gives different numbers than running through it)
  • You know that with 10 GbE and higher, you need multiple threads to saturate the connection?
  • Did you enable RSS?

Besides: There are plenty of reports around the BCM57416 having abysmal performance because of one problem or another (firmware, RDMA and others).
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 770 up, Bufferbloat A

First of all, thank you for your answer, I appreciate it!
To answer your questions:

  • It is running on hardware, the one stated above.
  • I first measured the speed from a VM running on a hypervisor, against speedtest.net. With the 1GbE cable I got about 900Mbps UP and DOWN using this method. After I switched to the 10GbE connection the upstream remained at about 900, but the downstream dropped to a horrific 20-40Mbps.
  • I use an AMD EPYC with 8 cores. Is there a setting I need to set in order for OPNsense to run multithreaded? Still, I would expect a bit of throttling if this were a CPU limitation, not a sudden and permanent drop to 20-40Mbps.
  • I tried after I saw your post, but I could not find the setting under 'Tunables'. Could you please provide help on how to enable it?
Edit: I also tried setting the speed to 10GbE fixed on the interface in OPNsense, but it did not change anything.
I haven't found anyone who had such abysmal performance, maybe ~3GbE instead of ~10GbE, but not WORSE than 1GbE.

So you have installed OPN on that physical server, not a Virtual machine inside it, right? Just to be clear on variables.
Then you need to have a better baseline. The testing through it is from a VM on another hypervisor. Check with another one, but in any case make sure it is not what is throwing you off course. Unlikely, since it was 900 Mbps before, but worth double checking.
Then back on OPN. Tunables are in System > Settings > Tunables. But don't change anything yet, until you know what you want to change in a controlled way.
What do you get from "ifconfig"? You can hide the public IP before posting it here. It'll be interesting to see the driver in use and the media identified.
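Something like this from a shell on the box is enough (the interface name is just an example, use whatever your WAN port is called):

    ifconfig bnxt1
    # the interesting part is the media line, e.g.:
    #   media: Ethernet autoselect (10Gbase-T <full-duplex,rxpause,txpause>)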

Quote from: cookiemonster on February 24, 2023, 10:58:26 AM
So you have installed OPN on that physical server, not a Virtual machine inside it, right? Just to be clear on variables.
Then you need to have a better baseline. The testing through it is from a VM on another hypervisor. Check with another one, but in any case make sure it is not what is throwing you off course. Unlikely, since it was 900 Mbps before, but worth double checking.
Then back on OPN. Tunables are in System > Settings > Tunables. But don't change anything yet, until you know what you want to change in a controlled way.
What do you get from "ifconfig"? You can hide the public IP before posting it here. It'll be interesting to see the driver in use and the media identified.
Yes, it is installed on a physical server; I only used the virtual machine to simulate traffic that one of our servers will produce.
I know it is not the perfect test scenario, but since we are not talking about a slight throttle but a massive downgrade when going from 1GbE to 10GbE, I would rule out the testing method as the source of error.
I also used my laptop (which is not 10GbE capable, but 1GbE) to test it, and I got the same results there.
Alright, so I will hold off on the tunables for now. I have the screenshot below. Right now I have switched back to the 1GbE cable, so the interface connected to it also only shows 1GbE. When I had the 10GbE cable connected, though, I checked ifconfig as well as Interfaces/Overview and confirmed that it showed the same (10Gbase-T <full-duplex,rxpause,txpause>) for bnxt1.
(added the screenshot I took when I had the WAN interface connected to the 10Gbe cable)

February 24, 2023, 11:20:27 AM #5 Last Edit: February 24, 2023, 11:26:01 AM by meyergru
What I meant by multiple threads is the '-P' option on iperf (if that is what you use) which you absolutely need to measure speeds above gigabit. A single TCP connection will not be sufficient to test at those speeds.
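As a rough sketch, something along these lines with iperf3 (the address and stream count are just placeholders, assuming iperf3 is available on both ends):

    # on a machine on the far side of the firewall
    iperf3 -s

    # on the client: 8 parallel streams for 30 seconds, then the same again in the download direction
    iperf3 -c 192.0.2.10 -P 8 -t 30
    iperf3 -c 192.0.2.10 -P 8 -t 30 -R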

RSS is in tuneables, but depends on actual driver support for your NICs and the syntax depends on that. I do not use Broadcom, so I cannot help with that.
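For reference, the generic (driver-independent) FreeBSD RSS knobs are usually set as tunables roughly like this - whether bnxt actually honours them I cannot say:

    # System > Settings > Tunables, reboot afterwards; the values are only an example
    net.inet.rss.enabled = 1
    net.inet.rss.bits = 3        # roughly log2 of the core count (8-core EPYC in this case)
    net.isr.bindthreads = 1
    net.isr.maxthreads = -1      # let netisr use all cores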

There may be other driver-level settings for your NIC that affect buffer memory, queue sizes and others which come into mind when there are problems on the receiving end (one prominent example would be flow control, which could severely impact receive speed).

P.S.: You actually have rxpause and txpause enabled. That might be it. I do not know how to disable flow control on Broadcom under FreeBSD. Intel uses tuneables: https://lists.freebsd.org/pipermail/freebsd-net/2012-July/032868.html

Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 770 up, Bufferbloat A

You are on the verge IMHO of having to investigate tunables for your NIC's driver:
https://man.freebsd.org/cgi/man.cgi?query=bnxt
I'm not saying go changing things yet, not until you are satisfied the out-of-the-box settings are not ideal.
Check mimugmail's notes on how he tests for performance: https://www.routerperformance.net/opnsense/opnsense-performance-20-1-8/
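Before touching anything it is worth dumping what the driver actually exposes on your box, e.g. from a shell (device index 0 is simply the first port):

    sysctl dev.bnxt.0    # lists every sysctl the bnxt driver exposes for the first port
    sysctl dev.bnxt.1    # same for the second port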

Quote from: meyergru on February 24, 2023, 11:20:27 AM
What I meant by multiple threads is the '-P' option on iperf (if that is what you use) which you absolutely need to measure speeds above gigabit. A single TCP connection will not be sufficient to test at those speeds.

RSS is in tuneables, but depends on actual driver support for your NICs and the syntax depends on that. I do not use Broadcom, so I cannot help with that.

There may be other driver-level settings for your NIC that affect buffer memory, queue sizes and others which come into mind when there are problems on the receiving end (one prominent example would be flow control, which could severely impact receive speed).

P.S.: You actually have rxpause and txpause enabled. That might be it. I do not know how to disable flow control on Broadcom under FreeBSD. Intel uses tuneables: https://lists.freebsd.org/pipermail/freebsd-net/2012-July/032868.html



Thanks for responding. Yeah, one TCP connection may not be able to test those speeds, but if I managed to get about 1GbE before, I should at least reach the same value, not 20Mbps instead of 900. I think my tests are enough to prove that something is completely OFF or malfunctioning, quite apart from me not using the perfect testing methodology.

So it is not recommended to have rxpause and txpause enabled? This is something I have not touched yet, thanks for that tip, I will do some research on how to disable it for Broadcom NICs. I have identical hardware running OPNsense in the office with the same Broadcom network card, and I just checked: rxpause and txpause are enabled on that one as well. On the office firewall I did iperf3 testing with 10 threads through the firewall a while ago, and it was physically capable of reaching about 10GbE speeds - so as much as I hope that having rxpause and txpause enabled is the cause of the issue, I think I pretty much proved that the device is physically capable of ~10GbE throughput even with this feature enabled.

Quote from: cookiemonster on February 24, 2023, 11:38:59 AM
You are on the verge IMHO of having to investigate tunables for your NIC's driver:
https://man.freebsd.org/cgi/man.cgi?query=bnxt
I'm not saying go changing things yet, not until you are satisfied the out-of-the-box settings are not ideal.
Check mimugmail's notes on how he tests for performance: https://www.routerperformance.net/opnsense/opnsense-performance-20-1-8/
Thanks for the material, I will work through it and see if I can find something that makes a difference for me.
Cheers

February 24, 2023, 11:47:10 AM #9 Last Edit: February 24, 2023, 12:14:43 PM by meyergru
Yup. For flow control, it seems to be dev.bnxt.X.fc:

        Enable / Disable Flow Control feature.  Defaults to Enable

There is a good chance that setting this to 0 for the Broadcom adapters will solve the problem. I do not see how it could become worse, either.

Flow control can have a negative impact when combined with small buffer sizes, because then there will be frequent short pauses in the stream; TCP handles that kind of throttling automatically, so there is no real need for flow control anyway. Also, it depends on both sides of the connection interpreting it correctly, which could explain why the other device works fine with a different link partner.
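On OPNsense that would be one entry per port under System > Settings > Tunables, or for a quick test from a shell something along these lines (the port indices are only examples, match them to your bnxt interfaces; I am not sure whether the driver applies the change at runtime or needs a reboot):

    sysctl dev.bnxt.0.fc=0    # disable flow control on bnxt0
    sysctl dev.bnxt.1.fc=0    # same for bnxt1, the 10GbE WAN port in this case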
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 770 up, Bufferbloat A

Quote from: meyergru on February 24, 2023, 11:20:27 AM
What I meant by multiple threads is the '-P' option on iperf (if that is what you use) which you absolutely need to measure speeds above gigabit. A single TCP connection will not be sufficient to test at those speeds.

I just did an iperf3 without -P and hit 9 Gbit/s, so I don't think that's true.

UPDATE:
So we have an identical bare-metal firewall, which I installed OPNsense on and imported the config to.
For whatever reason we did not face the issue with this one, even though spec-wise they should be identical: same Broadcom network adapters etc.

I must say I did not really learn anything from this, except that it worked on one box and the other had weird performance issues - a little frustrating, but also great that I don't have to deal with these issues anymore. *phew*
Thanks for everyone's answers and tips nonetheless!
Have a great day.

Did you use the same 10G card in this server that worked? As in, take the 10G card out of the one server and put it into this one? I am just wondering if there was something wrong with the firmware of the 10G card.