Poor Throughput (Even On Same Network Segment)

Started by hax0rwax0r, August 25, 2020, 08:31:25 PM

Previous topic - Next topic
For those interested: I started a FreeBSD 13-CURRENT VM (2020-Oct-08 snapshot) with a vmxnet3 interface, created one 802.1Q VLAN, and ran iperf between it and a Linux VM and, BOOM! Full performance with 4 parallel streams configured:

[ ID] Interval           Transfer     Bandwidth       Retr
[  5]   0.00-10.23  sec  2.34 GBytes  1.96 Gbits/sec    0             sender
[  5]   0.00-10.23  sec  2.34 GBytes  1.96 Gbits/sec                  receiver
[  7]   0.00-10.23  sec  2.09 GBytes  1.75 Gbits/sec    0             sender
[  7]   0.00-10.23  sec  2.09 GBytes  1.75 Gbits/sec                  receiver
[  9]   0.00-10.23  sec  1.67 GBytes  1.40 Gbits/sec    0             sender
[  9]   0.00-10.23  sec  1.67 GBytes  1.40 Gbits/sec                  receiver
[ 11]   0.00-10.23  sec  1.65 GBytes  1.39 Gbits/sec    0             sender
[ 11]   0.00-10.23  sec  1.65 GBytes  1.39 Gbits/sec                  receiver
[SUM]   0.00-10.23  sec  7.75 GBytes  6.50 Gbits/sec    0             sender
[SUM]   0.00-10.23  sec  7.75 GBytes  6.50 Gbits/sec                  receiver
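As a quick sanity check on the parallel run above, the per-stream sender bitrates should add up to the [SUM] line; a one-liner sketch (values copied from the output above):

```shell
# Sum the per-stream sender bitrates from the iperf3 run above;
# the result should match the [SUM] row.
echo "1.96 1.75 1.40 1.39" | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; printf "%.2f Gbits/sec\n", s }'
# prints "6.50 Gbits/sec"
```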

Maybe this is a regression in 12.1.


> How did you do that? [force 1Gbps NIC]

Turn off auto-negotiation and set the NIC's interface to 1Gbps(?)
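A sketch of what that looks like on FreeBSD (em0 is a placeholder for a physical NIC; vmxnet3 virtual adapters generally ignore media selection, so this mainly applies to real hardware):

```shell
# Override auto-negotiation and pin the link to gigabit full duplex.
# em0 is a placeholder interface name; adjust for your hardware.
ifconfig em0 media 1000baseT mediaopt full-duplex
```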

You guys got me interested in this subject. I have run plenty of iperf3 tests against my VMs in my little 3-host homelab; my 10GbE is just a couple of DACs connected between the 10GbE "backbone" interfaces of my Dell PowerConnect 7048P, which is really more of a gigabit switch.

Usually the VMs will peg right up to ~9.4Gbps with little fluctuation if nothing else is happening, but I'm recording 3 720p video streams and 6 high-megapixel (4MP and 8MP) IP cameras right now and have no interest in stopping any of it for testing.

I could have sworn I'd iperfed my OPNsense VM before and gotten somewhere around 2.9Gbps vs. the 9.4Gbps I got on my Linux, OmniOS, or FreeBSD VMs (I don't think I tested Windows; iperf3 is compiled oddly for Win32 and doesn't yield predictable results). So I expected it to be a bit slower, but not THIS much slower:

OPNsense 20.7.3 to OmniOS r151034
(on separate hosts)

This is a VM with 4 vCPUs and 8GB RAM, running on an E3-1230 v2 home-built Supermicro X9SPU-F host with ESXi 6.7U3.  The LAN vNIC is vmxnet3, running open-vm-tools.


root@gateway:/ # uname -a
FreeBSD gateway.webtool.space 12.1-RELEASE-p10-HBSD FreeBSD 12.1-RELEASE-p10-HBSD #0  517e44a00df(stable/20.7)-dirty: Mon Sep 21 16:21:17 CEST 2020     root@sensey64:/usr/obj/usr/src/amd64.amd64/sys/SMP  amd64

root@gateway:/ # iperf3 -c 192.168.1.56
Connecting to host 192.168.1.56, port 5201
[  5] local 192.168.1.1 port 13640 connected to 192.168.1.56 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   125 MBytes  1.05 Gbits/sec    0   2.00 MBytes       
[  5]   1.00-2.00   sec   126 MBytes  1.06 Gbits/sec    0   2.00 MBytes       
[  5]   2.00-3.00   sec   132 MBytes  1.11 Gbits/sec    0   2.00 MBytes       
[  5]   3.00-4.00   sec   131 MBytes  1.10 Gbits/sec    0   2.00 MBytes       
[  5]   4.00-5.00   sec   132 MBytes  1.11 Gbits/sec    0   2.00 MBytes       
[  5]   5.00-6.00   sec   135 MBytes  1.13 Gbits/sec    0   2.00 MBytes       
[  5]   6.00-7.00   sec   138 MBytes  1.16 Gbits/sec    0   2.00 MBytes       
[  5]   7.00-8.00   sec   137 MBytes  1.15 Gbits/sec    0   2.00 MBytes       
[  5]   8.00-9.00   sec   133 MBytes  1.12 Gbits/sec    0   2.00 MBytes       
[  5]   9.00-10.00  sec   131 MBytes  1.10 Gbits/sec    0   2.00 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.29 GBytes  1.11 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  1.29 GBytes  1.11 Gbits/sec                  receiver

iperf Done.


That is abysmal.  Compare that to this Bullseye VM going to the same OmniOS VM (also on separate hosts):

Debian Bullseye to OmniOS r151034


avery@debbox:~$ uname -a
Linux debbox 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 (2020-02-13) x86_64 GNU/Linux

avery@debbox:~$ iperf3 -c 192.168.1.56
Connecting to host 192.168.1.56, port 5201
[  5] local 192.168.1.39 port 58064 connected to 192.168.1.56 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   688 MBytes  5.77 Gbits/sec    0   2.00 MBytes       
[  5]   1.00-2.00   sec   852 MBytes  7.15 Gbits/sec    0   2.00 MBytes       
[  5]   2.00-3.00   sec   801 MBytes  6.72 Gbits/sec  1825    730 KBytes       
[  5]   3.00-4.00   sec   779 MBytes  6.53 Gbits/sec   33   1.13 MBytes       
[  5]   4.00-5.00   sec   788 MBytes  6.61 Gbits/sec  266   1.33 MBytes       
[  5]   5.00-6.00   sec   828 MBytes  6.94 Gbits/sec  392   1.43 MBytes       
[  5]   6.00-7.00   sec   830 MBytes  6.96 Gbits/sec  477   1.49 MBytes       
[  5]   7.00-8.00   sec   826 MBytes  6.93 Gbits/sec  1286    749 KBytes       
[  5]   8.00-9.00   sec   826 MBytes  6.93 Gbits/sec    0   1.26 MBytes       
[  5]   9.00-10.00  sec   775 MBytes  6.50 Gbits/sec  278   1.38 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  7.81 GBytes  6.71 Gbits/sec  4557             sender
[  5]   0.00-10.00  sec  7.80 GBytes  6.70 Gbits/sec                  receiver

iperf Done.


So much better throughput. Even while that OmniOS VM is recording 8-9 streams of video over the network.

I'm going to install a FreeBSD kernel and see what happens.  Will be back with more benchmarks.


It is odd that so many of us seem to hit an artificial ~1Gbps limit when testing OPNsense 20.7 on VMware ESXi with vmxnet3 adapters. It looks like there are at least three of us able to reproduce these results now?

I've disabled the hardware blacklist and did not see a difference from the test results I posted here earlier. The only way I can get somewhat better throughput is to add more vCPUs to the OPNsense VM, but this does not scale well. For instance, if I go from 2 vCPUs to 4 vCPUs, I can start to get between 1.5Gbps and 2.2Gbps depending on how much parallelism I select on my iperf clients.
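For reference, varying client-side parallelism just means changing iperf3's -P flag (the server address here is hypothetical):

```shell
# -P sets the number of parallel TCP streams; -t the test duration.
iperf3 -c 192.168.1.56 -P 2 -t 10   # two streams
iperf3 -c 192.168.1.56 -P 4 -t 10   # four streams
```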

Quote from: opnfwb on October 22, 2020, 05:03:05 AM
It is odd that so many of us seem to hit an artificial ~1Gbps limit when testing OPNsense 20.7 on VMware ESXi with vmxnet3 adapters. It looks like there are at least three of us able to reproduce these results now?

I've disabled the hardware blacklist and did not see a difference from the test results I posted here earlier. The only way I can get somewhat better throughput is to add more vCPUs to the OPNsense VM, but this does not scale well. For instance, if I go from 2 vCPUs to 4 vCPUs, I can start to get between 1.5Gbps and 2.2Gbps depending on how much parallelism I select on my iperf clients.

I don't think it's related to the "hardware" (even though in this case it's virtual).  I think it's the upstream regression mentioned on page 1, since I used to get better speeds than this before I upgraded.  I think I did my last LAN-side iperf3 tests around v18 or 19, and they were at least twice that.  In fact, I'm fairly certain I doubled my vCPUs and RAM since then, because I was testing Sensei and never re-configured the VM to 2 vCPU/4GB after I uninstalled it.

Quote from: opnfwb on October 22, 2020, 05:03:05 AM
It is odd that so many of us seem to hit an artificial ~1Gbps limit when testing OPNsense 20.7 on VMware ESXi with vmxnet3 adapters. It looks like there are at least three of us able to reproduce these results now?

I've disabled the hardware blacklist and did not see a difference from the test results I posted here earlier. The only way I can get somewhat better throughput is to add more vCPUs to the OPNsense VM, but this does not scale well. For instance, if I go from 2 vCPUs to 4 vCPUs, I can start to get between 1.5Gbps and 2.2Gbps depending on how much parallelism I select on my iperf clients.

Be honest with yourself: would you buy a piece of hardware with only 2 cores if you have a requirement for 10G? The smallest hardware with 10G interfaces has a minimum of 4 cores.

Quote from: mimugmail on October 22, 2020, 07:27:38 AM
Be honest with yourself: would you buy a piece of hardware with only 2 cores if you have a requirement for 10G? The smallest hardware with 10G interfaces has a minimum of 4 cores.
I think we may be talking past each other here. I'm not talking about purchasing hardware. I'm discussing a lack of throughput that now exists after an upgrade, on hardware that performs at a much higher rate with nothing but a software change. That's why we're running tests on multiple VMs, all with the same specs. There's obviously some bottleneck occurring here that isn't explained away by core count (or lack thereof).

Quote from: mimugmail on October 19, 2020, 07:38:33 PM
I have customers pushing 6Gbit over vmxnet driver.
I'm more interested in trying to understand what is different in my environment that is causing these issues on VMs. Is this claimed 6Gbit going through a virtualized OPNsense install? Do you have any additional details that we can check? I've even tried changing the CPU core assignment (setting the number of sockets to 1 and adding cores) to see if some weird NUMA scaling issue was impacting OPNsense. So far, nothing I have tried has had any impact on throughput; even switching to the beta netmap kernel that is supposed to resolve some of this did not seem to work yet.

October 22, 2020, 07:46:38 PM #82 Last Edit: October 22, 2020, 07:59:59 PM by nwildner
Quote from: AveryFreeman on October 22, 2020, 04:36:49 AM
You guys got me interested in this subject. I have run plenty of iperf3 tests against my VMs in my little 3-host homelab; my 10GbE is just a couple of DACs connected between the 10GbE "backbone" interfaces of my Dell PowerConnect 7048P, which is really more of a gigabit switch.

The infrastructure I have at the remote office I was reporting on so far:

- PowerEdge R630 (2 servers)
- 2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz with 12 cores each (24 cores per server)
- 3x NetXtreme II BCM57800 10 Gigabit Ethernet (dual-port NICs), meaning 6 physical adapters distributed across 3 virtual switches (2 NICs VM, 2 NICs vMotion, 2 NICs vmkernel)
- 512GB RAM per server
- Plenty of storage on an external SAS 12Gbps (2x 6Gbps active + 2x 6Gbps passive paths) Dell MD3xxx storage array with round-robin paths
- 2x Dell N4032F as core/backbone switches with 10Gbps ports, stacked with 2x 40Gbps ports
- 6-port trunks for each server: 3 ports per trunk per stacking member, so each vSwitch NIC touches one stack member
- Stack members on the Dell N series are treated as a unit, so LACP can be configured across stack members (no MLAG involved).

Even when transferring data between VMs that are not registered on the same physical hardware, I can achieve 8Gbps easily, except with the vmxnet3 driver on FreeBSD 12.1.

Quote from: mimugmail on October 22, 2020, 07:27:38 AM
Be honest with yourself: would you buy a piece of hardware with only 2 cores if you have a requirement for 10G? The smallest hardware with 10G interfaces has a minimum of 4 cores.

What is not honest is to pretend that a VM can't push more than 1Gbps or achieve decent throughput rates while having only 1 vCPU configured, because that is not true. On the contrary, when doing virtualization you should always configure resources in a way that avoids CPU oversubscription. Having, for example, a 4-vCPU VM that is mostly idle and does not run CPU-intensive operations will create problems for other VMs on the same pool/share/physical hardware. For simple iperf3 and network transfer tests, FreeBSD 13 with 1 vCPU did fine, while OPNsense (FreeBSD 12.1) with 4 vCPUs and high CPU shares, despite being the only VM with that share configuration, crawled during transfers.

vmxnet3 on FreeBSD 12.1 is garbage. It seems that the port to iflib created several regressions related to MSI-X, tx/rx queues, iflib leaking MSI-X messages, non-power-of-2 tx/rx queue configurations, and more. I could even find some LRO regressions in the commits that could explain the retransmissions and the abysmal lack of performance I reported on a previous page while trying to enable LRO as a workaround for this performance issue. https://svnweb.freebsd.org/base/head/sys/dev/vmware/vmxnet3/if_vmx.c?view=log

In the test I made above with FreeBSD 13-CURRENT, I was only using 1 vCPU, 4GB RAM, pvscsi, and vmxnet3, and the system performed great compared with the state of the vmxnet3 driver in FreeBSD 12.1-RELEASE.
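One workaround commonly tried for iflib-era vmx regressions like these is disabling the offloads involved and retesting; a sketch (vmx0 is an assumed interface name, and whether this helps depends on the exact regression):

```shell
# Temporarily disable LRO, TSO, and checksum offload on the vmxnet3
# interface, then rerun iperf3. vmx0 is a placeholder interface name.
ifconfig vmx0 -lro -tso4 -tso6 -rxcsum -txcsum
```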

With Proxmox using the vnet adapter the speed is fine, but pfSense, based on FreeBSD 11, works fine with vmxnet3 too.
So the issue is with HardenedBSD and the vmxnet adapter. I don't understand why OPNsense is based on a half-dead OS; HBSD has been abandoned by most of its devs. Just drop it and use standard FreeBSD again.

Quote from: Archanfel80 on October 26, 2020, 10:27:47 AM
With Proxmox using the vnet adapter the speed is fine, but pfSense, based on FreeBSD 11, works fine with vmxnet3 too.
So the issue is with HardenedBSD and the vmxnet adapter. I don't understand why OPNsense is based on a half-dead OS; HBSD has been abandoned by most of its devs. Just drop it and use standard FreeBSD again.

FreeBSD 12.1 has the same issues ..

Quote from: mimugmail on October 26, 2020, 12:02:55 PM
Quote from: Archanfel80 on October 26, 2020, 10:27:47 AM
With Proxmox using the vnet adapter the speed is fine, but pfSense, based on FreeBSD 11, works fine with vmxnet3 too.
So the issue is with HardenedBSD and the vmxnet adapter. I don't understand why OPNsense is based on a half-dead OS; HBSD has been abandoned by most of its devs. Just drop it and use standard FreeBSD again.

FreeBSD 12.1 has the same issues ..

Yes, but the pfSense current stable branch is still using FreeBSD 11.x, not 12. I think they are on point: it's not a good idea to switch to a newer base OS if it still has many issues. Now I have to roll back to OPNsense 20.1 everywhere I upgraded to 20.7. And the issue is not just with vmxnet: after I upgraded to 20.7, one of my hardware firewalls with EFI boot no longer boots; it freezes during EFI boot. That's also a FreeBSD 12-related issue, as I already figured out.

And Sophos is still using a 3.12 kernel, so why upgrade to a newer one ..

If no one takes the first step there won't be any progress. Usually mission-critical systems shouldn't be updated to a major release until it is at .3 or .4. I'd even wait till a .6.

The whole discussion is way too off-topic and only fills up with frustrated content.

This thread should be talking about exactly this, so maybe it's off-topic, but still: with the half-year release model, 20.7 is almost half a year old now, and we are close to 21.1, when 20.7 will be obsolete too. You're right that critical system software should wait before adopting new releases. By that logic, even the 21.x series should use FreeBSD 11 and wait to upgrade to 12 until it is stable. A firewall is not a good place to experiment and make the first step.

But I can say something that is not off-topic.
Disabling net.inet.ip.redirect and net.inet.ip6.redirect, and increasing net.inet.tcp.recvspace and net.inet.tcp.sendspace as well as kern.ipc.maxsockbuf and kern.ipc.somaxconn, helps a little. There is still a performance loss, but not as bad.
I attached my tunables-related config.
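For reference, the tunables mentioned can be set like this; the buffer sizes below are illustrative assumptions, not the values from the attached config (on OPNsense they are normally set under System > Settings > Tunables):

```shell
# Disable ICMP redirects and raise socket buffer limits; values are
# examples only and should be sized to your RAM and link speed.
sysctl net.inet.ip.redirect=0
sysctl net.inet.ip6.redirect=0
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.recvspace=4194304
sysctl net.inet.tcp.sendspace=4194304
sysctl kern.ipc.somaxconn=1024
```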

Just keep using 20.1 with all the security related caveats and missing features. I really don't see the point in complaining about user choices.


Cheers,
Franco

Quote from: franco on October 26, 2020, 02:08:45 PM
Just keep using 20.1 with all the security related caveats and missing features. I really don't see the point in complaining about user choices.


Cheers,
Franco

I did the rollback and everything is fine. The network speed is around 800Mbit again (gigabit internet); with 20.7 it was just 500-600Mbit. Speed is important here; I don't care about the missing features, as I don't use any. I'm not sure about the security caveats; FreeBSD 11 is no less secure. Until this issue is fixed I'll stay with 20.1.x.

These servers are used in a production environment; I don't have the time or opportunity to use them as a playground. This was exactly the reason why I abandoned pfSense: they were importing untested kernels and features, the core system became unstable, and after every upgrade I feared what would go wrong. OPNsense did right until now, and I hope the devs fix this or at least give us some workaround.

The speed is not the only issue. I had to disable IPS/IDS and Sensei too because they caused system freezes. I've basically neglected my firewalls. I know this is still in a testing phase, but 20.7 is four, almost five months old now and I'm still unable to use these features properly. And we paid for Sensei, which is unusable now. This is not acceptable. So yes, I took the "risk" and rolled back wherever I could...