iperf speeds slow (VMware environment)

Started by iBROX, October 08, 2021, 06:01:40 AM

Pretty simple setup here.

I'm running the latest version of OPNsense in VMware (7). I installed iperf on OPNsense and have a standard Debian VM connecting as the client, but I'm only getting the following speeds:

[  5] 549.00-550.00 sec  71.2 MBytes   598 Mbits/sec    2    525 KBytes
[  5] 550.00-551.00 sec  73.8 MBytes   619 Mbits/sec    0    621 KBytes
[  5] 551.00-552.00 sec  71.2 MBytes   598 Mbits/sec    0    704 KBytes
[  5] 552.00-553.00 sec  72.5 MBytes   608 Mbits/sec    2    567 KBytes
[  5] 553.00-554.00 sec  76.2 MBytes   640 Mbits/sec    0    663 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-554.46 sec  41.0 GBytes   635 Mbits/sec  567             sender
[  5]   0.00-554.46 sec  0.00 Bytes  0.00 bits/sec                  receiver

The network topology is quite straightforward:

Test VM (Vlan50) ----> OPNsense VM (Trunk port, VLAN50)

I've read in multiple threads that there are known performance issues when running in a virtualised environment with the VMXNET3 driver - is this still the case?

I don't need a lot of bandwidth, but I would have expected to see at least 1 Gbit/s.

FreeBSD 12 is still the limiting factor with VMXNET3 (vmx) NICs. However, you can get better throughput by adding cores and making sure that the VM hardware given to the OPNsense router is on the VMware compatibility list.

With 2 cores, supported VM hardware, and open-vm-tools running on OPNsense, I see around 2 Gbit/s throughput when testing internal transfers through the vSwitch. This is mainly limited by the CPU speed of the host; throwing more cores and/or a faster CPU clock at the VM yields even higher throughput.

Since you haven't provided any details on the hosting environment, it's hard to say what else you should try. At the very least, separate the iperf client and server onto different VMs rather than hosting one of the iperf instances directly on OPNsense. Push the traffic through OPNsense using a client/server VM pair sitting on either side (LAN/WAN) of the OPNsense VM, for example as sketched below.
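
A minimal sketch of that kind of test - the addresses and VLANs are placeholders for whatever sits on each side of your firewall:

# On the VM on the far side of OPNsense (e.g. the WAN/other-VLAN side), start the server:
iperf3 -s

# On the VM on the near side (e.g. the LAN/VLAN50 side), run the client against the
# server's address so the traffic is routed through the OPNsense VM:
iperf3 -c 192.168.76.10 -t 60 -i 1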

If you have enough NICs available in the host and don't need vMotion for the OPNsense VM, try using NIC pass-through; that should yield much better results.

If it's VMware Player with VMXNET3 on a PC-like host, then I got about 0.5 Gbit/s in my recent tests:

https://forum.opnsense.org/index.php?topic=24713.msg119648#msg119648

From what I've seen here, ESXi performs much better, particularly with 10GbE setups.



Thanks guys, I've had more time to have a play. If I have two FreeBSD VMs (v12 & v13) sitting on the same VLAN (same vSwitch), they happily transfer at:

VM 1 VLAN 50
VM 2 VLAN 50

[  5]  61.00-62.00  sec  3.07 GBytes  26.3 Gbits/sec    0   1.77 MBytes
[  5]  62.00-63.00  sec  3.15 GBytes  27.1 Gbits/sec    0   1.77 MBytes
[  5]  63.00-64.00  sec  2.93 GBytes  25.1 Gbits/sec    0   1.77 MBytes
[  5]  64.00-65.00  sec  3.02 GBytes  25.9 Gbits/sec    0   1.77 MBytes

When I then do the same test, but with one of the VMs sitting on the other side of the firewall so the traffic has to pass through OPNsense, I get:

VM1 VLAN 50
VM2 VLAN 76

[  5] 174.00-175.00 sec   106 MBytes   885 Mbits/sec   60    575 KBytes
[  5] 175.00-176.00 sec   108 MBytes   903 Mbits/sec   60    369 KBytes
[  5] 176.00-177.00 sec   105 MBytes   881 Mbits/sec    0    666 KBytes
[  5] 177.00-178.00 sec   106 MBytes   890 Mbits/sec   59    489 KBytes
[  5] 178.00-179.00 sec   107 MBytes   895 Mbits/sec   43    198 KBytes
[  5] 179.00-180.00 sec   104 MBytes   869 Mbits/sec    0    585 KBytes
[  5] 180.00-181.00 sec   107 MBytes   895 Mbits/sec   59    375 KBytes 

Linux shows the same speeds (Debian 11):

[  5]   8.00-9.00   sec  2.68 GBytes  23.0 Gbits/sec    0   2.85 MBytes
[  5]   9.00-10.00  sec  2.66 GBytes  22.9 Gbits/sec    0   3.02 MBytes
[  5]  10.00-11.00  sec  2.77 GBytes  23.8 Gbits/sec    0   3.02 MBytes
[  5]  11.00-12.00  sec  2.81 GBytes  24.2 Gbits/sec    0   3.02 MBytes
[  5]  12.00-13.00  sec  2.77 GBytes  23.8 Gbits/sec    0   3.02 MBytes
^C[  5]  13.00-13.28  sec   781 MBytes  23.8 Gbits/sec    0   3.02 MBytes   

And again, with one of the VMs on the other side of the firewall so the traffic passes through OPNsense:

[  5]   4.00-5.00   sec   104 MBytes   870 Mbits/sec    0    701 KBytes
[  5]   5.00-6.00   sec   102 MBytes   860 Mbits/sec    1    608 KBytes
[  5]   6.00-7.00   sec   102 MBytes   860 Mbits/sec    0    725 KBytes
[  5]   7.00-8.00   sec   106 MBytes   891 Mbits/sec    1    638 KBytes
[  5]   8.00-9.00   sec   105 MBytes   881 Mbits/sec    2    539 KBytes
[  5]   9.00-10.00  sec   102 MBytes   860 Mbits/sec    0    669 KBytes   


It's quite a difference, and the CPU on the OPNsense VM isn't really getting stressed either.

Any ideas on where else to look?

I've added "hw.pci.honor_msi_blacklist=0", but it made no real difference.

Unless my testing is flawed, the common factor seems to be traffic passing through the FW.

Quote from: iBROX on October 12, 2021, 05:33:05 AM
I've added "hw.pci.honor_msi_blacklist=0", but it made no real difference.
Apologies - I believe that tunable is only relevant if you pass a physical NIC through to OPNsense, so it likely would not have an effect on vmxnet3 performance.

Here:
https://forum.opnsense.org/index.php?topic=18754.msg90576#msg90576
I've got info about:
1) IBRS, which kind of makes sense to disable in a VM guest environment; see also here: https://docs.opnsense.org/troubleshooting/hardening.html#spectre-and-meltdown
2) the vmx driver queues, which should be a good tunable, but it seems to depend on the VM host implementation - my VMware Player VM didn't improve, although an ESXi-hosted one should. A rough way to check these is sketched below.
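
A quick sketch, assuming a vmx (VMXNET3) interface on FreeBSD 12/13 - the exact tunable names can vary, so check what your system exposes with sysctl first:

# Check whether the Spectre/Meltdown mitigations are active inside the guest
sysctl hw.ibrs_disable vm.pmap.pti

# List the iflib tunables the vmx driver exposes (queue and descriptor overrides)
sysctl dev.vmx.0.iflib | grep override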

Hmm, this isn't the first time I've seen hw.pci.honor_msi_blacklist=0 - for example here (not sure what his setup was), but it has me thinking about the queues again:
https://forum.opnsense.org/index.php?topic=18754.msg90787#msg90787

An older post, so an older OPNsense version, but it seems encouraging:
https://forum.opnsense.org/index.php?topic=18754.msg90722#msg90722



Thanks guys, massive improvement with the following settings applied in Tunables:

dev.vmx.0.iflib.override_nrxds = 0,2048,0
dev.vmx.0.iflib.override_ntxds = 0,4096
dev.vmx.1.iflib.override_nrxds = 0,2048,0
dev.vmx.1.iflib.override_ntxds = 0,4096
hw.ibrs_disable = 1
vm.pmap.pti = 0
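
For anyone applying these by hand instead of via the GUI: my understanding is that the descriptor overrides are boot-time loader tunables, so on a plain FreeBSD system they would look roughly like this in /boot/loader.conf.local (a sketch; on OPNsense the Tunables page is the supported route, and a reboot is needed either way):

# Larger RX/TX descriptor rings for both vmx interfaces (boot-time only)
dev.vmx.0.iflib.override_nrxds="0,2048,0"
dev.vmx.0.iflib.override_ntxds="0,4096"
dev.vmx.1.iflib.override_nrxds="0,2048,0"
dev.vmx.1.iflib.override_ntxds="0,4096"
# Disable the Spectre (IBRS) and Meltdown (PTI) mitigations inside the guest
hw.ibrs_disable="1"
vm.pmap.pti="0"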

Speeds went from:

[  5] 174.00-175.00 sec   106 MBytes   885 Mbits/sec   60    575 KBytes
[  5] 175.00-176.00 sec   108 MBytes   903 Mbits/sec   60    369 KBytes
[  5] 176.00-177.00 sec   105 MBytes   881 Mbits/sec    0    666 KBytes
[  5] 177.00-178.00 sec   106 MBytes   890 Mbits/sec   59    489 KBytes
[  5] 178.00-179.00 sec   107 MBytes   895 Mbits/sec   43    198 KBytes
[  5] 179.00-180.00 sec   104 MBytes   869 Mbits/sec    0    585 KBytes
[  5] 180.00-181.00 sec   107 MBytes   895 Mbits/sec   59    375 KBytes 

to:

[SUM]  49.00-50.00  sec   753 MBytes  6.31 Gbits/sec    0
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  50.00-50.55  sec  61.2 MBytes   941 Mbits/sec    0    747 KBytes
[  7]  50.00-50.55  sec  46.2 MBytes   711 Mbits/sec    0    641 KBytes
[  9]  50.00-50.55  sec  67.5 MBytes  1.04 Gbits/sec    0    840 KBytes
[ 11]  50.00-50.55  sec  36.2 MBytes   557 Mbits/sec    0    663 KBytes
[ 13]  50.00-50.55  sec  41.2 MBytes   634 Mbits/sec    0    691 KBytes
[ 15]  50.00-50.55  sec  36.2 MBytes   557 Mbits/sec    0    660 KBytes
[ 17]  50.00-50.55  sec  40.0 MBytes   615 Mbits/sec    0    677 KBytes
[ 19]  50.00-50.55  sec  35.0 MBytes   538 Mbits/sec    0    607 KBytes
[ 21]  50.00-50.55  sec  28.9 MBytes   444 Mbits/sec    0    527 KBytes
[ 23]  50.00-50.55  sec  38.8 MBytes   595 Mbits/sec    0    675 KBytes
[SUM]  50.00-50.55  sec   431 MBytes  6.63 Gbits/sec    0         


Nice improvement! :)

I had this problem until I read about turning OFF "Disable hardware TCP segmentation offload" and "Disable hardware large receive offload" under Interfaces -> Settings. This allowed me to get ~9 Gbit/s:


[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  1.34 GBytes  1.15 Gbits/sec                  receiver
[  8]   0.00-10.00  sec  1.34 GBytes  1.15 Gbits/sec                  receiver
[ 10]   0.00-10.00  sec  1.34 GBytes  1.15 Gbits/sec                  receiver
[ 12]   0.00-10.00  sec  1.34 GBytes  1.15 Gbits/sec                  receiver
[ 14]   0.00-10.00  sec  1.34 GBytes  1.15 Gbits/sec                  receiver
[ 16]   0.00-10.00  sec  1.34 GBytes  1.15 Gbits/sec                  receiver
[ 18]   0.00-10.00  sec  1.44 GBytes  1.24 Gbits/sec                  receiver
[ 20]   0.00-10.00  sec  1.36 GBytes  1.17 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  10.9 GBytes  9.32 Gbits/sec                  receiver


Prior to making this change I got ~1-3 Gbit/s.

I read online that if you have Intel NICs that support LRO/TSO, VMware enables it on the host, and when you disable it in FreeBSD the two conflict and throughput drops.
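
To confirm what the guest driver actually ended up with, a quick check from the OPNsense/FreeBSD shell (a sketch; vmx0 is an assumed interface name):

# Show which offload capabilities are currently enabled on the vmx interface;
# TSO4/TSO6 and LRO should appear in the options list once the
# "Disable hardware ..." checkboxes are unticked and the interface is reconfigured
ifconfig vmx0 | grep options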