Pretty simple setup here.
I'm running the latest version of OPNsense in VMware (7). I installed iperf on OPNsense itself and have a standard Debian VM connecting as the client, but I'm only getting the following speeds:
[ 5] 549.00-550.00 sec 71.2 MBytes 598 Mbits/sec 2 525 KBytes
[ 5] 550.00-551.00 sec 73.8 MBytes 619 Mbits/sec 0 621 KBytes
[ 5] 551.00-552.00 sec 71.2 MBytes 598 Mbits/sec 0 704 KBytes
[ 5] 552.00-553.00 sec 72.5 MBytes 608 Mbits/sec 2 567 KBytes
[ 5] 553.00-554.00 sec 76.2 MBytes 640 Mbits/sec 0 663 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-554.46 sec 41.0 GBytes 635 Mbits/sec 567 sender
[ 5] 0.00-554.46 sec 0.00 Bytes 0.00 bits/sec receiver
The network topology is quite straightforward:
Test VM (Vlan50) ----> OPNsense VM (Trunk port, VLAN50)
I've read in multiple threads that there are known performance issues when running in a virtualised environment with the VMXNET3 driver; is this still the case?
I don't need a lot of bandwidth, but I would have expected to see at least 1 Gbit/s.
FreeBSD 12 is still the limiting factor with VMXNET3 NICs. However, you can get better throughput by adding cores and making sure that the virtual hardware given to the OPNsense VM is on the VMware compatibility list.
With 2 cores, supported VM hardware, and open-vm-tools running on OPNsense, I see around 2 Gbit/s throughput when testing internal transfers through the vSwitch. This is mainly limited by the CPU speed of the host; throwing more cores and/or a faster CPU clock at the VM gets even higher throughput.
Since you have provided no details on the hosting environment, it's hard to say what else you should try. At the very least, separate the iperf client and server onto different VMs; don't host one of the iperf instances directly on OPNsense. Push the traffic through OPNsense using a client/server VM pair sitting on either side (LAN/WAN) of the OPNsense VM.
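For example, something along these lines, with one VM sitting on each side of the firewall (the address below is just a placeholder, not a value from your setup):
# on the VM on one side of OPNsense (e.g. the LAN VLAN):
iperf3 -s
# on the VM on the other side, pointing at the first VM so the traffic
# gets routed through the OPNsense VM:
iperf3 -c <address-of-server-vm> -t 60 -i 1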
If you have enough NICs available in the host and don't need vMotion for the OPNsense VM, try NIC pass-through; that should yield much better results.
If it's VMware Player with VMXNET3 on a PC-like host, then it was about 0.5 Gbit/s in my recent tests:
https://forum.opnsense.org/index.php?topic=24713.msg119648#msg119648
From what I saw there, ESXi performs much better, particularly with 10GbE setups.
Add a tunable:
hw.pci.honor_msi_blacklist=0
(See e.g. https://forum.netgate.com/topic/157688/remove-vmware-msi-x-from-the-pci-blacklist)
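That one is a boot-time (loader) tunable, so it is read at boot rather than at runtime. A minimal sketch of setting it from the OPNsense shell; the GUI tunables page should achieve the same, and a reboot is needed either way:
# append the loader tunable and reboot; loader.conf.local is read by the
# FreeBSD loader at boot:
echo 'hw.pci.honor_msi_blacklist="0"' >> /boot/loader.conf.local
reboot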
Thanks guys, I've had more time to have a play. If I have two FreeBSD VMs (v12 & v13) sitting on the same VLAN (same vSwitch), they happily transfer at:
VM 1 VLAN 50
VM 2 VLAN 50
[ 5] 61.00-62.00 sec 3.07 GBytes 26.3 Gbits/sec 0 1.77 MBytes
[ 5] 62.00-63.00 sec 3.15 GBytes 27.1 Gbits/sec 0 1.77 MBytes
[ 5] 63.00-64.00 sec 2.93 GBytes 25.1 Gbits/sec 0 1.77 MBytes
[ 5] 64.00-65.00 sec 3.02 GBytes 25.9 Gbits/sec 0 1.77 MBytes
When I then do the same test but have one of the VMs sitting on the other side of the firewall, so the traffic has to pass through OPNsense, I get:
VM1 VLAN 50
VM2 VLAN 76
[ 5] 174.00-175.00 sec 106 MBytes 885 Mbits/sec 60 575 KBytes
[ 5] 175.00-176.00 sec 108 MBytes 903 Mbits/sec 60 369 KBytes
[ 5] 176.00-177.00 sec 105 MBytes 881 Mbits/sec 0 666 KBytes
[ 5] 177.00-178.00 sec 106 MBytes 890 Mbits/sec 59 489 KBytes
[ 5] 178.00-179.00 sec 107 MBytes 895 Mbits/sec 43 198 KBytes
[ 5] 179.00-180.00 sec 104 MBytes 869 Mbits/sec 0 585 KBytes
[ 5] 180.00-181.00 sec 107 MBytes 895 Mbits/sec 59 375 KBytes
Linux shows the same speeds (Debian 11):
[ 5] 8.00-9.00 sec 2.68 GBytes 23.0 Gbits/sec 0 2.85 MBytes
[ 5] 9.00-10.00 sec 2.66 GBytes 22.9 Gbits/sec 0 3.02 MBytes
[ 5] 10.00-11.00 sec 2.77 GBytes 23.8 Gbits/sec 0 3.02 MBytes
[ 5] 11.00-12.00 sec 2.81 GBytes 24.2 Gbits/sec 0 3.02 MBytes
[ 5] 12.00-13.00 sec 2.77 GBytes 23.8 Gbits/sec 0 3.02 MBytes
^C[ 5] 13.00-13.28 sec 781 MBytes 23.8 Gbits/sec 0 3.02 MBytes
Again, when one of the VMs sits on the other side of the firewall so the traffic has to pass through OPNsense, I get:
[ 5] 4.00-5.00 sec 104 MBytes 870 Mbits/sec 0 701 KBytes
[ 5] 5.00-6.00 sec 102 MBytes 860 Mbits/sec 1 608 KBytes
[ 5] 6.00-7.00 sec 102 MBytes 860 Mbits/sec 0 725 KBytes
[ 5] 7.00-8.00 sec 106 MBytes 891 Mbits/sec 1 638 KBytes
[ 5] 8.00-9.00 sec 105 MBytes 881 Mbits/sec 2 539 KBytes
[ 5] 9.00-10.00 sec 102 MBytes 860 Mbits/sec 0 669 KBytes
It's quite a difference, and the CPU on the OPNsense VM isn't really getting stressed either.
Any ideas on where else to look?
I've added "hw.pci.honor_msi_blacklist=0", but it made no real difference.
Unless my testing is flawed, the common factor seems to be traffic passing through the firewall.
Quote from: iBROX on October 12, 2021, 05:33:05 AM
I've added "hw.pci.honor_msi_blacklist=0", but it made no real difference.
Apologies - that tunable is only relevant if you pass a physical NIC through to OPNsense, I believe, so it likely would not have an effect on VMXNET3 performance.
Here:
https://forum.opnsense.org/index.php?topic=18754.msg90576#msg90576
I found info about:
1) ibrs, which kind of makes sense to disable in a VM guest environment; see also https://docs.opnsense.org/troubleshooting/hardening.html#spectre-and-meltdown
2) the vmx driver queues, which should be a good tunable, but it seems to depend on the VM host implementation, so my VMware Player VM didn't improve; an ESXi-hosted one should (a quick way to inspect the current driver settings is sketched below).
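A quick way to see what the vmx driver is currently using before touching anything (a sketch; the sysctl node name assumes the iflib-based vmx driver in FreeBSD 12+ and the first vmx interface):
# show the iflib knobs (queue and descriptor overrides) for the first NIC:
sysctl dev.vmx.0.iflib
# the queue/MSI-X allocation the driver actually got is logged at attach:
dmesg | grep vmx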
Hmm, this is not the first time I've seen hw.pci.honor_msi_blacklist=0.
For example here (not sure what his setup was, but it got me thinking about the queues again):
https://forum.opnsense.org/index.php?topic=18754.msg90787#msg90787
An older post, so an older OPNsense, but it seems encouraging:
https://forum.opnsense.org/index.php?topic=18754.msg90722#msg90722
Thanks guys, massive improvement with the following settings applied in Tunables:
dev.vmx.0.iflib.override_nrxds = 0,2048,0
dev.vmx.0.iflib.override_ntxds = 0,4096
dev.vmx.1.iflib.override_nrxds = 0,2048,0
dev.vmx.1.iflib.override_ntxds = 0,4096
hw.ibrs_disable = 1
vm.pmap.pti = 0
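(For anyone copying these: the override_* entries are boot-time tunables, so a reboot is needed for them to take effect however they are added. A rough shell-side sketch of the same entries, for those not using the Tunables GUI:)
# boot-time equivalents of the tunables listed above; they take effect
# after a reboot (loader.conf.local is read by the FreeBSD loader):
cat >> /boot/loader.conf.local <<'EOF'
dev.vmx.0.iflib.override_nrxds="0,2048,0"
dev.vmx.0.iflib.override_ntxds="0,4096"
dev.vmx.1.iflib.override_nrxds="0,2048,0"
dev.vmx.1.iflib.override_ntxds="0,4096"
hw.ibrs_disable="1"
vm.pmap.pti="0"
EOF
reboot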
Speeds went from:
[ 5] 174.00-175.00 sec 106 MBytes 885 Mbits/sec 60 575 KBytes
[ 5] 175.00-176.00 sec 108 MBytes 903 Mbits/sec 60 369 KBytes
[ 5] 176.00-177.00 sec 105 MBytes 881 Mbits/sec 0 666 KBytes
[ 5] 177.00-178.00 sec 106 MBytes 890 Mbits/sec 59 489 KBytes
[ 5] 178.00-179.00 sec 107 MBytes 895 Mbits/sec 43 198 KBytes
[ 5] 179.00-180.00 sec 104 MBytes 869 Mbits/sec 0 585 KBytes
[ 5] 180.00-181.00 sec 107 MBytes 895 Mbits/sec 59 375 KBytes
TO
[SUM] 49.00-50.00 sec 753 MBytes 6.31 Gbits/sec 0
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 50.00-50.55 sec 61.2 MBytes 941 Mbits/sec 0 747 KBytes
[ 7] 50.00-50.55 sec 46.2 MBytes 711 Mbits/sec 0 641 KBytes
[ 9] 50.00-50.55 sec 67.5 MBytes 1.04 Gbits/sec 0 840 KBytes
[ 11] 50.00-50.55 sec 36.2 MBytes 557 Mbits/sec 0 663 KBytes
[ 13] 50.00-50.55 sec 41.2 MBytes 634 Mbits/sec 0 691 KBytes
[ 15] 50.00-50.55 sec 36.2 MBytes 557 Mbits/sec 0 660 KBytes
[ 17] 50.00-50.55 sec 40.0 MBytes 615 Mbits/sec 0 677 KBytes
[ 19] 50.00-50.55 sec 35.0 MBytes 538 Mbits/sec 0 607 KBytes
[ 21] 50.00-50.55 sec 28.9 MBytes 444 Mbits/sec 0 527 KBytes
[ 23] 50.00-50.55 sec 38.8 MBytes 595 Mbits/sec 0 675 KBytes
[SUM] 50.00-50.55 sec 431 MBytes 6.63 Gbits/sec 0
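(Side note for anyone comparing the two outputs: the "after" run shows ten stream IDs plus a [SUM] line, so it looks like a parallel test; a command roughly like this would produce that shape of output, with the target and duration as placeholders:)
# ten parallel streams, matching the ten stream IDs above:
iperf3 -c <address-of-server-vm> -t 60 -i 1 -P 10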
Nice improvement! :)
I had this problem until I read about turning OFF "Disable hardware TCP segmentation offload" and "Disable hardware large receive offload" under Interfaces -> Settings. This allowed me to get ~9 Gbit/s:
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 1.34 GBytes 1.15 Gbits/sec receiver
[ 8] 0.00-10.00 sec 1.34 GBytes 1.15 Gbits/sec receiver
[ 10] 0.00-10.00 sec 1.34 GBytes 1.15 Gbits/sec receiver
[ 12] 0.00-10.00 sec 1.34 GBytes 1.15 Gbits/sec receiver
[ 14] 0.00-10.00 sec 1.34 GBytes 1.15 Gbits/sec receiver
[ 16] 0.00-10.00 sec 1.34 GBytes 1.15 Gbits/sec receiver
[ 18] 0.00-10.00 sec 1.44 GBytes 1.24 Gbits/sec receiver
[ 20] 0.00-10.00 sec 1.36 GBytes 1.17 Gbits/sec receiver
[SUM] 0.00-10.00 sec 10.9 GBytes 9.32 Gbits/sec receiver
Prior to making this change I had ~1-3 Gbit/s.
I read online that if you have Intel NICs that support LRO/TSO, VMware enables them, and when you then disable them on the FreeBSD side it conflicts and slows things down.
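A quick way to confirm what the driver ended up with after toggling those two checkboxes (a sketch; vmx0 stands in for whichever vmx interface you are testing over):
# with the two "Disable ..." boxes unchecked, TSO4/TSO6 and LRO should show
# up in the options= field of the interface; with them checked they disappear:
ifconfig vmx0 | grep -i options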