Poor Throughput (Even On Same Network Segment)

Started by hax0rwax0r, August 25, 2020, 08:31:25 PM

Quote from: FlightService on October 19, 2020, 10:42:26 AM
How did you set your NIC to do that?

I should clarify that my server is a Linux installation (Debian Buster) running on dedicated hardware. There are several discussions about issues with Intel NICs on Linux. It doesn't matter whether the server is connected to the client directly or through a switch: the NIC driver often reports a 10MBit/s link to the system, although the real throughput was higher than the reported speed. On the Linux machine I used "ethtool -s <dev> speed 1000 duplex full autoneg off" to disable autonegotiation and force the link speed.
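For reference, a minimal sketch of the Linux side (assuming the device is eth0; note that 1000BASE-T normally requires autonegotiation, so some NICs will not link up with it fully off, treat this as a diagnostic step):

ethtool eth0                                          # show current link state and negotiated speed
ethtool -s eth0 speed 1000 duplex full autoneg off    # force 1000/full with autonegotiation off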

October 19, 2020, 06:58:56 PM #61 Last Edit: October 19, 2020, 07:29:01 PM by nwildner
Quote from: mimugmail on October 13, 2020, 07:20:17 AM
It's under investigation, 20.7.4 may bring an already fixed kernel

Just to add more info on this topic: vmxnet3 can't handle more than 1Gbps when traffic-testing OPNsense to Windows (and with iperf in reverse mode) on the same VLAN. It's a big hit, since our users frequently access file server and PDM (AutoCAD-like) data on other VLANs (so all of that traffic is forwarded by OPNsense). The whole network is 10Gbps, including user workstations and the ESXi 6.7u3 servers.

We noticed a big drop in transfer speeds after switching our firewall vendor to OPNsense at that location, and we believe it is related to this vmxnet3 case.

OPNsense VM Specs:

  • 4 vCPU - Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  • 8 GB RAM
  • vmxnet3 attached to a vSwitch with 2x 10Gbit/s uplinks - QLogic Corporation NetXtreme II BCM57800 (Broadcom, Dell OEM).
  • 10 VLANs

OPNsense and Windows server, same VLAN, OPNsense as the gateway of this server VLAN:
OPNSENSE to WINDOWS:
iperf3 -c 10.254.win.ip -P 8 -w 128k -p 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.02  sec   118 MBytes  98.8 Mbits/sec    0             sender
[  5]   0.00-10.02  sec   118 MBytes  98.8 Mbits/sec                  receiver
[  7]   0.00-10.02  sec   116 MBytes  96.8 Mbits/sec    0             sender
[  7]   0.00-10.02  sec   116 MBytes  96.8 Mbits/sec                  receiver
[  9]   0.00-10.02  sec   113 MBytes  94.5 Mbits/sec    0             sender
[  9]   0.00-10.02  sec   113 MBytes  94.5 Mbits/sec                  receiver
[ 11]   0.00-10.02  sec   109 MBytes  91.5 Mbits/sec    0             sender
[ 11]   0.00-10.02  sec   109 MBytes  91.5 Mbits/sec                  receiver
[ 13]   0.00-10.02  sec   107 MBytes  89.7 Mbits/sec    0             sender
[ 13]   0.00-10.02  sec   107 MBytes  89.7 Mbits/sec                  receiver
[ 15]   0.00-10.02  sec  99.8 MBytes  83.5 Mbits/sec    0             sender
[ 15]   0.00-10.02  sec  99.8 MBytes  83.5 Mbits/sec                  receiver
[ 17]   0.00-10.02  sec  82.0 MBytes  68.7 Mbits/sec    0             sender
[ 17]   0.00-10.02  sec  82.0 MBytes  68.7 Mbits/sec                  receiver
[ 19]   0.00-10.02  sec  71.2 MBytes  59.6 Mbits/sec    0             sender
[ 19]   0.00-10.02  sec  71.2 MBytes  59.6 Mbits/sec                  receiver
[SUM]   0.00-10.02  sec   816 MBytes   683 Mbits/sec    0             sender
[SUM]   0.00-10.02  sec   816 MBytes   683 Mbits/sec                  receiver


OPNSENSE to WINDOWS (iperf3 reverse mode):
iperf3 -c 10.254.win.ip -P 8 -R -w 128k -p 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  88.4 MBytes  74.1 Mbits/sec                  sender
[  5]   0.00-10.00  sec  88.2 MBytes  74.0 Mbits/sec                  receiver
[  7]   0.00-10.00  sec   118 MBytes  98.7 Mbits/sec                  sender
[  7]   0.00-10.00  sec   117 MBytes  98.5 Mbits/sec                  receiver
[  9]   0.00-10.00  sec  91.9 MBytes  77.1 Mbits/sec                  sender
[  9]   0.00-10.00  sec  91.7 MBytes  76.9 Mbits/sec                  receiver
[ 11]   0.00-10.00  sec  91.6 MBytes  76.9 Mbits/sec                  sender
[ 11]   0.00-10.00  sec  91.5 MBytes  76.7 Mbits/sec                  receiver
[ 13]   0.00-10.00  sec  92.6 MBytes  77.7 Mbits/sec                  sender
[ 13]   0.00-10.00  sec  92.4 MBytes  77.5 Mbits/sec                  receiver
[ 15]   0.00-10.00  sec  94.4 MBytes  79.2 Mbits/sec                  sender
[ 15]   0.00-10.00  sec  94.2 MBytes  79.0 Mbits/sec                  receiver
[ 17]   0.00-10.00  sec   100 MBytes  84.3 Mbits/sec                  sender
[ 17]   0.00-10.00  sec   100 MBytes  84.1 Mbits/sec                  receiver
[ 19]   0.00-10.00  sec  99.9 MBytes  83.8 Mbits/sec                  sender
[ 19]   0.00-10.00  sec  99.6 MBytes  83.6 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec   777 MBytes   652 Mbits/sec                  sender
[SUM]   0.00-10.00  sec   775 MBytes   650 Mbits/sec                  receiver


Linux VM Specs:

  • 1 vCPU - Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  • 4 GB RAM
  • vmxnet3 attached to a vSwitch with 2x 10Gbit/s uplinks - QLogic Corporation NetXtreme II BCM57800 (Broadcom, Dell OEM).
  • VM network (port group) attached to the VM, with the hypervisor doing the VLAN tagging (no visibility of VLAN tags inside the guest)



Linux server and Windows server, same VLAN, since both are assigned to the "servers" VLAN:
LINUX TO WINDOWS:
iperf3 -c 10.254.win.ip -P 8 -w 128k -p 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.17 GBytes  1.00 Gbits/sec  128             sender
[  4]   0.00-10.00  sec  1.17 GBytes  1.00 Gbits/sec                  receiver
[  6]   0.00-10.00  sec   275 MBytes   231 Mbits/sec   69             sender
[  6]   0.00-10.00  sec   275 MBytes   231 Mbits/sec                  receiver
[  8]   0.00-10.00  sec  1.12 GBytes   961 Mbits/sec  150             sender
[  8]   0.00-10.00  sec  1.12 GBytes   961 Mbits/sec                  receiver
[ 10]   0.00-10.00  sec  1.13 GBytes   972 Mbits/sec   98             sender
[ 10]   0.00-10.00  sec  1.13 GBytes   972 Mbits/sec                  receiver
[ 12]   0.00-10.00  sec   264 MBytes   222 Mbits/sec   37             sender
[ 12]   0.00-10.00  sec   264 MBytes   222 Mbits/sec                  receiver
[ 14]   0.00-10.00  sec  1.13 GBytes   973 Mbits/sec  109             sender
[ 14]   0.00-10.00  sec  1.13 GBytes   973 Mbits/sec                  receiver
[ 16]   0.00-10.00  sec   280 MBytes   235 Mbits/sec   34             sender
[ 16]   0.00-10.00  sec   280 MBytes   235 Mbits/sec                  receiver
[ 18]   0.00-10.00  sec   246 MBytes   206 Mbits/sec   64             sender
[ 18]   0.00-10.00  sec   246 MBytes   206 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  5.59 GBytes  4.81 Gbits/sec  689             sender
[SUM]   0.00-10.00  sec  5.59 GBytes  4.80 Gbits/sec                  receiver


LINUX TO WINDOWS (iperf reverse mode): this is where iperf and vmxnet3 reach their full potential
iperf3 -c 10.254.win.ip -P 8 -R -w 128k -p 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  3.17 GBytes  2.72 Gbits/sec                  sender
[  4]   0.00-10.00  sec  3.17 GBytes  2.72 Gbits/sec                  receiver
[  6]   0.00-10.00  sec  3.10 GBytes  2.66 Gbits/sec                  sender
[  6]   0.00-10.00  sec  3.10 GBytes  2.66 Gbits/sec                  receiver
[  8]   0.00-10.00  sec  2.91 GBytes  2.50 Gbits/sec                  sender
[  8]   0.00-10.00  sec  2.91 GBytes  2.50 Gbits/sec                  receiver
[ 10]   0.00-10.00  sec  3.00 GBytes  2.58 Gbits/sec                  sender
[ 10]   0.00-10.00  sec  3.00 GBytes  2.58 Gbits/sec                  receiver
[ 12]   0.00-10.00  sec  2.78 GBytes  2.39 Gbits/sec                  sender
[ 12]   0.00-10.00  sec  2.78 GBytes  2.39 Gbits/sec                  receiver
[ 14]   0.00-10.00  sec  2.85 GBytes  2.45 Gbits/sec                  sender
[ 14]   0.00-10.00  sec  2.85 GBytes  2.45 Gbits/sec                  receiver
[ 16]   0.00-10.00  sec  2.68 GBytes  2.31 Gbits/sec                  sender
[ 16]   0.00-10.00  sec  2.68 GBytes  2.31 Gbits/sec                  receiver
[ 18]   0.00-10.00  sec  2.63 GBytes  2.26 Gbits/sec                  sender
[ 18]   0.00-10.00  sec  2.63 GBytes  2.26 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  23.1 GBytes  19.9 Gbits/sec                  sender
[SUM]   0.00-10.00  sec  23.1 GBytes  19.9 Gbits/sec                  receiver


Quote from: mimugmail on October 19, 2020, 07:38:33 PM
I have customers pushing 6Gbit over vmxnet driver.

OK, and what am I supposed to do with this information? Not trying to be rude, but there are plenty of reports on this topic that go against your scenario.

Do you have any idea what I could tune to achieve better performance, then?

Quote from: nwildner on October 19, 2020, 08:20:36 PM
Quote from: mimugmail on October 19, 2020, 07:38:33 PM
I have customers pushing 6Gbit over vmxnet driver.

OK, and what am I supposed to do with this information? Not trying to be rude, but there are plenty of reports on this topic that go against your scenario.

Do you have any idea what I could tune to achieve better performance, then?

What about this idea?
https://xenomorph.net/freebsd/performance-esxi/

„The S in IoT stands for Security!" :)



Quote from: Supermule on October 19, 2020, 09:42:48 PM
Where do you manually edit the rc.conf??

There is an option inside the web administration:

Interfaces > Settings > Hardware LRO > uncheck it to enable LRO
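For comparison, on a plain FreeBSD box such a flag would be persisted in /etc/rc.conf rather than set in a GUI (a sketch only, with an illustrative address; OPNsense applies interface flags through its own config system, so use the web UI there):

# /etc/rc.conf on stock FreeBSD, not OPNsense
ifconfig_vmx0="inet 192.0.2.1/24 lro"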

October 19, 2020, 10:14:40 PM #68 Last Edit: October 19, 2020, 10:46:10 PM by nwildner
Quote from: Gauss23 on October 19, 2020, 08:37:01 PM
What about this idea?
https://xenomorph.net/freebsd/performance-esxi/

Well, enabling only LRO didn't change much. The guy who wrote that tutorial is using the same NIC series I am, so it was worth trying to enable lro, tso and vlan_hwfilter, and after that, things got a lot better.
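For reference, toggling those flags at runtime boils down to something like this (a sketch, assuming vmx0; the change does not survive a reboot):

ifconfig vmx0 lro tso vlanhwfilter      # enable LRO, TSO (v4 and v6) and VLAN hardware filtering
ifconfig vmx0                           # verify the new flags appear in the options=... line
ifconfig vmx0 -lro -tso -vlanhwfilter   # revert if things get worse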

Still not reaching 10Gbps, but I could get almost 5Gbps, which is pretty good:

Only enabling lro:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.17  sec   118 MBytes  97.5 Mbits/sec    0             sender
[  5]   0.00-10.17  sec   118 MBytes  97.5 Mbits/sec                  receiver
[  7]   0.00-10.17  sec   120 MBytes  98.9 Mbits/sec    0             sender
[  7]   0.00-10.17  sec   120 MBytes  98.9 Mbits/sec                  receiver
[  9]   0.00-10.17  sec   120 MBytes  98.8 Mbits/sec    0             sender
[  9]   0.00-10.17  sec   120 MBytes  98.8 Mbits/sec                  receiver
[ 11]   0.00-10.17  sec   117 MBytes  96.8 Mbits/sec    0             sender
[ 11]   0.00-10.17  sec   117 MBytes  96.8 Mbits/sec                  receiver
[ 13]   0.00-10.17  sec   118 MBytes  97.4 Mbits/sec    0             sender
[ 13]   0.00-10.17  sec   118 MBytes  97.4 Mbits/sec                  receiver
[ 15]   0.00-10.17  sec   119 MBytes  98.0 Mbits/sec    0             sender
[ 15]   0.00-10.17  sec   119 MBytes  98.0 Mbits/sec                  receiver
[ 17]   0.00-10.17  sec  90.8 MBytes  74.9 Mbits/sec    0             sender
[ 17]   0.00-10.17  sec  90.8 MBytes  74.9 Mbits/sec                  receiver
[ 19]   0.00-10.17  sec  72.2 MBytes  59.6 Mbits/sec    0             sender
[ 19]   0.00-10.17  sec  72.2 MBytes  59.6 Mbits/sec                  receiver
[SUM]   0.00-10.17  sec   875 MBytes   722 Mbits/sec    0             sender
[SUM]   0.00-10.17  sec   875 MBytes   722 Mbits/sec                  receiver

iperf Done.

vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
             options=800428<VLAN_MTU,JUMBO_MTU,LRO>
             ether 00:50:56:a5:d3:68
             inet6 fe80::250:56ff:fea5:d368%vmx0 prefixlen 64 scopeid 0x1
             media: Ethernet autoselect
             status: active
             nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>


lro, tso and vlan_hwfilter enabled:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec  1.08 GBytes   929 Mbits/sec    0             sender
[  5]   0.00-10.01  sec  1.08 GBytes   929 Mbits/sec                  receiver
[  7]   0.00-10.01  sec   510 MBytes   427 Mbits/sec    0             sender
[  7]   0.00-10.01  sec   510 MBytes   427 Mbits/sec                  receiver
[  9]   0.00-10.01  sec  1.05 GBytes   903 Mbits/sec    0             sender
[  9]   0.00-10.01  sec  1.05 GBytes   903 Mbits/sec                  receiver
[ 11]   0.00-10.01  sec   953 MBytes   799 Mbits/sec    0             sender
[ 11]   0.00-10.01  sec   953 MBytes   799 Mbits/sec                  receiver
[ 13]   0.00-10.01  sec   447 MBytes   375 Mbits/sec    0             sender
[ 13]   0.00-10.01  sec   447 MBytes   375 Mbits/sec                  receiver
[ 15]   0.00-10.01  sec   409 MBytes   342 Mbits/sec    0             sender
[ 15]   0.00-10.01  sec   409 MBytes   342 Mbits/sec                  receiver
[ 17]   0.00-10.01  sec   379 MBytes   318 Mbits/sec    0             sender
[ 17]   0.00-10.01  sec   379 MBytes   318 Mbits/sec                  receiver
[ 19]   0.00-10.01  sec   825 MBytes   691 Mbits/sec    0             sender
[ 19]   0.00-10.01  sec   825 MBytes   691 Mbits/sec                  receiver
[SUM]   0.00-10.01  sec  5.57 GBytes  4.78 Gbits/sec    0             sender
[SUM]   0.00-10.01  sec  5.57 GBytes  4.78 Gbits/sec                  receiver

iperf Done.
root@fw01adb:~ # ifconfig vmx0
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8507b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
        ether 00:50:56:a5:d3:68
        inet6 fe80::250:56ff:fea5:d368%vmx0 prefixlen 64 scopeid 0x1
        media: Ethernet autoselect
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>


But that's not 5 Gbit/s...

I got better results disabling LRO on the ESXi host.
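In case someone wants to reproduce that: VMware exposes host-wide LRO toggles for vmxnet3 as advanced settings (a sketch; the option paths are taken from VMware's LRO documentation and should be verified on your ESXi build, and the VM needs a power cycle afterwards):

esxcli system settings advanced set -o /Net/Vmxnet3HwLRO -i 0   # disable hardware LRO for vmxnet3 vNICs
esxcli system settings advanced set -o /Net/Vmxnet3SwLRO -i 0   # disable software LRO for vmxnet3 vNICs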


Quote from: nwildner on October 19, 2020, 08:20:36 PM
Quote from: mimugmail on October 19, 2020, 07:38:33 PM
I have customers pushing 6Gbit over vmxnet driver.

OK. And what i'm supposed to do with this information? Not trying to be rude, but there is plenty of reports on this topic that goes against your scenario.

Do you have any idea what i could tune to achieve better performance then?

You wrote vmxnet can't handle more than 1Gb, which is not true. Now when someone googles a similar problem they might think it's a general limitation. I have no idea about hypervisors, but I don't want wrong facts going around.

Quote from: mimugmail on October 20, 2020, 06:03:29 AM
You wrote vmxnet can't handle more than 1Gb, which is not true. Now when someone googles a similar problem they might think it's a general limitation. I have no idea about hypervisors, but I don't want wrong facts going around.

Just read my reports again.

vmxnet3 is not handling more than 1Gbps on FreeBSD (maybe due to OPNsense-specific patches). I never said vmxnet3 is garbage, and as you can see, Linux handles the traffic fine. I have other physical machines in different offices and vmxnet3 is just fine there with Linux and Windows.

And if you google for solutions, you will find plenty of information (which also means misinformation): bugs and other fixes (maybe iflib/vmx related) that COULD work.


UPDATE REPORT: I had to disable lro, tso and vlan_hwfilter since they made traffic entering that interface horribly slow (7Mbps max), and that is a regression we could not accept.

Better to have an interface doing 1Gbps than one that does 4.5Gbps in only one direction.

@nwildner: would you be so kind as to stop spreading inaccurate/false information around. We don't use any modifications on the vmx driver, which can do more than 1Gbps with ease on a stock FreeBSD 12.1. LRO shouldn't be used on a router for obvious reasons (also pointed out in my earlier post https://forum.opnsense.org/index.php?topic=18754.msg90576#msg90576).

October 20, 2020, 03:40:57 PM #73 Last Edit: October 20, 2020, 03:48:17 PM by nwildner
Quote from: AdSchellevis on October 20, 2020, 12:53:16 PM
@nwildner: would you be so kind as to stop spreading inaccurate/false information around. We don't use any modifications on the vmx driver, which can do more than 1Gbps with ease on a stock FreeBSD 12.1. LRO shouldn't be used on a router for obvious reasons (also pointed out in my earlier post https://forum.opnsense.org/index.php?topic=18754.msg90576#msg90576).

All right. I tried lro/tso/vlan_hwfilter because I'm running out of options here. I also tried all the sysctls from that FreeBSD bug report, and no noticeable performance increase was seen after tuning the TX/RX descriptors. Transfers stay capped at the same 800Mbps whenever OPNsense contacts another host.
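For anyone wanting to repeat the descriptor tuning, the knobs in question are the iflib(4) loader tunables (a sketch with illustrative values, not a verified fix; vmx spreads each RX queue over more than one ring, so override_nrxds may need one value per ring):

# /boot/loader.conf.local
dev.vmx.0.iflib.override_ntxds="4096"
dev.vmx.0.iflib.override_nrxds="2048,2048"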

Other tests I've made:

1 - iperf from one VLAN interface to another, same parent interface (vmx0): I put iperf listening on one VLAN interface (parent vmx0) while binding the client to another VLAN interface (parent also vmx0) on the OPNsense box, and got pretty good forwarding rates:

iperf3 -c 10.254.117.ip -B 10.254.110.ip -P 8 -w 128k -p 5201
[SUM]   0.00-10.00  sec  8.86 GBytes  7.61 Gbits/sec    0             sender
[SUM]   0.00-10.16  sec  8.86 GBytes  7.49 Gbits/sec                  receiver


I was just trying to test internal forwarding.

2 - Disabling IPsec and its passthrough-related configs: thinking IPsec could be throttling the connection through its passthrough tunnels for traffic coming in/out of the VLAN interfaces, I disabled all IPsec configs; iperf still topped out at 800Mbps from the firewall to the Windows/Linux servers.

3 - Disabling pf: after disabling the IPsec tunnels I tried disabling pf entirely, did a fresh boot and put OPNsense in router mode. No luck (still the same iperf performance); see the sketch after this list.

4 - Adding VLAN 117 to a new physical vmx interface, letting the hypervisor tag it: I presented a new interface with VLAN 117 tagged by the hypervisor and changed the assignment inside OPNsense ONLY for this specific servers network. The iperf tests kept showing the same speed.
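For completeness, the pf toggle from test 3 is just the standard pfctl switch (a sketch; it disables all filtering until re-enabled, so only use it in a test window):

pfctl -d   # disable pf entirely
pfctl -e   # re-enable pf after the test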

Additional logs: bug id=237166 threw some light on this issue, and I found that MSI-X vectors aren't being handled correctly under VMware (given that the MSI-X related issues were fixed on the FreeBSD side). I'm looking for any documentation that could help with this case. I'll try to tinker with hw.pci.honor_msi_blacklist=0 in loader.conf (sketch below) to see if I get better performance.
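A sketch of the loader entry (on OPNsense the supported way is System > Settings > Tunables, which persists across updates):

# /boot/loader.conf.local
hw.pci.honor_msi_blacklist="0"   # do not apply the PCI MSI/MSI-X blacklist (VMware is on it by default)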

vmx0: <VMware VMXNET3 Ethernet Adapter> port 0x5000-0x500f mem 0xfd4fc000-0xfd4fcfff,0xfd4fd000-0xfd4fdfff,0xfd4fe000-0xfd4fffff irq 19 at device 0.0 on pci4
vmx0: Using 4096 TX descriptors and 2048 RX descriptors
vmx0: Using 4 RX queues 4 TX queues
vmx0: failed to allocate 5 MSI-X vectors, err: 6
vmx0: Using an MSI interrupt
vmx0: Ethernet address: 00:50:56:a5:d3:68
vmx0: netmap queues/slots: TX 1/4096, RX 1/4096


Edit: "hw.pci.honor_msi_blacklist: 0" removed the error from the log, but transfer rates remain the same:

vmx0: <VMware VMXNET3 Ethernet Adapter> port 0x5000-0x500f mem 0xfd4fc000-0xfd4fcfff,0xfd4fd000-0xfd4fdfff,0xfd4fe000-0xfd4fffff irq 19 at device 0.0 on pci4
vmx0: Using 4096 TX descriptors and 2048 RX descriptors
vmx0: Using 4 RX queues 4 TX queues
vmx0: Using MSI-X interrupts with 5 vectors
vmx0: Ethernet address: 00:50:56:a5:d3:68
vmx0: netmap queues/slots: TX 4/4096, RX 4/4096

root@fw01adb:~ # sysctl -a | grep blacklis
vm.page_blacklist:
hw.pci.honor_msi_blacklist: 0



I hope some of my tests can shed light on this issue.

Removing the MSI blacklist option allocated 4 netmap TX/RX queues :)