Hardware and Performance / Re: Poor Throughput (Even On Same Network Segment)
« on: October 22, 2020, 07:46:38 pm »

You guys got me interested in this subject. I have tested plenty of iperf3 against my VMs in my little 3-host homelab; my 10GbE is just a couple of DACs connected between the 10GbE "backbone" interfaces of my Dell PowerConnect 7048P, which is really more of a gigabit switch.
The infrastructure I have at that remote office I was reporting on so far:
- PowerEdge R630 (2 servers)
- 2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz with 12 cores each (24 cores per server)
- 3x NetXtreme II BCM57800 10 Gigabit Ethernet (dual-port NICs), meaning 6 physical adapters distributed across 3 virtual switches (2 NICs for VM traffic, 2 NICs for vMotion, 2 NICs for vmkernel)
- 512GB RAM per server
- Plenty of storage on an external 12Gbps SAS Dell MD3xxx array (2x 6Gbps active + 2x 6Gbps passive paths) with round-robin pathing
- 2x Dell N4032F as core/backbone switches with 10Gbps ports, stacked via 2x 40Gbps ports
- 6-port trunks for each server: 3 ports per trunk per stack member, so each vSwitch NIC touches one stack member
- Stack members on the Dell N series are treated as a single unit, so LACP can be configured across stack members with no MLAG involved (a rough config sketch follows this list)
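For reference, here is a rough, hedged sketch of what one of those cross-stack LACP port-channels could look like on the N-series side. This is pseudo-config in the Cisco-like DNOS style; the port names and channel-group number are placeholders, and the exact syntax should be checked against the N4032F CLI guide:

Code:
  ! one member port from each stack unit in the same LACP channel (Te1/0/1 and Te2/0/1 are placeholders)
  interface range Te1/0/1,Te2/0/1
    channel-group 1 mode active
  exit
  interface port-channel 1
    switchport mode trunk
  exit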
Even when transferring data between VMs that are not registered on the same physical hardware I can easily achieve 8Gbps, except with the vmxnet3 driver on FreeBSD 12.1.
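For reference, a minimal sketch of the kind of plain iperf3 test used for these measurements between two VMs on different hosts; the target address and stream count are just placeholders:

Code:
  # on the receiving VM
  iperf3 -s
  # on the sending VM: 30-second run, 4 parallel streams
  iperf3 -c 10.0.0.20 -t 30 -P 4
  # same test, reversed direction without swapping roles
  iperf3 -c 10.0.0.20 -t 30 -P 4 -R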
Be honest with yourself: would you buy a piece of hardware with only 2 cores if you have a requirement for 10G? The smallest hardware with 10G interfaces has 4 cores minimum.
What is not honest is to pretend that a VM can't push more than 1Gbps or achieve decent throughput with only 1 vCPU configured, because that is simply not true. On the contrary, when virtualizing you should always configure resources in a way that avoids CPU oversubscription. Having, for example, a 4-vCPU VM that is mostly idle and does not run CPU-intensive workloads will create problems for the other VMs on the same pool/share/physical hardware. For simple iperf3 and network transfer tests, FreeBSD 13 with 1 vCPU did fine, while OPNsense (FreeBSD 12.1) with 4 vCPUs and high CPU shares, despite being the only VM with that share configuration, crawled during transfers.
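A quick way to check whether a single vCPU really is the limiting factor is to watch it from inside the guest while a transfer is running (the host-side CPU-ready counters in esxtop are worth a look too); a minimal sketch on FreeBSD:

Code:
  # per-CPU load plus kernel/interrupt threads while iperf3 is running
  top -SHP
  # live per-interface throughput to correlate with CPU usage
  systat -ifstat 1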
Vmxnet3 on FreeBSD 12.1 is garbage. It seems that the port to iflib introduced regressions related to MSI-X, tx/rx queues, iflib leaking MSI-X messages, non-power-of-2 tx/rx queue configurations, and more. I could even find some LRO regressions in the commit log that could explain the retransmissions and the abysmal lack of performance I reported on a previous page while trying to enable LRO as a workaround for that performance issue: https://svnweb.freebsd.org/base/head/sys/dev/vmware/vmxnet3/if_vmx.c?view=log
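Until that is sorted out, the usual workaround on a FreeBSD 12.1 guest is to switch the offending offloads off and, if needed, pin the queue counts. A hedged sketch follows; the interface name vmx0 and the queue counts are assumptions, and the iflib override tunables only help if the driver honors them on that release:

Code:
  # disable LRO and TSO on the vmxnet3 interface at runtime
  ifconfig vmx0 -lro -tso
  # optionally force the tx/rx queue counts via iflib loader tunables (set in /boot/loader.conf)
  dev.vmx.0.iflib.override_ntxqs=4
  dev.vmx.0.iflib.override_nrxqs=4

On OPNsense the LRO/TSO toggles are also exposed in the web GUI under the interface settings, so you do not have to script ifconfig by hand there.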
In the test I made above with FreeBSD 13-CURRENT, I was using only 1 vCPU, 4GB of RAM, pvscsi and vmxnet3, and the system performed great compared with the state of the vmxnet3 driver in FreeBSD 12.1-RELEASE.
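When comparing the two releases side by side, it helps to confirm what the driver actually negotiated in each guest; a minimal, read-only check, assuming the interface is vmx0:

Code:
  # how many MSI-X vectors the driver got and how the queues are spread across them
  vmstat -i | grep vmx
  # confirm the device exposes MSI-X and which capabilities are active
  pciconf -lc | grep -A3 vmx
  # interface capabilities currently enabled (LRO/TSO/etc.)
  ifconfig vmx0 | grep options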