Poor throughput of OPNsense installed in a VMware VM using vmxnet3

Started by Layer8, September 18, 2023, 02:45:22 PM

Hi all,

I see very poor throughput with OPNsense installations running in a VMware VM (ESXi 8) using vmxnet3 vNICs in fast networks. The following tests are based on a 10G network.

Matured OPNsense:
The reason for this thread: I can't get over 2.7 Gbit/s with iperf3 between the OPNsense and my Windows client (both in the same VLAN with one switch in between) in a single-stream connection. It's not an iperf3 problem, because routed traffic is also not faster than 2.7 Gbit/s. The 2.7 Gbit/s limit seems to be an overall limitation.
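
For reference, the tests look roughly like this on the command line (just a sketch, not the exact invocations; 10.1.1.1 stands in for the OPNsense address):

iperf3 -s               # on the OPNsense/FreeBSD side: start the server
iperf3 -c 10.1.1.1      # on the Windows client: single TCP stream towards OPNsense
iperf3 -c 10.1.1.1 -R   # same test in reverse mode (OPNsense sends, client receives)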

Other VMs (Linux, Windows)
I can easily get over 9 Gbit/s with Linux- or Windows-based VMs, so it's definitely not a hardware, network, or hypervisor-related limitation.

Fresh plain FreeBSD 13.2
I installed a fresh, plain FreeBSD 13.2 with iperf3, with the same VM settings as the matured OPNsense (8 vCPUs, 8 GB RAM, vmxnet3 adapter) on the same ESXi host. The results are:

~5 Gbit/s from client to FreeBSD, 6 Gbit/s in reverse mode, both with only one iperf3 stream,
~8.5 Gbit/s from client to FreeBSD, 7.3 Gbit/s in reverse mode, both with two parallel iperf3 streams,
~9.2 Gbit/s from client to FreeBSD, 8 Gbit/s in reverse mode, both with three parallel iperf3 streams

With three streams, CPU utilization of the VM is at 42% from client to FreeBSD and only 18% in reverse mode. I don't know whether this unequal utilization is an iperf3 or a FreeBSD issue.
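
The multi-stream numbers above correspond to iperf3's parallel-stream option (again just a sketch of the client side):

iperf3 -c 10.1.1.1 -P 2      # two parallel streams, client to FreeBSD
iperf3 -c 10.1.1.1 -P 3 -R   # three parallel streams, reverse mode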

Fresh plain OPNsense 23.7.4
After the results with plain FreeBSD 13.2, I installed a fresh, plain OPNsense 23.7 (ISO downloaded today) in a VM with the exact same VM settings again, updated it to 23.7.4, and installed the iperf3 plugin (which is based on iperf v3.13). I applied an allow-all floating rule to allow incoming connections to the iperf3 daemon.

When I started iperf3 the first time, I saw this result:
~1 Gbit/s from client to OPNsense with only one iperf3 stream

Because the iperf3 plugin stops the iperf3 daemon once a test is canceled, I restarted the iperf3 daemon in the OPNsense dashboard after every test. The result after the first restart was:
~2.7 Gbit/s from client to OPNsense with only one iperf3 stream

So this is the first weird behaviour in OPNsense. Why did the first test reach 1 Gbit/s and the second 2.7 Gbit/s? I was able to reproduce this after reverting the VM snapshot.

I continued the normal testing after this. Here are all results with the firewall enabled and one allow-all floating rule:

~2.8 Gbit/s from client to OPNsense, ~3.1 Gbit/s in reverse mode, both with only one iperf3 stream,
~5.2 Gbit/s from client to OPNsense, ~3.9 Gbit/s in reverse mode, both with two iperf3 streams,
~7.4 Gbit/s from client to OPNsense, ~4.1 Gbit/s in reverse mode, both with three iperf3 streams,
~9 Gbit/s from client to OPNsense, ~5 Gbit/s in reverse mode, both with ten iperf3 streams

For comparison, here is the CPU utilization for the test with three streams: 57% / 47% (reverse).


To make sure that this is not an issue with the iperf3 plugin of OPNsense, I also uninstalled the plugin and installed iperf3 over the CLI using pkg install iperf3 (v3.14, a bit newer than the one used on FreeBSD 13.2). I then started iperf3 with iperf3 -s. The result is:
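
For reference, the CLI steps on the OPNsense shell were simply:

pkg install iperf3   # installs iperf3 3.14 from the package repository
iperf3 -s            # starts the server in the foreground, listening on the default port 5201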

~7.3 Gbit/s from client to OPNsense, ~4.0 Gbit/s in reverse mode, both with three iperf3 streams

Because the result is nearly the same as with the iperf3 plugin, I only tested once with three streams.


I also disabled the firewall function via Firewall -> Settings -> Advanced -> Disable Firewall and removed the floating rule to check that the firewall really was disabled. Here are the results:
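
A quick sanity check from the shell that pf really is off (a sketch; I only used the GUI setting):

pfctl -s info   # "Status: Disabled" in the first line means pf is not filtering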

~3.2 Gbit/s from client to OPNsense, ~3.8 Gbit/s in reverse mode, both with only one iperf3 stream,
~5.4 Gbit/s from client to OPNsense, ~4.5 Gbit/s in reverse mode, both with two iperf3 streams,
~7.4 Gbit/s from client to OPNsense, ~4.3 Gbit/s in reverse mode, both with three iperf3 streams,
~9 Gbit/s from client to OPNsense, ~5 Gbit/s in reverse mode, both with ten iperf3 streams

For comparison, here is the CPU utilization for this test with three streams: 50% / 47% (reverse).

With the firewall disabled, throughput is a bit higher, but not by much. It looks like the answer is not related to pf.


Matured OPNsense 23.7.4

To show how different things are on the matured OPNsense, I ran a quick test again:

~2.7 Gbit/s from client to the matured OPNsense, ~3.4 Gbit/s in reverse mode, both with three iperf3 streams

CPU utilization: 35% / 30%.



Result

After this test series, it really looks like there is a bottleneck in OPNsense when it is installed in VMware with vmxnet3 adapters. I think it's not a FreeBSD or driver issue, because throughput with plain FreeBSD is much better than with OPNsense. Even plain FreeBSD's performance is far from perfect compared with the Linux and Windows VMs.

OPNsense uses a lot more compute power for less throughput than plain FreeBSD.

I understand that OPNsense possibly does a lot of extra work in its network components, which causes bigger overhead. But I think core features of a good firewall are efficiency and scalability when the hardware is good enough.

What's the reason for this issue?

Have you tried toggling the hardware acceleration features? Specifically Interfaces → Settings → Network Interfaces.
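
Those checkboxes map to the usual FreeBSD offload flags on the interface, so you can see what is actually active from the shell (a sketch; vmx0 is assumed to be the interface in question):

ifconfig vmx0 | grep options   # RXCSUM, TXCSUM, TSO4 and LRO show up in the options list when enabled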

No. That's something I hadn't thought of before, because everywhere you read that you should leave hardware acceleration completely disabled.

After activating the acceleration as shown in your attachment, I see the following results on the matured OPNsense:

~5.3 Gbit/s from client to OPNsense, ~5.8 Gbit/s in reverse mode, both with only one iperf3 stream,
~7.4 Gbit/s from client to OPNsense, ~7 Gbit/s in reverse mode, both with two iperf3 streams,
~9 Gbit/s from client to OPNsense, ~7.8 Gbit/s in reverse mode, both with three iperf3 streams

CPU utilization with three streams is now 26% / 25% (reverse).

This doubled the speed in some scenarios of the iperf3 benchmark, which means OPNsense is now on the same level as a plain FreeBSD 13.2 installation.

I will do some tests with routed traffic in the next couple of days.



Thanks a lot for this hint!

This leads to the question of why everyone says that one should disable hardware acceleration. What's the reason for this widespread recommendation, and is there any disadvantage now in a VMware scenario?


@schmuessla: Thanks, but which information in this documentation is helpful here? If you mean that it answers the question of why everyone says one should deactivate hardware acceleration, then yes, that could be the answer.

@all: It was possible to run iperf3 from my Windows client to the OPNsense VM with a vmxnet3 adapter at over 9 Gbit/s in both directions:


Send:
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-255.90 sec   277 GBytes  9.31 Gbits/sec                  sender
[  4]   0.00-255.90 sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

Reverse:
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-86.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  4]   0.00-86.04  sec  92.9 GBytes  9.28 Gbits/sec                receiver
iperf3: interrupt - the client has terminated



And I don't know exactly why.

What I have done since yesterday:
- I added a vSphere 8 Enterprise Plus license key (it was a free ESXi license before)
- By adding the license key, it became possible to activate SR-IOV. I activated it on the Intel X550-T2 adapter
- I added an SR-IOV device to the VM
- I assigned the new SR-IOV-based ixv0 adapter to an interface in OPNsense, activated it, and locked it against removal (see the sketch after this list)
- I added an IP to the ixv0 interface and tested throughput with iperf3, but it was not faster than yesterday (4-5 Gbit/s)
- I then tried to assign the ixv0 NIC to an older, existing interface, but this was not possible because of some "Interface ue0 does not exist" warnings (one interface is attached to Android USB tethering, others are deactivated WireGuard tunnels)
- I then switched the WireGuard tunnels on to resolve the "Interface does not exist" problem and be able to assign interfaces again
- I also deleted the interface that was still attached to ixv0; ixv0 is available for new assignment at the moment
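
A sketch of how one can check that the X550 virtual function is visible inside the guest (the grep pattern is an assumption; the device and driver names are the ones mentioned above):

pciconf -lv | grep -A3 ixv   # PCI device of the passed-through VF, attached to the ixv driver
ifconfig ixv0                # the VF interface as assigned in OPNsense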

During all these steps, it was not possible to reach nearly 10 Gbit/s with iperf3. Most of the time it was under 5 Gbit/s, one time only 1 Gbit/s.

But now I can run iperf3 benchmarks at 9-10 Gbit/s.


Can someone explain this?


Edit: I always run iperf3 against 10.1.1.1/24, which was assigned to vmx0 the whole time.







Did you sort it out?

I'm more or less in the same spot, except I'm testing with public servers.

My connection is pretty stable and it's only 1 Gbit/s. It should be easy to max it out, but OPNsense is the only firewall that stays way below the others.

I even did all the tuning and all that, but it improved nothing. :/



For each firewall, I cloned the first VM without disks, then installed a fresh image on it so things are exactly the same.

[In the screenshot] OPNsense was tested tuned (allegedly), without extras and with a very basic ruleset. The other test is a loaded MikroTik CHR under normal load (at least 15-20K connections just from BitTorrent traffic). The results were the first ones I got, not the best. pfSense and OpenWrt were deployed just for the test, so they were in similar conditions to OPNsense; they performed consistently similar to the CHR though, actually a bit better, I assume because they had no load.

It's not as scientific as your testing but, y'know, it's real world, to complement yours. Hopefully it gets some expert's attention. :)


I'm a bit dyslexic, and it makes me drop letters at the end of words. What does get written is written correctly, though; I have good orthography in one or two languages, ironically. It's messed up, I know, I'm sorry. Just pretend you're my auto-complete. :)