Poor throughput under virtualization compared to stock FreeBSD

Started by emzy, December 08, 2024, 05:13:50 PM

There are many similar posts on this forum, Reddit, and others complaining about poor performance running OPNsense as a VM. I've read through many of them which describe the same symptoms, but haven't seen any which actually root-caused the issue. I'm hoping the developers or someone more familiar with OPNsense than me can offer some suggestions for what the culprit may be.

Using iperf3, I can easily saturate a 2.5 Gbps link between the client machine and both Linux and FreeBSD VMs running on my VM host when the client and VM are on the same VLAN. If I run the iperf3 server on the VMs and the client on the VM host itself, I get around 18 Gbps of throughput. I checked top on the FreeBSD VM during the test and saw around 40% CPU usage on interrupts. For FreeBSD, I ran the test with network hardware offloading disabled.

CPU:  0.4% user,  0.0% nice,  9.8% system, 37.1% interrupt, 52.7% idle
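
For reference, the tests were run roughly like this (the addresses, the 30-second duration, and the vtnet0 interface name are placeholders for my setup):

# on the guest VM (server side)
iperf3 -s
# on the client machine or on the VM host
iperf3 -c 192.168.10.20 -t 30
# offloads disabled on the FreeBSD guest before testing
ifconfig vtnet0 -rxcsum -txcsum -tso -lro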

For OPNsense, however, I can only achieve around 1.3 Gbps of throughput, regardless of whether the OPNsense box is the iperf3 server or is simply routing traffic between VLANs for an iperf3 client and server on different machines. Even running the iperf3 server on the OPNsense VM and the client on the VM host, I get the same 1.3 Gbps. With stock FreeBSD and Linux I get around 18 Gbps in that scenario.

Under these loads OPNsense shows relatively high CPU usage for interrupts, around 84% vs. 35-40% for stock FreeBSD. Even so, that core is still about 15% idle, so I'm not sure I'm hitting a CPU limit.

CPU 0:  0.8% user,  0.0% nice,  1.2% system,  0.4% interrupt, 97.7% idle
CPU 1:  0.0% user,  0.0% nice,  0.8% system,  0.4% interrupt, 98.8% idle
CPU 2:  0.0% user,  0.0% nice,  0.4% system, 84.0% interrupt, 15.6% idle
CPU 3:  4.3% user,  0.0% nice,  0.4% system,  0.8% interrupt, 94.5% idle
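
For anyone who wants to compare numbers, the per-core figures above come from top; these standard FreeBSD tools also show where the interrupt time goes (nothing here is OPNsense-specific):

top -P -S      # per-CPU usage including kernel and interrupt threads
vmstat -i      # interrupt counts per device
netstat -Q     # netisr dispatch policy and per-CPU queue statistics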


All of the test VMs are using the same hypervisor configuration with virtio NICs, but OPNsense has substantially lower throughput. I'm not running any IDS or IPS on OPNsense. I've tried applying the various tunables that are often mentioned in these threads, but nothing has helped in any substantial way.
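
For completeness, the tunables that usually come up in those threads, and roughly what I tried, look like this (set under System > Settings > Tunables and applied with a reboot; values are examples, not a recommendation):

hw.vtnet.csum_disable=1      # disable checksum offload in the virtio driver
hw.vtnet.lro_disable=1       # disable LRO
hw.vtnet.tso_disable=1       # disable TSO
net.isr.maxthreads=-1        # one netisr thread per core
net.isr.bindthreads=1        # pin netisr threads to cores
net.isr.dispatch=deferred    # queue packets to netisr instead of direct dispatch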

I'm not sure this can be fixed, but I'd love to understand why it's happening. OPNsense burns substantially more CPU than stock FreeBSD during the iperf3 test, but it isn't burning 10x the CPU even though it achieves roughly a tenth of the throughput in the VM host-to-guest test.

Does anyone have any insight into what's happening here? Is OPNsense just doing heavyweight processing for every packet and topping out at 1.3 Gbps?

Don't have an answer.... but am very interested in the solution to this...

- What host platform (like VMware, Proxmox, etc...) and what hardware are you using?
- Is the limitation to do with the network drivers on the specific platform?
- What NICs are in the hardware?

I'm running all of the VMs on a Synology NAS using their somewhat basic VMM platform, which I'm 99% sure is built on KVM and QEMU. I think the comparisons between OPNsense and the Linux and stock FreeBSD machines are important, since the results indicate that something unique is happening with OPNsense (it is built on FreeBSD, after all).

I don't think the issue has anything to do with the hardware NICs, because I'm using virtio and not passing them through directly to OPNsense. The fact that I can't exceed ~1.3 Gbps between the VM host and the OPNsense VM when the traffic only passes through the vswitch is also suspicious, since that traffic shouldn't touch the physical NIC at all.

I am planning to test OPNsense on another VM host when I have time, possibly next weekend, to gather more data. My best guess right now is that OPNsense is doing more heavyweight packet processing, but that doesn't fully add up, because it only uses about 2x the CPU of stock FreeBSD while achieving roughly a tenth of the throughput in the host-to-guest iperf3 test.

There can be many answers to this. What comes to mind first:

1. Use iperf3 -P 8 to measure real throughput (see the example command after this list).
2. Disable add-ons like crowdsec, suricata and zenarmor.
3. Try vtnet, not pass-through NICs.
4. Enable RSS.
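
For point 1, something along these lines (the server address and duration are just examples):

iperf3 -c 192.168.10.1 -P 8 -t 30    # 8 parallel streams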
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

> 1. Use iperf3 -P 8 to measure real throughput.
I'll try this out, but even if it performs better there's still the question of why a single stream performs so much worse than stock FreeBSD or Linux.

> 2. Disable add-ons like crowdsec, suricata and zenarmor.
I'm not running any of these.

> 3. Try vtnet, not pass-through NICs.
I'm using vtnet (i.e. virtio), not passthrough.

> 4. Enable RSS.
I've tried this previously and didn't notice any difference. I'll give it another try in my next round of testing though.
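
For reference, by "enable RSS" I mean the tunables described in the OPNsense docs, roughly like this for a 4-core VM (rss.bits=2 for 4 cores; reboot required):

net.inet.rss.enabled=1    # enable receive side scaling
net.inet.rss.bits=2       # 2 bits -> 4 buckets, matching 4 vCPUs
net.isr.bindthreads=1     # pin netisr threads to CPUs
net.isr.maxthreads=-1     # one netisr thread per CPU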