Howdy folks, I'm running some tests on OPNsense 19.7.9_1-amd64 vs pfSense 2.4.4-RELEASE-p3 (amd64). Both are running as virtual machines on the same host, with no tuning but all patches applied. All I'm doing is installing iperf3 and running it in server mode for the tests. Here are the results:
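For reference, this is roughly the entire setup - iperf3 installed from the firewall's shell and started as a server, then the client pointed at the firewall's LAN address (the install command is just what I'd expect to work, adjust as needed):
# on the firewall shell
pkg install iperf3
iperf3 -s
# on the client
iperf3 -c <firewall-LAN-IP>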
OPNsense:
% iperf3 -c 172.16.160.204
Connecting to host 172.16.160.204, port 5201
[ 5] local 172.16.160.144 port 50482 connected to 172.16.160.204 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 6.99 MBytes 58.6 Mbits/sec
[ 5] 1.00-2.00 sec 32.9 MBytes 276 Mbits/sec
[ 5] 2.00-3.00 sec 33.0 MBytes 277 Mbits/sec
[ 5] 3.00-4.00 sec 32.4 MBytes 272 Mbits/sec
[ 5] 4.00-5.00 sec 31.9 MBytes 268 Mbits/sec
[ 5] 5.00-6.00 sec 31.0 MBytes 260 Mbits/sec
[ 5] 6.00-7.00 sec 31.1 MBytes 261 Mbits/sec
[ 5] 7.00-8.00 sec 30.8 MBytes 259 Mbits/sec
[ 5] 8.00-9.00 sec 31.2 MBytes 261 Mbits/sec
[ 5] 9.00-10.00 sec 31.0 MBytes 260 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 292 MBytes 245 Mbits/sec sender
[ 5] 0.00-10.00 sec 292 MBytes 245 Mbits/sec receiver
pfSense:
% iperf3 -c 172.16.160.190
Connecting to host 172.16.160.190, port 5201
[ 5] local 172.16.160.144 port 49663 connected to 172.16.160.190 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 96.7 MBytes 811 Mbits/sec
[ 5] 1.00-2.00 sec 112 MBytes 935 Mbits/sec
[ 5] 2.00-3.00 sec 111 MBytes 935 Mbits/sec
[ 5] 3.00-4.00 sec 112 MBytes 935 Mbits/sec
[ 5] 4.00-5.00 sec 112 MBytes 936 Mbits/sec
[ 5] 5.00-6.00 sec 112 MBytes 939 Mbits/sec
[ 5] 6.00-7.00 sec 112 MBytes 938 Mbits/sec
[ 5] 7.00-8.00 sec 112 MBytes 939 Mbits/sec
[ 5] 8.00-9.00 sec 109 MBytes 914 Mbits/sec
[ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 1.07 GBytes 923 Mbits/sec sender
[ 5] 0.00-10.01 sec 1.07 GBytes 922 Mbits/sec receiver
The virtual machines are running under Proxmox VE (Linux/KVM) with the same hardware settings (see screenshots). Needless to say, I can get full gigabit performance through pfSense but roughly 4x lower through OPNsense - is this expected?
Big thanks for a great product and a great community
VM settings
and the second one
I'm not sure that layering the Proxmox firewall on top of a firewall distro is a good idea at all...
I'm using Unraid and not having an issue. As a test, can you give the OPNsense VM 2 CPUs rather than one to check if it's a CPU bottleneck? Otherwise it might not be liking the NIC drivers. Some people have tried e1000 to fix similar issues on pfSense, so not sure if it's a similar thing here.
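On Proxmox, both changes should be something like this from the host shell (VM ID 100 and the bridge name are only examples):
qm set 100 --sockets 1 --cores 2
qm set 100 --net0 e1000,bridge=vmbr0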
It certainly looks like the tests use the firewall itself as an endpoint, which is rather irrelevant.
Did you try iperf between two endpoints, one on each side of the firewalls?
Other than that, I'm also considering a config/driver issue that isn't up to par on OPNsense, yet there's not much info to troubleshoot.
Try installing speedtest-cli, as running iperf against the firewall itself is painfully slow.
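A rough sketch of what I mean, installed from the firewall's shell (the exact package name is a guess; pip works as well):
pkg install py37-speedtest-cli   # or: pip install speedtest-cli
speedtest-cli --simple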
Here are my numbers. Both of these are fresh out-of-the-box installs, OPNsense 19.7.9 and pfSense 2.4.4p3, both x86_64.
Hypervisor Specs:
VMware ESXi 6.7u3
2x Intel Xeon E5620
All VMs are running open-vm-tools, including the firewalls
Specs on both firewall VMs are as follows:
2x CPU
4GB RAM
2x VMXnet3 NICs (one WAN, one LAN)
I have two other VMs running as iperf3 server and client. The "server" VM is on the WAN side of these firewalls, the client VM is on the "LAN" side. This is to test traffic throughput of the router itself. Never try to run these tests with the router/firewall acting as a client or server, you will not get accurate results.
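In other words, something like this, with the traffic forced through the firewall (addresses are only examples):
# on the WAN-side "server" VM
iperf3 -s
# on the LAN-side "client" VM
iperf3 -c <WAN-side-server-IP> -t 15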
pfSense 2.4.4p3:
Accepted connection from 192.168.1.230, port 56492
[ 5] local 192.168.1.231 port 5201 connected to 192.168.1.230 port 45828
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-1.00 sec 314 MBytes 2.64 Gbits/sec
[ 5] 1.00-2.00 sec 459 MBytes 3.85 Gbits/sec
[ 5] 2.00-3.00 sec 407 MBytes 3.41 Gbits/sec
[ 5] 3.00-4.00 sec 393 MBytes 3.30 Gbits/sec
[ 5] 4.00-5.00 sec 351 MBytes 2.94 Gbits/sec
[ 5] 5.00-6.00 sec 372 MBytes 3.12 Gbits/sec
[ 5] 6.00-7.00 sec 424 MBytes 3.55 Gbits/sec
[ 5] 7.00-8.00 sec 410 MBytes 3.44 Gbits/sec
[ 5] 8.00-9.00 sec 443 MBytes 3.71 Gbits/sec
[ 5] 9.00-10.00 sec 393 MBytes 3.30 Gbits/sec
[ 5] 10.00-11.00 sec 448 MBytes 3.76 Gbits/sec
[ 5] 11.00-12.00 sec 428 MBytes 3.59 Gbits/sec
[ 5] 12.00-13.00 sec 404 MBytes 3.39 Gbits/sec
[ 5] 13.00-14.00 sec 419 MBytes 3.51 Gbits/sec
[ 5] 14.00-15.00 sec 445 MBytes 3.73 Gbits/sec
[ 5] 15.00-15.04 sec 16.1 MBytes 3.26 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-15.04 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-15.04 sec 5.98 GBytes 3.42 Gbits/sec receiver
OPNsense 19.7.9 (no tuning, Unbound using lots of CPU)
Accepted connection from 192.168.1.232, port 15150
[ 5] local 192.168.1.231 port 5201 connected to 192.168.1.232 port 46858
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-1.00 sec 304 MBytes 2.55 Gbits/sec
[ 5] 1.00-2.00 sec 88.9 MBytes 746 Mbits/sec
[ 5] 2.00-3.00 sec 371 MBytes 3.11 Gbits/sec
[ 5] 3.00-4.00 sec 164 MBytes 1.38 Gbits/sec
[ 5] 4.00-5.00 sec 420 MBytes 3.52 Gbits/sec
[ 5] 5.00-6.00 sec 79.4 MBytes 666 Mbits/sec
[ 5] 6.00-7.00 sec 400 MBytes 3.36 Gbits/sec
[ 5] 7.00-8.00 sec 97.7 MBytes 820 Mbits/sec
[ 5] 8.00-9.00 sec 403 MBytes 3.38 Gbits/sec
[ 5] 9.00-10.00 sec 399 MBytes 3.35 Gbits/sec
[ 5] 10.00-11.00 sec 104 MBytes 872 Mbits/sec
[ 5] 11.00-12.00 sec 374 MBytes 3.14 Gbits/sec
[ 5] 12.00-13.00 sec 74.0 MBytes 621 Mbits/sec
[ 5] 13.00-14.00 sec 289 MBytes 2.42 Gbits/sec
[ 5] 14.00-15.00 sec 135 MBytes 1.13 Gbits/sec
[ 5] 15.00-15.04 sec 3.24 MBytes 675 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-15.04 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-15.04 sec 3.62 GBytes 2.07 Gbits/sec receiver
OPNsense 19.7.9 (set unbound to use Quad9 DoT using forwarding mode)
Accepted connection from 192.168.1.232, port 58840
[ 5] local 192.168.1.231 port 5201 connected to 192.168.1.232 port 16760
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-1.00 sec 214 MBytes 1.80 Gbits/sec
[ 5] 1.00-2.00 sec 268 MBytes 2.25 Gbits/sec
[ 5] 2.00-3.00 sec 312 MBytes 2.61 Gbits/sec
[ 5] 3.00-4.00 sec 315 MBytes 2.64 Gbits/sec
[ 5] 4.00-5.00 sec 273 MBytes 2.29 Gbits/sec
[ 5] 5.00-6.00 sec 259 MBytes 2.17 Gbits/sec
[ 5] 6.00-7.00 sec 201 MBytes 1.69 Gbits/sec
[ 5] 7.00-8.00 sec 279 MBytes 2.34 Gbits/sec
[ 5] 8.00-9.00 sec 311 MBytes 2.61 Gbits/sec
[ 5] 9.00-10.00 sec 120 MBytes 1.01 Gbits/sec
[ 5] 10.00-11.00 sec 237 MBytes 1.99 Gbits/sec
[ 5] 11.00-12.00 sec 298 MBytes 2.50 Gbits/sec
[ 5] 12.00-13.00 sec 322 MBytes 2.70 Gbits/sec
[ 5] 13.00-14.00 sec 291 MBytes 2.44 Gbits/sec
[ 5] 14.00-15.00 sec 303 MBytes 2.54 Gbits/sec
[ 5] 15.00-15.03 sec 8.95 MBytes 2.26 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-15.03 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-15.03 sec 3.92 GBytes 2.24 Gbits/sec receiver
As we can see, OPNsense does seem to have some throughput limits out of the box. Still, I am seeing much higher throughput values than you are so it's important to make sure your tests are using servers/clients on the WAN and LAN sides of the firewall.
Finally, here's a screenshot of what 'top -aSCHIP' looks like on the OPNsense 19.7.9 VM; you can see the high CPU usage from unbound for some reason. You may want to check whether your OPNsense VM exhibits the same high-CPU behavior, as that can also take away from the overall throughput.
A quick reply regarding the OPNsense CPU utilization. In my case this seemed to be related to DHCP6 being enabled out of the box - I'm not sure if OPNsense was trying to delegate a prefix to the LAN side over and over, causing the high CPU usage in unbound. My logs are filled with this:
kernel: pflog0: promiscuous mode disabled
kernel: pflog0: promiscuous mode enabled
I was seeing these events spamming the logs constantly, every second. As soon as I disabled DHCP6 on the WAN, the messages went away and idle CPU usage on OPNsense returned to normal.
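If you want to check whether your box is doing the same thing, these are kernel messages, so a quick count from dmesg should show it (what fixed it for me was simply setting the WAN's IPv6 Configuration Type to None):
dmesg | grep -c "promiscuous mode"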
Here are the results of a current iperf3 test, using the same VMs described in my post above. These throughput numbers are much more consistent now that OPNsense has normal CPU usage.
Accepted connection from 192.168.1.232, port 4084
[ 5] local 192.168.1.231 port 5201 connected to 192.168.1.232 port 24664
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-1.00 sec 185 MBytes 1.55 Gbits/sec
[ 5] 1.00-2.00 sec 323 MBytes 2.71 Gbits/sec
[ 5] 2.00-3.00 sec 315 MBytes 2.64 Gbits/sec
[ 5] 3.00-4.00 sec 344 MBytes 2.88 Gbits/sec
[ 5] 4.00-5.00 sec 316 MBytes 2.65 Gbits/sec
[ 5] 5.00-6.00 sec 357 MBytes 2.99 Gbits/sec
[ 5] 6.00-7.00 sec 353 MBytes 2.96 Gbits/sec
[ 5] 7.00-8.00 sec 349 MBytes 2.93 Gbits/sec
[ 5] 8.00-9.00 sec 356 MBytes 2.98 Gbits/sec
[ 5] 9.00-10.00 sec 345 MBytes 2.89 Gbits/sec
[ 5] 10.00-11.00 sec 305 MBytes 2.56 Gbits/sec
[ 5] 11.00-12.00 sec 348 MBytes 2.92 Gbits/sec
[ 5] 12.00-13.00 sec 341 MBytes 2.86 Gbits/sec
[ 5] 13.00-14.00 sec 343 MBytes 2.87 Gbits/sec
[ 5] 14.00-15.00 sec 331 MBytes 2.77 Gbits/sec
[ 5] 15.00-15.04 sec 14.8 MBytes 3.04 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-15.04 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-15.04 sec 4.81 GBytes 2.75 Gbits/sec receiver
Interesting. Good find man :)
Hey folks, thanks for all the replies and apologies for the slow response - I did not know that I had to ask to be notified as the creator of a thread. Gonna try to answer all the questions in the thread:
1. unbound doesn't seem to be the issue; see the attached screenshot for a run. As a matter of fact, the box seems to be snoozing, with reasonably high idle time.
2. Run with 2 CPUs (1 socket, 2 cores). No marked difference:
% iperf3 -c 172.16.160.204
Connecting to host 172.16.160.204, port 5201
[ 5] local 172.16.160.144 port 53463 connected to 172.16.160.204 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 31.0 MBytes 260 Mbits/sec
[ 5] 1.00-2.00 sec 29.7 MBytes 249 Mbits/sec
[ 5] 2.00-3.00 sec 27.6 MBytes 231 Mbits/sec
[ 5] 3.00-4.00 sec 26.1 MBytes 219 Mbits/sec
[ 5] 4.00-5.00 sec 25.9 MBytes 217 Mbits/sec
[ 5] 5.00-6.00 sec 25.8 MBytes 216 Mbits/sec
[ 5] 6.00-7.00 sec 24.8 MBytes 208 Mbits/sec
[ 5] 7.00-8.00 sec 24.6 MBytes 206 Mbits/sec
[ 5] 8.00-9.00 sec 25.9 MBytes 218 Mbits/sec
[ 5] 9.00-10.00 sec 25.2 MBytes 211 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 267 MBytes 224 Mbits/sec sender
[ 5] 0.00-10.00 sec 266 MBytes 223 Mbits/sec receiver
iperf Done.
3. Run with 1 CPU core and an e1000 (Intel E1000) NIC:
% iperf3 -c 172.16.160.204
Connecting to host 172.16.160.204, port 5201
[ 5] local 172.16.160.144 port 53902 connected to 172.16.160.204 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 5.47 MBytes 45.9 Mbits/sec
[ 5] 1.00-2.00 sec 27.0 MBytes 227 Mbits/sec
[ 5] 2.00-3.00 sec 22.5 MBytes 189 Mbits/sec
[ 5] 3.00-4.00 sec 28.3 MBytes 237 Mbits/sec
[ 5] 4.00-5.00 sec 28.0 MBytes 235 Mbits/sec
[ 5] 5.00-6.00 sec 28.2 MBytes 236 Mbits/sec
[ 5] 6.00-7.00 sec 27.9 MBytes 234 Mbits/sec
[ 5] 7.00-8.00 sec 27.9 MBytes 234 Mbits/sec
[ 5] 8.00-9.00 sec 28.5 MBytes 239 Mbits/sec
[ 5] 9.00-10.00 sec 28.1 MBytes 236 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 252 MBytes 211 Mbits/sec sender
[ 5] 0.00-10.00 sec 251 MBytes 210 Mbits/sec receiver
iperf Done.
4. I didn't install speedtest-cli since I'm running all the tests locally (on the same switch) and I'm not really trying to test the upstream Internet circuit.
5. I disabled IPv6 on the WAN (I saw some of the same "promiscuous" log entries) and here are the results:
% iperf3 -c 172.16.160.204
Connecting to host 172.16.160.204, port 5201
[ 5] local 172.16.160.144 port 53960 connected to 172.16.160.204 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 18.6 MBytes 156 Mbits/sec
[ 5] 1.00-2.00 sec 28.6 MBytes 240 Mbits/sec
[ 5] 2.00-3.00 sec 28.7 MBytes 240 Mbits/sec
[ 5] 3.00-4.00 sec 28.7 MBytes 241 Mbits/sec
[ 5] 4.00-5.00 sec 29.0 MBytes 243 Mbits/sec
[ 5] 5.00-6.00 sec 28.8 MBytes 242 Mbits/sec
[ 5] 6.00-7.00 sec 28.2 MBytes 237 Mbits/sec
[ 5] 7.00-8.00 sec 29.1 MBytes 244 Mbits/sec
[ 5] 8.00-9.00 sec 28.5 MBytes 239 Mbits/sec
[ 5] 9.00-10.00 sec 27.7 MBytes 232 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 276 MBytes 231 Mbits/sec sender
[ 5] 0.00-10.00 sec 275 MBytes 231 Mbits/sec receiver
iperf Done.
TL;DR: things are still strangely slow. I really think this is a kernel or tuning issue, but I'm unsure where to even begin to look.
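If it helps anyone reading along, the first thing I plan to poke at is the offload settings on the vtnet interfaces, since hardware offload under VirtIO is a common suspect for this kind of gap (just a starting point, not a confirmed fix):
% ifconfig vtnet0                             # check which capabilities are enabled (TXCSUM, TSO, LRO, ...)
% ifconfig vtnet0 -txcsum -rxcsum -tso -lro   # temporarily disable offloads, then re-run iperf3 to compare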
Use "VirtIO (paravirtualized)" network in Proxmox.
And use "host" as CPu Type.
There is one thing that is broken with "VirtIO (paravirtualized)", though: IPS/Suricata.
Thanks but the original numbers (at the very top of the thread) were using VirtIO without IPS enabled.
Just to confirm, does your setup look like the below diagram? OPNsense is not hosting the client or server portion of iperf, correct?
Nope, it's not. I'm not using the WAN interface at all; I'm just testing the speed of the LAN port, hard-wired to a gigabit switch, like this:
[ OPNsense running iperf3 in server mode ] LAN <=== [ Gigabit switch ] <=== [ MacBook laptop with a 1Gb NIC ]
I don't doubt that I'd get more by flowing traffic through both interfaces (WAN/LAN), but it still won't equate to a full 1 Gbps, as a single interface is currently only able to handle about 1/4 of that. Most importantly, the setup is the same for pfSense, but its performance is much higher.
Yes, I see what you mean with pfSense being able to sustain a higher throughput under the same circumstance.
Unfortunately, I'm not sure if this is useful data because it isn't telling us how fast either solution can actually route packets. In actual use, we'd be using pfSense or OPNsense as a firewall/router setup and we would want to see how quickly they can push traffic through themselves, rather than serving traffic directly through one interface.
I suppose the last things I would check would be to verify that OPNsense and pfSense both have open-vm-tools installed and running. Then run 'top -aSCHIP' on an SSH console on both of them and see what their CPU usage is while running your transfer test. Watching them under load may reveal a bottleneck, especially on the OPNsense router since that one seems to be underperforming in your tests.
Finally, if you feel like digging in a bit more, I would recommend doing a test using Proxmox client and server VMs and two bridges inside Proxmox: one for the WAN port, and one private bridge with no physical uplinks for the LAN port. This method lets you simulate the actual routing performance of both solutions, and you won't need an extra physical client; see the sketch below.
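The private LAN-side bridge is just a Linux bridge with no ports attached, e.g. something along these lines in /etc/network/interfaces on the Proxmox host (names are examples; older setups use the underscore form, bridge_ports etc.):
auto vmbr1
iface vmbr1 inet manual
        bridge-ports none
        bridge-stp off
        bridge-fd 0
Attach the firewall's LAN NIC and the iperf3 client VM to vmbr1, and put the iperf3 server VM on the WAN-side bridge.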
One of my clients pushes 2.6 Gbit/s with ESXi 6.7 and vmxnet3. With Chelsio 40G cards I was able to route 22 Gbit/s.