OPNsense 4x slower than PFSense on same hardware

Started by cleverfoo, January 18, 2020, 08:55:59 PM

Howdy folks, I'm running some tests on OPNsense 19.7.9_1-amd64 vs pfSense 2.4.4-RELEASE-p3 (amd64). Both are running as virtual machines on the same host, with no tuning but all patches applied. All I'm doing is installing iperf3 and running it in server mode for the tests; here are the results:

OPNsense:

% iperf3 -c 172.16.160.204
Connecting to host 172.16.160.204, port 5201
[  5] local 172.16.160.144 port 50482 connected to 172.16.160.204 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  6.99 MBytes  58.6 Mbits/sec                 
[  5]   1.00-2.00   sec  32.9 MBytes   276 Mbits/sec                 
[  5]   2.00-3.00   sec  33.0 MBytes   277 Mbits/sec                 
[  5]   3.00-4.00   sec  32.4 MBytes   272 Mbits/sec                 
[  5]   4.00-5.00   sec  31.9 MBytes   268 Mbits/sec                 
[  5]   5.00-6.00   sec  31.0 MBytes   260 Mbits/sec                 
[  5]   6.00-7.00   sec  31.1 MBytes   261 Mbits/sec                 
[  5]   7.00-8.00   sec  30.8 MBytes   259 Mbits/sec                 
[  5]   8.00-9.00   sec  31.2 MBytes   261 Mbits/sec                 
[  5]   9.00-10.00  sec  31.0 MBytes   260 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec   292 MBytes   245 Mbits/sec                  sender
[  5]   0.00-10.00  sec   292 MBytes   245 Mbits/sec                  receiver

pfSense:
% iperf3 -c 172.16.160.190
Connecting to host 172.16.160.190, port 5201
[  5] local 172.16.160.144 port 49663 connected to 172.16.160.190 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  96.7 MBytes   811 Mbits/sec                 
[  5]   1.00-2.00   sec   112 MBytes   935 Mbits/sec                 
[  5]   2.00-3.00   sec   111 MBytes   935 Mbits/sec                 
[  5]   3.00-4.00   sec   112 MBytes   935 Mbits/sec                 
[  5]   4.00-5.00   sec   112 MBytes   936 Mbits/sec                 
[  5]   5.00-6.00   sec   112 MBytes   939 Mbits/sec                 
[  5]   6.00-7.00   sec   112 MBytes   938 Mbits/sec                 
[  5]   7.00-8.00   sec   112 MBytes   939 Mbits/sec                 
[  5]   8.00-9.00   sec   109 MBytes   914 Mbits/sec                 
[  5]   9.00-10.00  sec   112 MBytes   944 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  1.07 GBytes   923 Mbits/sec                  sender
[  5]   0.00-10.01  sec  1.07 GBytes   922 Mbits/sec                  receiver

The virtual machines are running under Proxmox VE (Linux/KVM) with the same hardware settings (see screenshots). Needless to say, I can get full gigabit performance through pfSense but about 4x lower through OPNsense - is this expected?

Big thanks for a great product and great community



I'm not sure that using the Proxmox firewall on top of a firewall distro is OK at all...
Proxmox enthusiast @home, bare metal @work.

I'm using Unraid and not having an issue. As a test, can you give the OPNsense VM 2 CPUs rather than one to check if it's a CPU bottleneck? Otherwise it might not be liking the NIC drivers. Some people have tried e1000 to fix similar issues on pfSense, so not sure if it's a similar thing here.

It surely looks like the tests have the firewall as an endpoint, which is rather irrelevant.

Did you try iperf between two endpoints, one on each side of the firewalls?


Other than that, I'm also considering a config/driver issue that's not up to par on OPNsense yet, but there's not much info to troubleshoot with.
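To illustrate the two-endpoint suggestion, a through-the-firewall test would look roughly like this (the address below is a documentation placeholder, not from this setup):

```shell
# On a host sitting on the WAN side of the firewall:
iperf3 -s

# On a host on the LAN side, pointing at the WAN-side host so that the
# traffic is actually routed and filtered by the firewall under test:
iperf3 -c 192.0.2.10 -t 15
```

That way iperf3 measures forwarding performance of the firewall rather than its own TCP stack as an endpoint.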

Try installing speedtest-cli, as iperf run locally on the firewall is painfully slow.

Here are my numbers. Both of these are fresh out of the box installs, OPNsense 19.7.9 and pfSense 2.4.4p3, both are X86_64.

Hypervisor Specs:
VMware ESXi 6.7u3
2x Intel Xeon E5620
All VMs are running open-vm-tools, including the firewalls

Specs on both firewall VMs are as follows:
2x CPU
4GB RAM
2x VMXnet3 NICs (one WAN, one LAN)

I have two other VMs running as the iperf3 server and client. The "server" VM is on the WAN side of these firewalls; the client VM is on the "LAN" side. This tests the traffic throughput of the router itself. Never run these tests with the router/firewall acting as the client or server - you will not get accurate results.

pfSense 2.4.4p3:
Accepted connection from 192.168.1.230, port 56492
[  5] local 192.168.1.231 port 5201 connected to 192.168.1.230 port 45828
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   314 MBytes  2.64 Gbits/sec
[  5]   1.00-2.00   sec   459 MBytes  3.85 Gbits/sec
[  5]   2.00-3.00   sec   407 MBytes  3.41 Gbits/sec
[  5]   3.00-4.00   sec   393 MBytes  3.30 Gbits/sec
[  5]   4.00-5.00   sec   351 MBytes  2.94 Gbits/sec
[  5]   5.00-6.00   sec   372 MBytes  3.12 Gbits/sec
[  5]   6.00-7.00   sec   424 MBytes  3.55 Gbits/sec
[  5]   7.00-8.00   sec   410 MBytes  3.44 Gbits/sec
[  5]   8.00-9.00   sec   443 MBytes  3.71 Gbits/sec
[  5]   9.00-10.00  sec   393 MBytes  3.30 Gbits/sec
[  5]  10.00-11.00  sec   448 MBytes  3.76 Gbits/sec
[  5]  11.00-12.00  sec   428 MBytes  3.59 Gbits/sec
[  5]  12.00-13.00  sec   404 MBytes  3.39 Gbits/sec
[  5]  13.00-14.00  sec   419 MBytes  3.51 Gbits/sec
[  5]  14.00-15.00  sec   445 MBytes  3.73 Gbits/sec
[  5]  15.00-15.04  sec  16.1 MBytes  3.26 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-15.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-15.04  sec  5.98 GBytes  3.42 Gbits/sec                  receiver


OPNsense 19.7.9 (no tuning, Unbound using lots of CPU)
Accepted connection from 192.168.1.232, port 15150
[  5] local 192.168.1.231 port 5201 connected to 192.168.1.232 port 46858
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   304 MBytes  2.55 Gbits/sec
[  5]   1.00-2.00   sec  88.9 MBytes   746 Mbits/sec
[  5]   2.00-3.00   sec   371 MBytes  3.11 Gbits/sec
[  5]   3.00-4.00   sec   164 MBytes  1.38 Gbits/sec
[  5]   4.00-5.00   sec   420 MBytes  3.52 Gbits/sec
[  5]   5.00-6.00   sec  79.4 MBytes   666 Mbits/sec
[  5]   6.00-7.00   sec   400 MBytes  3.36 Gbits/sec
[  5]   7.00-8.00   sec  97.7 MBytes   820 Mbits/sec
[  5]   8.00-9.00   sec   403 MBytes  3.38 Gbits/sec
[  5]   9.00-10.00  sec   399 MBytes  3.35 Gbits/sec
[  5]  10.00-11.00  sec   104 MBytes   872 Mbits/sec
[  5]  11.00-12.00  sec   374 MBytes  3.14 Gbits/sec
[  5]  12.00-13.00  sec  74.0 MBytes   621 Mbits/sec
[  5]  13.00-14.00  sec   289 MBytes  2.42 Gbits/sec
[  5]  14.00-15.00  sec   135 MBytes  1.13 Gbits/sec
[  5]  15.00-15.04  sec  3.24 MBytes   675 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-15.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-15.04  sec  3.62 GBytes  2.07 Gbits/sec                  receiver


OPNsense 19.7.9 (set unbound to use Quad9 DoT using forwarding mode)
Accepted connection from 192.168.1.232, port 58840
[  5] local 192.168.1.231 port 5201 connected to 192.168.1.232 port 16760
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   214 MBytes  1.80 Gbits/sec
[  5]   1.00-2.00   sec   268 MBytes  2.25 Gbits/sec
[  5]   2.00-3.00   sec   312 MBytes  2.61 Gbits/sec
[  5]   3.00-4.00   sec   315 MBytes  2.64 Gbits/sec
[  5]   4.00-5.00   sec   273 MBytes  2.29 Gbits/sec
[  5]   5.00-6.00   sec   259 MBytes  2.17 Gbits/sec
[  5]   6.00-7.00   sec   201 MBytes  1.69 Gbits/sec
[  5]   7.00-8.00   sec   279 MBytes  2.34 Gbits/sec
[  5]   8.00-9.00   sec   311 MBytes  2.61 Gbits/sec
[  5]   9.00-10.00  sec   120 MBytes  1.01 Gbits/sec
[  5]  10.00-11.00  sec   237 MBytes  1.99 Gbits/sec
[  5]  11.00-12.00  sec   298 MBytes  2.50 Gbits/sec
[  5]  12.00-13.00  sec   322 MBytes  2.70 Gbits/sec
[  5]  13.00-14.00  sec   291 MBytes  2.44 Gbits/sec
[  5]  14.00-15.00  sec   303 MBytes  2.54 Gbits/sec
[  5]  15.00-15.03  sec  8.95 MBytes  2.26 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-15.03  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-15.03  sec  3.92 GBytes  2.24 Gbits/sec                  receiver


As we can see, OPNsense does seem to have some throughput limits out of the box. Still, I am seeing much higher throughput values than you are, so it's important to make sure your tests are using servers/clients on the WAN and LAN sides of the firewall.

Finally, here's a screenshot of what 'top -aSCHIP' looks like on the OPNsense 19.7.9 VM; you can see the high CPU usage from unbound for some reason. You may want to check whether your OPNsense VM exhibits the same high-CPU behavior, as that can also take away from the overall throughput.


A quick reply regarding the OPNsense CPU utilization. In my case this seemed to be related to DHCPv6 being enabled out of the box. I'm not sure if OPNsense was trying to delegate a prefix to the LAN side over and over, causing high CPU usage in unbound? My logs are filled with this:

kernel: pflog0: promiscuous mode disabled
kernel: pflog0: promiscuous mode enabled


I was seeing these events spamming the logs constantly, every second. As soon as I disabled DHCPv6 on WAN, these messages went away and idle CPU usage on OPNsense returned to normal.
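If you want to check whether your box shows the same pflog0 flapping, something like this should work (log path as on stock FreeBSD; your OPNsense log location may differ):

```shell
# Kernel messages, including any pflog0 promiscuous-mode enable/disable spam:
dmesg | grep pflog0

# Or watch the system log live while toggling DHCPv6 on the WAN interface:
tail -f /var/log/messages | grep pflog0
```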

Here are the results of a current iperf3 test, using the same VMs described in my post above. These throughput numbers are much more consistent now that OPNsense has normal CPU usage.
Accepted connection from 192.168.1.232, port 4084
[  5] local 192.168.1.231 port 5201 connected to 192.168.1.232 port 24664
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   185 MBytes  1.55 Gbits/sec
[  5]   1.00-2.00   sec   323 MBytes  2.71 Gbits/sec
[  5]   2.00-3.00   sec   315 MBytes  2.64 Gbits/sec
[  5]   3.00-4.00   sec   344 MBytes  2.88 Gbits/sec
[  5]   4.00-5.00   sec   316 MBytes  2.65 Gbits/sec
[  5]   5.00-6.00   sec   357 MBytes  2.99 Gbits/sec
[  5]   6.00-7.00   sec   353 MBytes  2.96 Gbits/sec
[  5]   7.00-8.00   sec   349 MBytes  2.93 Gbits/sec
[  5]   8.00-9.00   sec   356 MBytes  2.98 Gbits/sec
[  5]   9.00-10.00  sec   345 MBytes  2.89 Gbits/sec
[  5]  10.00-11.00  sec   305 MBytes  2.56 Gbits/sec
[  5]  11.00-12.00  sec   348 MBytes  2.92 Gbits/sec
[  5]  12.00-13.00  sec   341 MBytes  2.86 Gbits/sec
[  5]  13.00-14.00  sec   343 MBytes  2.87 Gbits/sec
[  5]  14.00-15.00  sec   331 MBytes  2.77 Gbits/sec
[  5]  15.00-15.04  sec  14.8 MBytes  3.04 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-15.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-15.04  sec  4.81 GBytes  2.75 Gbits/sec                  receiver


Hey folks, thanks for all the replies and apologies for the slow response - I did not know that I had to ask to be notified as the creator of a thread. Going to try to answer all the questions in the thread:

1. unbound doesn't seem to be the issue; see the attached screenshot for a run. As a matter of fact, the box seems to be snoozing, with reasonably high idle time.

2. Ran with 2 CPUs (1 socket, 2 cores). No marked difference.

% iperf3 -c 172.16.160.204
Connecting to host 172.16.160.204, port 5201
[  5] local 172.16.160.144 port 53463 connected to 172.16.160.204 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  31.0 MBytes   260 Mbits/sec                 
[  5]   1.00-2.00   sec  29.7 MBytes   249 Mbits/sec                 
[  5]   2.00-3.00   sec  27.6 MBytes   231 Mbits/sec                 
[  5]   3.00-4.00   sec  26.1 MBytes   219 Mbits/sec                 
[  5]   4.00-5.00   sec  25.9 MBytes   217 Mbits/sec                 
[  5]   5.00-6.00   sec  25.8 MBytes   216 Mbits/sec                 
[  5]   6.00-7.00   sec  24.8 MBytes   208 Mbits/sec                 
[  5]   7.00-8.00   sec  24.6 MBytes   206 Mbits/sec                 
[  5]   8.00-9.00   sec  25.9 MBytes   218 Mbits/sec                 
[  5]   9.00-10.00  sec  25.2 MBytes   211 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec   267 MBytes   224 Mbits/sec                  sender
[  5]   0.00-10.00  sec   266 MBytes   223 Mbits/sec                  receiver

iperf Done.


3. Ran with 1 CPU core and the e1000 NIC
% iperf3 -c 172.16.160.204
Connecting to host 172.16.160.204, port 5201
[  5] local 172.16.160.144 port 53902 connected to 172.16.160.204 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  5.47 MBytes  45.9 Mbits/sec                 
[  5]   1.00-2.00   sec  27.0 MBytes   227 Mbits/sec                 
[  5]   2.00-3.00   sec  22.5 MBytes   189 Mbits/sec                 
[  5]   3.00-4.00   sec  28.3 MBytes   237 Mbits/sec                 
[  5]   4.00-5.00   sec  28.0 MBytes   235 Mbits/sec                 
[  5]   5.00-6.00   sec  28.2 MBytes   236 Mbits/sec                 
[  5]   6.00-7.00   sec  27.9 MBytes   234 Mbits/sec                 
[  5]   7.00-8.00   sec  27.9 MBytes   234 Mbits/sec                 
[  5]   8.00-9.00   sec  28.5 MBytes   239 Mbits/sec                 
[  5]   9.00-10.00  sec  28.1 MBytes   236 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec   252 MBytes   211 Mbits/sec                  sender
[  5]   0.00-10.00  sec   251 MBytes   210 Mbits/sec                  receiver

iperf Done.


4. I didn't install speedtest-cli since I'm running all the tests locally (on the same switch), not really trying to test the ISP circuit.

5. I disabled IPv6 on the WAN (I saw some of the same "promiscuous" log entries), and here are the results:

% iperf3 -c 172.16.160.204
Connecting to host 172.16.160.204, port 5201
[  5] local 172.16.160.144 port 53960 connected to 172.16.160.204 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  18.6 MBytes   156 Mbits/sec                 
[  5]   1.00-2.00   sec  28.6 MBytes   240 Mbits/sec                 
[  5]   2.00-3.00   sec  28.7 MBytes   240 Mbits/sec                 
[  5]   3.00-4.00   sec  28.7 MBytes   241 Mbits/sec                 
[  5]   4.00-5.00   sec  29.0 MBytes   243 Mbits/sec                 
[  5]   5.00-6.00   sec  28.8 MBytes   242 Mbits/sec                 
[  5]   6.00-7.00   sec  28.2 MBytes   237 Mbits/sec                 
[  5]   7.00-8.00   sec  29.1 MBytes   244 Mbits/sec                 
[  5]   8.00-9.00   sec  28.5 MBytes   239 Mbits/sec                 
[  5]   9.00-10.00  sec  27.7 MBytes   232 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec   276 MBytes   231 Mbits/sec                  sender
[  5]   0.00-10.00  sec   275 MBytes   231 Mbits/sec                  receiver

iperf Done.


TL;DR: things are still strangely slow. I really think this is a kernel or tuning issue, but I'm unsure where to even begin looking.
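One place to begin, as an assumption rather than a confirmed fix: a commonly reported cause of poor VirtIO throughput on FreeBSD-based firewalls is hardware offloading on the vtnet driver. As a temporary, non-persistent experiment (assuming the LAN NIC shows up as vtnet0 on your VM):

```shell
# Disable checksum, TSO and LRO offloads on the VirtIO NIC.
# This does not survive a reboot; re-run the iperf3 test afterwards
# to see whether throughput changes.
ifconfig vtnet0 -rxcsum -txcsum -tso -lro
```

If this helps, the equivalent settings can be made persistent through the OPNsense interface settings rather than from the shell.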

Use "VirtIO (paravirtualized)" network in Proxmox.
And use "host" as the CPU type.

There is one thing that is broken with "VirtIO (paravirtualized)": IPS/Suricata.
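For reference, both settings can be checked and changed from the Proxmox host with the stock qm tool (100 is an example VM ID, and net0/vmbr0 are example interface names; adjust for your setup):

```shell
# Show the current NIC model and CPU type configured for the VM:
qm config 100 | grep -E '^(net|cpu)'

# Switch the first NIC to VirtIO on bridge vmbr0, and the CPU type to host:
qm set 100 --net0 virtio,bridge=vmbr0
qm set 100 --cpu host
```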


Thanks but the original numbers (at the very top of the thread) were using VirtIO without IPS enabled.

Just to confirm, does your setup look like the below diagram? OPNsense is not hosting the client or server portion of iperf, correct?

Nope, it's not. I'm not using the WAN interface at all; I'm just testing the speed of the LAN port hard-wired to a gigabit switch, like this:

[ OPNsense running iperf3 in server mode ] LAN <=== [ Gigabit switch ] <=== [ MacBook laptop with a 1Gb NIC ]

I don't doubt that I'll be able to get more with traffic flowing through both interfaces (WAN/LAN), but it still won't equate to a full 1Gbps, as a single interface is currently only able to handle about 1/4 of that. Most importantly, the setup is the same for pfSense, but its performance is much higher.