I originally posted on Reddit but figured I might get more traction here with this.
I have an OPNsense 20.7.1 server running on a Dell R430 with 16 GB DDR4 RAM, an Intel Xeon E5-2620 v3 (6 cores/12 threads @ 2.40GHz) CPU and an Intel X520-SR2 10GbE NIC.
My network has several VLANs and subnets, with my OPNsense box functioning as a router on a stick that handles all firewalling and routing between segments.
I recently upgraded my OPNsense to 20.7.1 and on a whim decided to run an iperf3 test between two VMs on different network segments to see what kind of throughput I was getting. I am certain, at least at some point, this very same hardware pushed over 6 Gbps on the same iperf3 test. Today it was getting around 850 Mbps every single time.
I started iperf3 as a server on my QNAP NAS device which is also attached to the same 10 Gbps switch and ran iperf3 as a client from OPNsense on the same network segment and got the same 850 Mbps throughput.
To make sure I wasn't limited by the QNAP NAS device, I ran the same iperf3 test with my other QNAP NAS device as a client to the first QNAP NAS device and it pushed 8.6 Gbps across the same network segment (no OPNsense involved) so both the QNAP and the switch can push it.
My question is: what do I have going wrong here? Even on the same network segment, OPNsense can't do more than 850 Mbps of throughput. I have no idea if this was happening before the upgrade to 20.7.1, but I know for sure it is happening now. I would assume an iperf3 test from the OPNsense server on the same network segment would remove any doubt that firewalling, etc. was the cause.
The interface shows 10 Gbps link speed, too, both from ifconfig and the switch itself.
My current MBUF Usage is 1 % (17726/1010734).
IDS/IPS package is installed but disabled.
I had "Hardware CRC" and "Hardware TSO" and "Hardware LRO" and "VLAN Hardware Filtering" all enabled. I have since set those all to disabled and rebooted. I can confirm that it disabled by looking at the interface flags in ifconfig:
Pre-reboot:
options=e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
Post-reboot:
options=803828<VLAN_MTU,JUMBO_MTU,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
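For anyone who wants to flip these from the console instead of the GUI, here is a rough sketch of the equivalent (non-persistent) ifconfig toggles, with ix0 standing in for whatever name the NIC shows up as; OPNsense will re-apply its configured settings on the next reconfigure or reboot:
ifconfig ix0 -rxcsum -txcsum -rxcsum6 -txcsum6   # drop IPv4/IPv6 hardware checksum offload
ifconfig ix0 -tso -lro                           # drop TCP segmentation and large receive offload
ifconfig ix0 -vlanhwfilter                       # drop VLAN hardware filtering
ifconfig ix0                                     # the options=... line should shrink accordingly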
I ran top and was able to see a kernel thread (kernel{if_io_tqg_2}) using nearly 100% of a CPU core during this iperf3 test:
# top -aSH
last pid: 22772; load averages: 1.23, 0.94, 0.79 up 5+23:48:52 14:24:22
233 threads: 15 running, 193 sleeping, 25 waiting
CPU: 1.0% user, 0.0% nice, 16.1% system, 0.5% interrupt, 82.4% idle
Mem: 1485M Active, 297M Inact, 1657M Wired, 935M Buf, 12G Free
Swap: 8192M Total, 8192M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
0 root -76 - 0 848K CPU2 2 279:51 99.77% [kernel{if_io_tqg_2}]
11 root 155 ki31 0 192K CPU3 3 130.8H 98.78% [idle{idle: cpu3}]
11 root 155 ki31 0 192K CPU9 9 131.3H 98.75% [idle{idle: cpu9}]
11 root 155 ki31 0 192K CPU1 1 129.7H 98.68% [idle{idle: cpu1}]
11 root 155 ki31 0 192K CPU10 10 138.1H 98.33% [idle{idle: cpu10}]
11 root 155 ki31 0 192K CPU5 5 130.5H 97.51% [idle{idle: cpu5}]
11 root 155 ki31 0 192K CPU0 0 138.3H 95.78% [idle{idle: cpu0}]
11 root 155 ki31 0 192K CPU8 8 137.7H 95.25% [idle{idle: cpu8}]
11 root 155 ki31 0 192K CPU6 6 138.7H 95.20% [idle{idle: cpu6}]
11 root 155 ki31 0 192K CPU4 4 138.4H 94.26% [idle{idle: cpu4}]
22772 root 82 0 15M 6772K CPU7 7 0:04 93.83% iperf3 -c 192.168.1.31
11 root 155 ki31 0 192K RUN 7 129.4H 68.75% [idle{idle: cpu7}]
11 root 155 ki31 0 192K RUN 11 126.8H 46.12% [idle{idle: cpu11}]
0 root -76 - 0 848K - 4 277:00 5.12% [kernel{if_io_tqg_4}]
12 root -60 - 0 400K WAIT 11 449:21 5.02% [intr{swi4: clock (0)}]
0 root -76 - 0 848K - 8 317:40 3.81% [kernel{if_io_tqg_8}]
0 root -76 - 0 848K - 0 272:13 2.71% [kernel{if_io_tqg_0}]
I occasionally see flowd_aggregate.py pop up to 100% but it doesn't seem consistent or relevant to when iperf3 is running:
# top -aSH
last pid: 99781; load averages: 1.15, 0.90, 0.77 up 5+23:47:27 14:22:57
232 threads: 14 running, 193 sleeping, 25 waiting
CPU: 8.5% user, 0.0% nice, 1.6% system, 0.4% interrupt, 89.5% idle
Mem: 1481M Active, 299M Inact, 1656M Wired, 935M Buf, 12G Free
Swap: 8192M Total, 8192M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
43465 root 90 0 33M 25M CPU7 7 7:11 99.82% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.7)
11 root 155 ki31 0 192K CPU9 9 131.3H 99.80% [idle{idle: cpu9}]
11 root 155 ki31 0 192K CPU3 3 130.8H 99.68% [idle{idle: cpu3}]
11 root 155 ki31 0 192K CPU10 10 138.1H 99.50% [idle{idle: cpu10}]
11 root 155 ki31 0 192K CPU6 6 138.7H 98.53% [idle{idle: cpu6}]
11 root 155 ki31 0 192K RUN 5 130.5H 98.20% [idle{idle: cpu5}]
11 root 155 ki31 0 192K CPU1 1 129.7H 97.97% [idle{idle: cpu1}]
11 root 155 ki31 0 192K CPU11 11 126.8H 96.52% [idle{idle: cpu11}]
11 root 155 ki31 0 192K CPU0 0 138.3H 96.43% [idle{idle: cpu0}]
11 root 155 ki31 0 192K CPU8 8 137.7H 95.95% [idle{idle: cpu8}]
11 root 155 ki31 0 192K CPU2 2 138.3H 95.81% [idle{idle: cpu2}]
11 root 155 ki31 0 192K CPU4 4 138.4H 93.94% [idle{idle: cpu4}]
12 root -60 - 0 400K WAIT 4 449:17 5.10% [intr{swi4: clock (0)}]
0 root -76 - 0 848K - 4 276:55 4.95% [kernel{if_io_tqg_4}]
What is going on here?
To add to this, I re-configured all my VLANs on bge0 (onboard NIC) and moved all my interfaces over to each respective bge0_vlanX interface and re-ran my iperf3 tests.
On my first test, I got the same throughput as with my Intel X520-SR2 NIC:
# iperf3 -c 192.168.1.31
Connecting to host 192.168.1.31, port 5201
[ 5] local 192.168.1.1 port 42455 connected to 192.168.1.31 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 92.0 MBytes 772 Mbits/sec 91 5.70 KBytes
[ 5] 1.00-2.00 sec 91.1 MBytes 764 Mbits/sec 88 145 KBytes
[ 5] 2.00-3.00 sec 86.1 MBytes 722 Mbits/sec 86 836 KBytes
[ 5] 3.00-4.00 sec 92.5 MBytes 776 Mbits/sec 76 589 KBytes
[ 5] 4.00-5.00 sec 107 MBytes 894 Mbits/sec 0 803 KBytes
[ 5] 5.00-6.00 sec 107 MBytes 898 Mbits/sec 2 731 KBytes
[ 5] 6.00-7.00 sec 109 MBytes 914 Mbits/sec 1 658 KBytes
[ 5] 7.00-8.00 sec 110 MBytes 926 Mbits/sec 0 863 KBytes
[ 5] 8.00-9.00 sec 107 MBytes 898 Mbits/sec 2 748 KBytes
[ 5] 9.00-10.00 sec 109 MBytes 918 Mbits/sec 1 663 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1011 MBytes 848 Mbits/sec 347 sender
[ 5] 0.00-10.32 sec 1010 MBytes 821 Mbits/sec receiver
For reference, I just tested with my MacBook Pro against the same iperf3 server and was able to push 926 Mbps, and I re-tested the QNAP-to-QNAP transfer, which did 9.39 Gbps, to completely rule out an iperf3 server problem.
For the sake of testing because why not, I re-ran iperf3 from my OPNsense server once more and got near gigabit throughput:
# iperf3 -c 192.168.1.31
Connecting to host 192.168.1.31, port 5201
[ 5] local 192.168.1.1 port 8283 connected to 192.168.1.31 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 108 MBytes 906 Mbits/sec 0 792 KBytes
[ 5] 1.00-2.00 sec 111 MBytes 932 Mbits/sec 2 698 KBytes
[ 5] 2.00-3.00 sec 111 MBytes 930 Mbits/sec 1 638 KBytes
[ 5] 3.00-4.00 sec 108 MBytes 905 Mbits/sec 1 585 KBytes
[ 5] 4.00-5.00 sec 111 MBytes 929 Mbits/sec 0 816 KBytes
[ 5] 5.00-6.00 sec 111 MBytes 929 Mbits/sec 1 776 KBytes
[ 5] 6.00-7.00 sec 111 MBytes 928 Mbits/sec 1 725 KBytes
[ 5] 7.00-8.00 sec 108 MBytes 906 Mbits/sec 2 663 KBytes
[ 5] 8.00-9.00 sec 111 MBytes 928 Mbits/sec 2 616 KBytes
[ 5] 9.00-10.00 sec 111 MBytes 928 Mbits/sec 0 837 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.07 GBytes 922 Mbits/sec 10 sender
[ 5] 0.00-10.32 sec 1.07 GBytes 892 Mbits/sec receiver
One thing I noticed between the first and second iperf3 test was the "Retr" column of 347 vs 10. I researched what that meant for iperf3 and found this: "It's the number of TCP segments retransmitted. This can happen if TCP segments are lost in the network due to congestion or corruption."
I also noticed during my second iperf3 test that there was now a kernel process using 99.81% CPU:
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0 192K CPU3 3 9:02 100.00% [idle{idle: cpu3}]
0 root -92 - 0 848K CPU2 2 0:30 99.81% [kernel{bge0 taskq}]
Additionally, I am not sure "Retr" in itself is a smoking gun, as the QNAP-to-QNAP test that yielded 9.39 Gbps had 2218 retransmits.
The search continues.
I know that the bge driver has problems with OPNsense, but the X520 should deliver fine performance.
I tested these cards with 20.7rc1 and got full wire speed.
I can run these tests again with the latest 20.7.1, but I need to finish some other stuff first.
I know that the Broadcom drivers aren't the best, but I figured it was worth a test. That being said, I just swapped the Intel X520-SR2 for a Chelsio T540-CR, which seems to have excellent FreeBSD support; that family of NICs is frequently recommended.
Here's the results from the Chelsio T540-CR:
# iperf3 -c 192.168.1.31
Connecting to host 192.168.1.31, port 5201
[ 5] local 192.168.1.1 port 19465 connected to 192.168.1.31 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 112 MBytes 943 Mbits/sec 0 8.00 MBytes
[ 5] 1.00-2.00 sec 110 MBytes 924 Mbits/sec 0 8.00 MBytes
[ 5] 2.00-3.00 sec 112 MBytes 939 Mbits/sec 0 8.00 MBytes
[ 5] 3.00-4.00 sec 112 MBytes 941 Mbits/sec 0 8.00 MBytes
[ 5] 4.00-5.00 sec 112 MBytes 941 Mbits/sec 0 8.00 MBytes
[ 5] 5.00-6.00 sec 112 MBytes 939 Mbits/sec 0 8.00 MBytes
[ 5] 6.00-7.00 sec 112 MBytes 940 Mbits/sec 0 8.00 MBytes
[ 5] 7.00-8.00 sec 112 MBytes 938 Mbits/sec 0 8.00 MBytes
[ 5] 8.00-9.00 sec 112 MBytes 940 Mbits/sec 0 8.00 MBytes
[ 5] 9.00-10.00 sec 112 MBytes 940 Mbits/sec 0 8.00 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.09 GBytes 939 Mbits/sec 0 sender
[ 5] 0.00-10.32 sec 1.09 GBytes 909 Mbits/sec receiver
Also thought it was interesting there were zero retransmits on the test.
I swapped out the optic on the NIC when I swapped the NIC itself. I will swap the optic on the switch and maybe try a different switch port and fiber patch cable tomorrow, though, I doubt those are the issue.
Unfortunately, it appears that the issue was not my Intel X520-SR2 NIC as the Chelsio T540-CR exhibits the same behavior.
Just a status update:
Swapped optics on the switch side (both have now been switched) and swapped for a new fiber patch cable. Same results. I also re-enabled "Hardware CRC" and "VLAN Hardware Filtering" but left "Hardware TSO" and "Hardware LRO" disabled as I read most drivers are broken for those functions.
I also added this to /boot/loader.conf.local and rebooted:
hw.cxgbe.toecaps_allowed=0
hw.cxgbe.rdmacaps_allowed=0
hw.cxgbe.iscsicaps_allowed=0
hw.cxgbe.fcoecaps_allowed=0
Absolutely zero impact in performance. Tomorrow I think I'll unbox my other PowerEdge R430 and put the original Intel X520-SR2 NIC in it and see if I can duplicate the problem.
I am at a total loss as to what is going on here.
OK, so at the risk of seeming like I am only talking to myself at this point, I think I found the common factor in the poor performance -- it's OPNsense.
I built a fresh new and updated OPNsense 20.7.1 VM on VMware ESXi 6.7U3, imported my configuration backup from my physical server and re-mapped all the interfaces to the new vmx0_vlanX names and things are working, albeit even slower than the physical hardware:
root@opnsense1:~ # iperf3 -c 192.168.1.31
...
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.01 sec 705 MBytes 591 Mbits/sec 0 sender
[ 5] 0.00-10.41 sec 705 MBytes 568 Mbits/sec receiver
Seems pretty awful. So I decided to create two new OPNsense 20.7.1 VMs and configure one as a VLAN trunk and the other as non-trunk, to test whether the problem lay within the VLAN implementation itself:
OPNsense 20.7.1 (amd64)
VLAN and pf Enabled:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 949 MBytes 796 Mbits/sec 0 sender
[ 5] 0.00-10.40 sec 949 MBytes 766 Mbits/sec receiver
VLAN and pf Disabled (pfctl -d):
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.01 sec 1.22 GBytes 1.05 Gbits/sec 0 sender
[ 5] 0.00-10.41 sec 1.22 GBytes 1.01 Gbits/sec receiver
Non-VLAN and pf Enabled:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 854 MBytes 716 Mbits/sec 0 sender
[ 5] 0.00-10.40 sec 854 MBytes 688 Mbits/sec receiver
Non-VLAN and pf Disabled (pfctl -d):
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 983 MBytes 825 Mbits/sec 0 sender
[ 5] 0.00-10.40 sec 983 MBytes 793 Mbits/sec receiver
As you can see, the VLAN-trunk-configured VM had slightly better performance. Perhaps environmental factors caused the differences, as I would expect them to be nearly the same. Even so, I would consider the differences mostly negligible given the link is 10 gigabit. I also tested without pf to see if the difference in throughput was measurable. Both tests show that it is in fact better without pf, though it's kinda pointless to have a network perimeter firewall without it running...
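For anyone repeating the pf on/off comparison, this is roughly how the state can be flipped and verified from the shell between runs:
pfctl -d        # disable pf before a "pf Disabled" run
pfctl -e        # re-enable pf before a "pf Enabled" run
pfctl -s info   # "Status: Enabled/Disabled" confirms which mode you are actually testing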
Next I thought maybe this is just a fluke and all three OPNsense servers just suck on VMware ESXi and dislike the hardware or configuration or maybe my ESX host just can't push traffic. I had a CentOS 8.2.2004 VM already deployed and configured on the same network segment I had been testing on so I loaded up iperf3 on it to see if it was an ESX host/network problem.
CentOS 8.2.2004 (x86_64)
Non-VLAN and firewalld Enabled:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.7 GBytes 9.17 Gbits/sec 11 sender
[ 5] 0.00-10.04 sec 10.7 GBytes 9.14 Gbits/sec receiver
Non-VLAN and firewalld Disabled:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.8 GBytes 9.32 Gbits/sec 1 sender
[ 5] 0.00-10.04 sec 10.8 GBytes 9.28 Gbits/sec receiver
Tested with firewall on and off just for fun to see how much iptables slowed the Linux test down. As you can see, 9.14 Gbps to 9.32 Gbps on this test. The problem isn't my ESX host or my network.
I then thought it might be a BSD problem. Perhaps something with running inside VMware or the vmxnet3 driver is problematic. I tried to figure out how to install HardenedBSD, but it seemed too difficult as my quick search for an ISO didn't yield much. As such, I used FreeBSD. Hopefully it's close enough!
FreeBSD 12.1 (amd64)
VLAN and pf Disabled (not configured):
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.9 GBytes 9.35 Gbits/sec 0 sender
[ 5] 0.00-10.42 sec 10.9 GBytes 8.97 Gbits/sec receiver
Non-VLAN and pf Disabled (not configured):
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.9 GBytes 9.36 Gbits/sec 13 sender
[ 5] 0.00-10.21 sec 10.9 GBytes 9.17 Gbits/sec receiver
I figured I hadn't spent enough time already dorking around with this, so why not configure one test VM with VLAN trunking and the other without to see if there are any differences. As you can see, FreeBSD 12.1 pushed the packets fast, VLAN or otherwise. The problem doesn't seem to be vmxnet3/ESXi or FreeBSD related.
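For anyone who wants to reproduce the FreeBSD router VM, the VLAN-trunk variant needs little more than VLAN child interfaces plus IP forwarding; a rough sketch (interface name, tags and addresses are placeholders, not my exact config):
ifconfig vlan2 create vlan 2 vlandev vmx0      # VLAN 2 child interface on the trunk port
ifconfig vlan2 inet 192.168.2.1/24 up
ifconfig vlan24 create vlan 24 vlandev vmx0    # VLAN 24 child interface on the trunk port
ifconfig vlan24 inet 192.168.24.1/24 up
sysctl net.inet.ip.forwarding=1                # turn the VM into an IPv4 router between the segments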
Finally, I came to the conclusion that maybe OPNsense 20.7 is just broken. As such, I loaded up an OPNsense 19.7 test VM and gave it a go.
OPNsense 19.7.10_1 (amd64)
Non-VLAN and pf Enabled:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.75 GBytes 1.50 Gbits/sec 0 sender
[ 5] 0.00-10.44 sec 1.75 GBytes 1.44 Gbits/sec receiver
Non-VLAN and pf Disabled (pfctl -d):
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.57 GBytes 2.21 Gbits/sec 0 sender
[ 5] 0.00-10.48 sec 2.57 GBytes 2.11 Gbits/sec receiver
Not good. You can see the results of 1.50 Gbps to 2.21 Gbps are measurably better than my test results with OPNsense 20.7, but nowhere near stellar. I was very much over testing at this point, so I opted not to do a VLAN versus non-VLAN comparison. That being said, based on the earlier results, I am sure the difference would have been negligible.
To add to this, as a general observation, whenever the iperf3 test is running on OPNsense, a constant ping of the firewall starts to drop packets like it is choked out and cannot keep up. I did not experience this at all on CentOS or FreeBSD when testing.
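That observation is easy to reproduce: run a constant ping of the firewall address from a LAN host in one window while the iperf3 test runs in another, roughly like this (same addresses as in the tests above):
ping 192.168.1.1                 # window 1: constant ping of the firewall, watch for drops/latency spikes (Ctrl-C to stop)
iperf3 -c 192.168.1.31 -t 30     # window 2: the throughput test running at the same time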
Why is OPNsense so bad at throughput in my tests? If it's not, what am I doing wrong? The commonality amongst these tests seems to be OPNsense, regardless of whether it's 19.7 or 20.7, though the former is better than the latter.
Edit: Because why not at this point. Let's test pfSense!
pfSense 2.4.5 (amd64)
Non-VLAN and pf Enabled:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 3.80 GBytes 3.26 Gbits/sec 67 sender
[ 5] 0.00-10.26 sec 3.80 GBytes 3.18 Gbits/sec receiver
Non-VLAN and pf Disabled (pfctl -d):
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 5.66 GBytes 4.86 Gbits/sec 109 sender
[ 5] 0.00-10.22 sec 5.66 GBytes 4.76 Gbits/sec receiver
pfSense is not stellar, especially considering it is based on FreeBSD 12.1 and I tested FreeBSD 12.1 and got very different (better) results. That being said, both results are much, much faster than any OPNsense test I could push regardless if physical or virtual.
Edit 2: Fixed a typo in my comments where I erroneously used 20.1 instead of 20.7 when referring to editions of OPNsense.
TL;DR: OPNsense seems to be dog slow compared to FreeBSD 12.1 and CentOS 8.2 at raw network throughput. What gives? What am I doing wrong that it can be this huge of a performance gap?
Your testing is amazing -
I have nothing to add (there are actually 2 other threads with this same subject matter - various reasons, but we're slow).
I am posting to let you know there are others and you aren't just talking to yourself.
Quote from: hax0rwax0r on August 28, 2020, 04:08:35 AM
What am I doing wrong that it can be this huge of a performance gap?
The problem is, you are not testing traffic *through* the firewall, you are measuring *against* the firewall.
iperf3 on OPNsense itself performs really badly. Can you test with sender and receiver on different interfaces?
Again, I'm doing regular performance tests with hardware details and I'm always near wirespeed:
https://www.routerperformance.net/opnsense/opnsense-performance-20-1-8/
https://www.routerperformance.net/routers/nexcom-nsa/fujitsu-rx1330/
https://www.routerperformance.net/routers/nexcom-nsa/thomas-krenn-ri1102d/
OK, I upgraded my lab now:
Client1: Ubuntu
FW1: 20.7.1 (Intel(R) Xeon(R) CPU E3-1240 v6 @ 3.70GHz (8 cores))
FW2: 20.7
Client2: Ubuntu
They are directly attached via TwinAx cables and a mix of Intel X520 and Mellanox ConnectX-3.
Client1 is iperf client, Client2 is iperf server:
With IPS enabled, 1 stream:
root@px3:~# iperf3 -p 5000 -f m -V -c 10.2.0.10 -P 1 -t 10 -R
iperf 3.1.3
Linux px3 4.15.18-12-pve #1 SMP PVE 4.15.18-35 (Wed, 13 Mar 2019 08:24:42 +0100) x86_64
Time: Fri, 28 Aug 2020 05:17:13 GMT
Connecting to host 10.2.0.10, port 5000
Reverse mode, remote host 10.2.0.10 is sending
Cookie: px3.1598591833.837625.6814fda03553a5
TCP MSS: 1448 (default)
[ 4] local 10.1.0.10 port 58842 connected to 10.2.0.10 port 5000
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 159 MBytes 1335 Mbits/sec
[ 4] 1.00-2.00 sec 159 MBytes 1335 Mbits/sec
[ 4] 2.00-3.00 sec 156 MBytes 1308 Mbits/sec
[ 4] 3.00-4.00 sec 156 MBytes 1305 Mbits/sec
[ 4] 4.00-5.00 sec 157 MBytes 1313 Mbits/sec
[ 4] 5.00-6.00 sec 157 MBytes 1315 Mbits/sec
[ 4] 6.00-7.00 sec 156 MBytes 1309 Mbits/sec
[ 4] 7.00-8.00 sec 157 MBytes 1319 Mbits/sec
[ 4] 8.00-9.00 sec 155 MBytes 1298 Mbits/sec
[ 4] 9.00-10.00 sec 155 MBytes 1301 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.53 GBytes 1316 Mbits/sec 39 sender
[ 4] 0.00-10.00 sec 1.53 GBytes 1315 Mbits/sec receiver
CPU Utilization: local/receiver 63.0% (8.2%u/54.8%s), remote/sender 0.2% (0.0%u/0.2%s)
iperf Done.
Without IPS, 1 stream:
root@px3:~# iperf3 -p 5000 -f m -V -c 10.2.0.10 -P 1 -t 10 -R
iperf 3.1.3
Linux px3 4.15.18-12-pve #1 SMP PVE 4.15.18-35 (Wed, 13 Mar 2019 08:24:42 +0100) x86_64
Time: Fri, 28 Aug 2020 05:18:46 GMT
Connecting to host 10.2.0.10, port 5000
Reverse mode, remote host 10.2.0.10 is sending
Cookie: px3.1598591926.454562.6f7931ec23f094
TCP MSS: 1448 (default)
[ 4] local 10.1.0.10 port 58846 connected to 10.2.0.10 port 5000
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 800 MBytes 6708 Mbits/sec
[ 4] 1.00-2.00 sec 816 MBytes 6844 Mbits/sec
[ 4] 2.00-3.00 sec 814 MBytes 6830 Mbits/sec
[ 4] 3.00-4.00 sec 814 MBytes 6829 Mbits/sec
[ 4] 4.00-5.00 sec 816 MBytes 6844 Mbits/sec
[ 4] 5.00-6.00 sec 816 MBytes 6844 Mbits/sec
[ 4] 6.00-7.00 sec 815 MBytes 6840 Mbits/sec
[ 4] 7.00-8.00 sec 816 MBytes 6840 Mbits/sec
[ 4] 8.00-9.00 sec 815 MBytes 6841 Mbits/sec
[ 4] 9.00-10.00 sec 816 MBytes 6841 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 7.95 GBytes 6829 Mbits/sec 36 sender
[ 4] 0.00-10.00 sec 7.95 GBytes 6826 Mbits/sec receiver
CPU Utilization: local/receiver 28.7% (1.2%u/27.5%s), remote/sender 1.2% (0.0%u/1.2%s)
iperf Done.
Without IPS, 10 parallel streams:
[ 4] 3.00-3.90 sec 106 MBytes 992 Mbits/sec
[ 6] 3.00-3.90 sec 105 MBytes 981 Mbits/sec
[ 8] 3.00-3.90 sec 71.7 MBytes 669 Mbits/sec
[ 10] 3.00-3.90 sec 69.8 MBytes 651 Mbits/sec
[ 12] 3.00-3.90 sec 73.6 MBytes 686 Mbits/sec
[ 14] 3.00-3.90 sec 97.8 MBytes 912 Mbits/sec
[ 16] 3.00-3.90 sec 101 MBytes 941 Mbits/sec
[ 18] 3.00-3.90 sec 80.4 MBytes 750 Mbits/sec
[ 20] 3.00-3.90 sec 137 MBytes 1279 Mbits/sec
[ 22] 3.00-3.90 sec 163 MBytes 1523 Mbits/sec
[SUM] 3.00-3.90 sec 1006 MBytes 9383 Mbits/sec
I mean, of course running a parallel test is going to yield better results if the firewall has a multi-core CPU and a single stream is maxing out one core.
The issue I have is that single-threaded throughput is only about 850 Mbps on my non-virtualized hardware. That doesn't seem right to me, but I only know my environment, so I might just be wrong.
And yes, I did test through the firewall before I started doing tests from the firewall. Through the firewall nets me similar single-threaded performance:
[root@client1 ~]# iperf3 -f m -c 192.168.1.31
...
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 973 MBytes 816 Mbits/sec 22 sender
[ 4] 0.00-10.00 sec 970 MBytes 814 Mbits/sec receiver
And, as expected, increased throughput when running in parallel:
[root@client1 ~]# iperf3 -f m -c 192.168.1.31 -P 10
...
[ ID] Interval Transfer Bandwidth Retr
...
[SUM] 0.00-10.00 sec 3.26 GBytes 2798 Mbits/sec 4464 sender
[SUM] 0.00-10.00 sec 3.23 GBytes 2776 Mbits/sec receiver
Can you humor me and run a single threaded test through your hardware and show me the output?
If OPNsense is truly not broken in this release then I guess my CPU core speed isn't enough to achieve what I am looking to do and I need to look on eBay for a faster one. That being said, it appears there are several others reporting degraded performance since upgrading so maybe there is something to my claim.
Edit: I see your single-threaded non-IPS throughput is 6826 Mbps. See, even your single-threaded test absolutely crushes mine. I get that your CPU is @ 3.7 GHz and a v6, but really, almost 7 Gbps versus less than 1 Gbps for me. I have a v3 Xeon with a higher clock rate (maybe 3.2 GHz?) that I can try out tomorrow to see what results I get.
Can you also test with pfSense 2.5.0-dev, since that is based on FreeBSD 12? 2.4.5 runs FreeBSD 11.
Fresh install of OPNsense 20.7 on a Dell T20 (Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz (4 cores)):
[root@client1 ~]# iperf3 -c 192.168.1.31
...
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 8.29 GBytes 7.12 Gbits/sec 2 sender
[ 4] 0.00-10.00 sec 8.29 GBytes 7.12 Gbits/sec receiver
[root@client1 ~]# iperf3 -c 192.168.1.31 -P 10
...
[ ID] Interval Transfer Bandwidth Retr
[SUM] 0.00-10.01 sec 8.77 GBytes 7.53 Gbits/sec 139 sender
[SUM] 0.00-10.01 sec 8.77 GBytes 7.53 Gbits/sec receiver
It's just hard to believe that an E3-1225 v3 @ 3.2GHz/3.6GHz versus an E5-2620 v3 @ 2.4GHz/3.2GHz makes that much difference for a single-thread test; however, it's clear, the results don't lie. There's either something wrong with my hardware or my install, or the CPU is just too slow to push single-threaded performance past about 850 Mbps.
And you're right about the pfSense version of FreeBSD. I just double-checked the page (https://docs.netgate.com/pfsense/en/latest/releases/versions-of-pfsense-and-freebsd.html) and, in spite of it being clearly marked 2.5.0 TBD, I didn't even pay attention to the fact that it was not the edition I installed.
FreeBSD is known to be less performant than Linux for a single stream, especially with PPPoE, but your problem is weird. Sadly, I have no other hardware to test.
I built a new OPNsense server on my spare Dell PowerEdge R430 server that has the same CPU in it as my one I am currently using.
I can confirm that the problem appears to be my CPU and/or hardware, since the exact same NIC was moved from the Dell PowerEdge T20 (which previously tested at 7.53 Gbps) to this R430 server, and the test results are much lower:
[root@client1 ~]# iperf3 -c 192.168.1.31
...
[ 4] 0.00-10.00 sec 2.13 GBytes 1.83 Gbits/sec 72 sender
[ 4] 0.00-10.00 sec 2.13 GBytes 1.83 Gbits/sec receiver
[root@client1 ~]# iperf3 -c 192.168.1.31 -P 10
...
[SUM] 0.00-10.00 sec 4.78 GBytes 4.10 Gbits/sec 1143 sender
[SUM] 0.00-10.00 sec 4.75 GBytes 4.08 Gbits/sec receiver
One observation: on like-for-like hardware, this new R430 is pushing more than double the throughput on the single-thread test, and more than a gigabit more on the parallel test, compared to the R430 I have been having problems with. No idea why this is.
I guess I have a decision to make about buying a new CPU or a new server.
OK, back to basics here. I couldn't leave well enough alone and I did more testing tonight because I just couldn't believe that my CPU couldn't even do single threaded gigabit. Here's my test scenario:
Test Scenario 1:
- Physical Linux Server (CentOS 7) on VLAN 2 (iperf3 client)
- Virtual Linux Server (CentOS 7) on VLAN 24 (iperf3 server)
- Dell PowerEdge R430 w/Intel X520-SR2 and HardenedBSD 12-STABLE (BUILD-LATEST 2020-08-31)
Single Threaded:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.00 GBytes 863 Mbits/sec 0 sender
[ 4] 0.00-10.00 sec 1.00 GBytes 860 Mbits/sec receiver
6 Parallel Threads:
[ ID] Interval Transfer Bandwidth Retr
[SUM] 0.00-10.00 sec 2.23 GBytes 1.91 Gbits/sec 938 sender
[SUM] 0.00-10.00 sec 2.22 GBytes 1.90 Gbits/sec receiver
Notice a common theme here with the ~850 Mbps single threaded test. It's pretty close to what I get with OPNsense. Note this is THROUGH the firewall and not from the firewall. Also note my system did have IPv6 addresses from my ISP on each of the interfaces, though, I was only testing IPv4 traffic.
Test Scenario 2:
- Physical Linux Server (CentOS 7) on VLAN 2 (iperf3 client)
- Virtual Linux Server (CentOS 7) on VLAN 24 (iperf3 server)
- Dell PowerEdge R430 w/Intel X520-SR2 and FreeBSD 12.1-RELEASE
Single Threaded:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 9.75 GBytes 8.38 Gbits/sec 573 sender
[ 4] 0.00-10.00 sec 9.75 GBytes 8.38 Gbits/sec receiver
6 Parallel Threads:
[ ID] Interval Transfer Bandwidth Retr
[SUM] 0.00-10.00 sec 10.5 GBytes 9.05 Gbits/sec 3607 sender
[SUM] 0.00-10.00 sec 10.5 GBytes 9.04 Gbits/sec receiver
I couldn't believe my eyes as I had to do a triple check that it was in fact pushing 8.38 Gbps THROUGH the FreeBSD 12.1 server and it wasn't taking some magical alternate path somehow. It was, in fact, going through the FreeBSD router. As you can see, parallel test is about 1 Gbps less than wire speed. Excellent! Also note my system did have IPv6 addresses from my ISP on each of the interfaces, though, I was only testing IPv4 traffic.
I thought I would enable pf on the FreeBSD 12.1 router to see how that affected performance. I am not sure how much adding rules impacts throughput, but I did notice a measurable drop in the single-thread test (6.23 Gbps), while the drop in the parallel-thread test was negligible (8.94 Gbps).
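For anyone repeating that step, enabling pf on a stock FreeBSD 12.1 router takes little more than a minimal pass-all ruleset; a rough sketch (not necessarily the exact ruleset I used):
sysrc pf_enable=YES                        # have pf start at boot
echo 'pass all keep state' > /etc/pf.conf  # deliberately trivial stateful pass-all ruleset
service pf start                           # loads /etc/pf.conf and enables pf
pfctl -s info                              # confirm "Status: Enabled" and watch the state counters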
As of right now, it seems very strange to me that HardenedBSD exhibits the same low single-threaded throughput as OPNsense, and likewise lower parallel-thread throughput, compared to FreeBSD.
I am willing to accept that I am not accounting for something here; however, given near wire-speed throughput on the exact same hardware under FreeBSD versus HardenedBSD, it seems to me something is very different in HardenedBSD.
What are your thoughts?
I am seeing very slow throughput on pfSense as well, using iperf3 online.
Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
16 CPUs: 2 package(s) x 8 core(s)
AES-NI CPU Crypto: Yes (inactive)
Using Suricata and can't get more than 200 Mbps... pretty annoying.
OK, so we have an upstream problem with FreeBSD and some chance of getting it fixed in the next few months.
So the interim options for now are to:
a) go back to 20.1
b) disable netmap (IPS/Sensei)
c) accept the lowered performance
I had a talk with Franco yesterday; there are some promising patches waiting and we definitely need testers, so if you are not going back to 20.1, that would be a fine option.
Wasn't the problem OPN/pfSense rather than FreeBSD? Didn't the 10 Gbit tests show wire speed on a FreeBSD machine using pf?
No, OPNsense 20.7 and pfSense 2.5 use FreeBSD 12.X; 20.1 and pfSense 2.4 use FreeBSD 11.X.
With FreeBSD 12 the interface/networking stack was changed to iflib, which has known problems with netmap; people are already working on it.
@minimugmail
Quote from: hax0rwax0r on September 02, 2020, 07:34:01 AM
...
@hax0rwax0r
Try to repeat the FreeBSD 12.1-RELEASE test with our kernel instead of the stock one. I don't expect any differences.
https://pkg.opnsense.org/FreeBSD:12:amd64/20.7/sets/kernel-20.7.2-amd64.txz
Cheers,
Franco
Details matter (a lot) in these cases; we haven't seen huge differences on our end (apart from netmap issues with certain cards, which we don't ship ourselves). That being said, IPS is a feature that really stresses your hardware; quite a few setups are not able to do more than the 200 Mbps mentioned in this thread.
Please be advised that HardenedBSD 12-STABLE isn't the same as OPNsense 20.7; the differences between OPNsense 20.7 src and FreeBSD are a bit smaller, but if you're convinced your issue lies with HardenedBSD's additions it might be a good starting point (and a plain install has fewer features enabled).
You can always try to install our kernel on the same FreeBSD install which worked without issues (as Franco suggested); it could make reproducing the steps easier.
If you want to compare between HBSD and FBSD anyway, always make sure you're comparing apples with apples: check interface settings, build options and tunables (sysctl -a). Testing between interfaces (not VLANs on the same one) is probably easier, so you know for sure traffic only flows through the physical interface once.
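A quick way to do that tunables comparison is to dump sysctl on both installs and diff the results (file names are arbitrary):
sysctl -a | sort > /tmp/sysctl-fbsd.txt   # on the stock FreeBSD box
sysctl -a | sort > /tmp/sysctl-opn.txt    # on the OPNsense/HBSD box, then copy one file over
diff -u /tmp/sysctl-fbsd.txt /tmp/sysctl-opn.txt | less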
In case someone would like to reproduce your test, make sure to document step by step how one could do that (including network segments used).
Best regards,
Ad
Quote from: Supermule on September 02, 2020, 11:12:45 AM
@minimugmail
Quote from: hax0rwax0r on September 02, 2020, 07:34:01 AM
...
I get the same values as posted before with 20.7 on Supermicro hardware with a Xeon and X520. It's something in your hardware.
I am not super familiar with FreeBSD, so how would I go about swapping your kernel in for the stock FreeBSD 12.1 one I am running? I searched around on Google and found how to build a custom kernel from source, but this txz file you linked appears to be already compiled, so I don't think that's what I want to do.
I also found reference to pkg-static to install locally downloaded packages but wanted to get some initial guidance before totally hosing this up.
This should also be the same kernel that gets installed with the latest 20.7.2.
Oh, I guess I misunderstood franco's instructions. I had thought he was asking me to drop the linked 20.7.2 kernel in place on top of my FreeBSD 12.1 install, which is why I was asking how exactly to do that.
I think with your clarification and re-reading the post, franco was just asking me to try an install of 20.7.2, which happens to be running that kernel, and re-run my tests to see if it improves.
If that's the case, I will try and report back my findings with OPNsense 20.7.2.
No, I did mean FreeBSD 12.1 with our kernel. All the networking is in the kernel, so we will see if this is OPNsense vs. HBSD vs. FBSD or some sort of tweaking effort.
# fetch https://pkg.opnsense.org/FreeBSD:12:amd64/20.7/sets/kernel-20.7.2-amd64.txz
# mv /boot/kernel /boot/kernel.old
# tar -C / -xf kernel-20.7.2-amd64.txz
# kldxref /boot/kernel
It should have a new /boot/kernel now and a reboot should activate it. You can compare build info after the system is back up.
# uname -rv
12.1-RELEASE-p8-HBSD FreeBSD 12.1-RELEASE-p8-HBSD #0 b3665671c4d(stable/20.7)-dirty: Thu Aug 27 05:58:53 CEST 2020 root@sensey64:/usr/obj/usr/src/amd64.amd64/sys/SMP
Cheers,
Franco
OK here are the test results as you requested:
FreeBSD 12.1 (pf enabled):
[root@fbsd1 ~]# uname -rv
12.1-RELEASE FreeBSD 12.1-RELEASE r354233 GENERIC
[root@fbsd1 ~]# top -aSH
last pid: 2954; load averages: 0.44, 0.42, 0.41 up 0+01:38:55 20:13:46
132 threads: 10 running, 104 sleeping, 18 waiting
CPU: 0.0% user, 0.0% nice, 19.7% system, 5.2% interrupt, 75.1% idle
Mem: 10M Active, 6100K Inact, 271M Wired, 21M Buf, 39G Free
Swap: 3968M Total, 3968M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0 96K RUN 5 94:58 95.25% [idle{idle: cpu5}]
11 root 155 ki31 0 96K CPU1 1 93:26 83.69% [idle{idle: cpu1}]
11 root 155 ki31 0 96K RUN 0 94:44 73.68% [idle{idle: cpu0}]
11 root 155 ki31 0 96K CPU4 4 93:15 72.51% [idle{idle: cpu4}]
11 root 155 ki31 0 96K CPU3 3 93:36 64.80% [idle{idle: cpu3}]
11 root 155 ki31 0 96K RUN 2 92:55 62.29% [idle{idle: cpu2}]
0 root -76 - 0 480K CPU2 2 0:05 34.76% [kernel{if_io_tqg_2}]
0 root -76 - 0 480K CPU3 3 0:14 33.49% [kernel{if_io_tqg_3}]
12 root -52 - 0 304K CPU0 0 26:23 29.62% [intr{swi6: task queue}]
0 root -76 - 0 480K - 4 0:05 23.31% [kernel{if_io_tqg_4}]
0 root -76 - 0 480K - 0 0:05 12.31% [kernel{if_io_tqg_0}]
0 root -76 - 0 480K - 1 0:04 10.01% [kernel{if_io_tqg_1}]
12 root -88 - 0 304K WAIT 5 3:55 2.28% [intr{irq264: mfi0}]
0 root -76 - 0 480K - 5 0:06 1.88% [kernel{if_io_tqg_5}]
2954 root 20 0 13M 3676K CPU5 5 0:00 0.02% top -aSH
12 root -60 - 0 304K WAIT 0 0:01 0.01% [intr{swi4: clock (0)}]
0 root -76 - 0 480K - 4 0:02 0.01% [kernel{if_config_tqg_0}]
Single Thread:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 8.45 GBytes 7.26 Gbits/sec 802 sender
[ 4] 0.00-10.00 sec 8.45 GBytes 7.26 Gbits/sec receiver
10 Threads:
[ ID] Interval Transfer Bandwidth Retr
[SUM] 0.00-10.00 sec 9.85 GBytes 8.46 Gbits/sec 2991 sender
[SUM] 0.00-10.00 sec 9.83 GBytes 8.45 Gbits/sec receiver
FreeBSD 12.1 with OPNsense Kernel (pf enabled):
[root@fbsd1 ~]# uname -rv
12.1-RELEASE FreeBSD 12.1-RELEASE r354233 GENERIC
[root@fbsd1 ~]# fetch https://pkg.opnsense.org/FreeBSD:12:amd64/20.7/sets/kernel-20.7.2-amd64.txz
[root@fbsd1 ~]# mv /boot/kernel /boot/kernel.old
[root@fbsd1 ~]# tar -C / -xf kernel-20.7.2-amd64.txz
[root@fbsd1 ~]# kldxref /boot/kernel
[root@fbsd1 ~]# reboot
[root@fbsd1 ~]# uname -rv
12.1-RELEASE-p8-HBSD FreeBSD 12.1-RELEASE-p8-HBSD #0 b3665671c4d(stable/20.7)-dirty: Thu Aug 27 05:58:53 CEST 2020 root@sensey64:/usr/obj/usr/src/amd64.amd64/sys/SMP
[root@fbsd1 ~]# top -aSH
last pid: 43891; load averages: 0.99, 0.49, 0.20 up 0+00:04:28 20:29:24
131 threads: 13 running, 100 sleeping, 18 waiting
CPU: 0.0% user, 0.0% nice, 62.5% system, 3.5% interrupt, 33.9% idle
Mem: 14M Active, 1184K Inact, 270M Wired, 21M Buf, 39G Free
Swap: 3968M Total, 3968M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
0 root -76 - 0 480K CPU3 3 0:08 81.27% [kernel{if_io_tqg_3}]
0 root -76 - 0 480K CPU1 1 0:09 74.39% [kernel{if_io_tqg_1}]
0 root -76 - 0 480K CPU5 5 0:08 73.20% [kernel{if_io_tqg_5}]
0 root -76 - 0 480K CPU0 0 0:21 71.79% [kernel{if_io_tqg_0}]
11 root 155 ki31 0 96K RUN 4 4:09 54.15% [idle{idle: cpu4}]
11 root 155 ki31 0 96K RUN 2 4:09 51.30% [idle{idle: cpu2}]
0 root -76 - 0 480K CPU2 2 0:05 40.10% [kernel{if_io_tqg_2}]
0 root -76 - 0 480K - 4 0:09 37.60% [kernel{if_io_tqg_4}]
11 root 155 ki31 0 96K RUN 0 4:03 26.48% [idle{idle: cpu0}]
11 root 155 ki31 0 96K RUN 5 4:14 25.87% [idle{idle: cpu5}]
11 root 155 ki31 0 96K RUN 1 4:09 24.32% [idle{idle: cpu1}]
12 root -52 - 0 304K RUN 2 1:12 20.63% [intr{swi6: task queue}]
11 root 155 ki31 0 96K CPU3 3 4:00 17.30% [idle{idle: cpu3}]
12 root -88 - 0 304K WAIT 5 0:10 1.47% [intr{irq264: mfi0}]
43891 root 20 0 13M 3660K CPU4 4 0:00 0.03% top -aSH
21 root -16 - 0 16K - 4 0:00 0.02% [rand_harvestq]
12 root -60 - 0 304K WAIT 1 0:00 0.02% [intr{swi4: clock (0)}]
Single Thread:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 2.89 GBytes 2.48 Gbits/sec 0 sender
[ 4] 0.00-10.00 sec 2.89 GBytes 2.48 Gbits/sec receiver
10 Threads:
[ ID] Interval Transfer Bandwidth Retr
[SUM] 0.00-10.00 sec 8.16 GBytes 7.01 Gbits/sec 4260 sender
[SUM] 0.00-10.00 sec 8.13 GBytes 6.98 Gbits/sec receiver
I included the "top -aSH" output again because my general observation between OPNsense kernel and FreeBSD 12.1 stock kernel is the "[kernel{if_io_tqg_X}]" process usage. Even on an actual OPNsense 20.7.2 installation I notice the exact same behavior of the "[kernel{if_io_tqg_X}]" being consistently higher and throughput significantly slower, specifically on single threaded tests. Note that both of the top outputs were only from the 10 thread count tests only as I did not think to capture them during the single threaded test.
I can't help but think that whatever high "[kernel{if_io_tqg_X}]" on the OPNsense kernel means is starving the system of throughput potential.
Thoughts? Next steps I can run and provide results from?
Just wanted to post here due to the excellent testing from OP and to corroborate the numbers that OP is seeing.
My testing setup is as follows:
ESXi 6.7u3, host has an E3 1220v3 and 32GB of RAM
All Firewall VMs have 2vCPU. 5GB of RAM allocated to OPNsense.
VMXnet3 NICs negotiated at 10gbps
In pfSense and OPNsense, I disabled all of the hardware offloading features. I am using client and server VMs on the WAN and LAN sides of the firewall VMs. This means I am pushing/pulling traffic through the firewalls; I am not running iperf directly on any of the firewalls themselves. Because I am doing this on a single ESXi host and the traffic stays within the same host/vSwitch, the traffic is never routed to my physical network switch and therefore I can test higher throughput.
pfSense and OPNsense were both out of the box installs with their default rulesets. I did not add any packages or make any config changes outside of making sure that all hardware offloading was disabled. All iperf3 tests were run with the LAN side client pulling traffic through the WAN side interface, to simulate a large download. However, if I perform upload tests, my throughput results are the same. All iperf3 tests were run for 60 seconds and used the default MTU of 1500. The results below show the average of the 60 second runs. I ran each test twice, and used the final result to allow the firewalls to "warm up" and stabilize with their throughput during testing.
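Concretely, each run looked roughly like the following, with the WAN-side server address as an example value; -R makes the LAN-side client pull traffic through the firewall, i.e. a simulated download:
iperf3 -s                           # on the WAN-side server VM
iperf3 -c 203.0.113.10 -t 60 -R     # on the LAN-side client VM: 60-second run, reversed direction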
pfSense 2.4.5p1 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 31.5 GBytes 4.50 Gbits/sec 11715 sender
[ 5] 0.00-60.00 sec 31.5 GBytes 4.50 Gbits/sec receiver
OpenWRT 19.07.3 1500MTU receiving from WAN, vmx3 NICs, default ruleset
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 47.5 GBytes 6.81 Gbits/sec 44252 sender
[ 5] 0.00-60.00 sec 47.5 GBytes 6.81 Gbits/sec receiver
OPNsense 20.7.2 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 6.83 GBytes 977 Mbits/sec 459 sender
[ 5] 0.00-60.00 sec 6.82 GBytes 977 Mbits/sec receiver
I also notice that while a throughput test is running on OPNsense, one of the vCPUs is completely consumed. I did not see this behavior with Linux or pfSense in my testing; the attached screenshot shows the CPU usage I'm seeing while the iperf3 test is running.
Hi, Newbie here.
I also noticed this problem with OPNsense 20.7.2, which was released recently. I get only about 450 Mbps in my LAN when no one is using it besides me (I disconnected every downlink device). I used iperf3 on Windows to check it out.
PS E:\Util> .\iperf3.exe -c 192.168.10.8 -p 26574
Connecting to host 192.168.10.8, port 26574
[ 4] local 192.168.12.4 port 50173 connected to 192.168.10.8 port 26574
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 49.1 MBytes 412 Mbits/sec
[ 4] 1.00-2.00 sec 52.5 MBytes 440 Mbits/sec
[ 4] 2.00-3.00 sec 51.8 MBytes 434 Mbits/sec
[ 4] 3.00-4.00 sec 52.4 MBytes 439 Mbits/sec
[ 4] 4.00-5.00 sec 52.1 MBytes 438 Mbits/sec
[ 4] 5.00-6.00 sec 52.6 MBytes 441 Mbits/sec
[ 4] 6.00-7.00 sec 52.4 MBytes 440 Mbits/sec
[ 4] 7.00-8.00 sec 46.4 MBytes 389 Mbits/sec
[ 4] 8.00-9.00 sec 49.0 MBytes 411 Mbits/sec
[ 4] 9.00-10.00 sec 51.6 MBytes 433 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 510 MBytes 428 Mbits/sec sender
[ 4] 0.00-10.00 sec 510 MBytes 428 Mbits/sec receiver
My hardware is an AMD Ryzen 7 2700 with 16 GB of RAM. Ethernet is an Intel i350-T2 gigabit NIC.
Quote from: hax0rwax0r on September 02, 2020, 10:40:50 PM
...
My first thought was maybe shared forwarding, but you have this with pfSense 2.5 too, correct?
OK, iflib, so it's 12.X-related only, but strange that it doesn't happen on vanilla 12.1:
https://forums.freebsd.org/threads/what-is-kernel-if_io_tqg-100-load-of-core.70642/
Do you still test with this hardware?
Dell T20 (Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz (4 cores))
Quote from: mimugmail on September 03, 2020, 06:15:29 AM
My first thought was maybe shared forwarding, but you have this with pfsense 2.5 too, correct?
I have never tested pfSense 2.5. As you had previously pointed out, my test was pfSense 2.4 which was FreeBSD 11.3 based. I mistakenly looked at the version history page and mentioned it was FreeBSD 12.1 but we determined I was incorrect in my statement.
Quote from: mimugmail on September 03, 2020, 12:26:17 PM
Ok, iflib, so it's related to 12.X-only, but strange it doesn't happen to vanilla 12.1
https://forums.freebsd.org/threads/what-is-kernel-if_io_tqg-100-load-of-core.70642/
Yeah, I saw that forum post when I was Googling around, too. I don't know what is different between vanilla FreeBSD 12.1 and the OPNsense 20.7.2 kernel that causes the higher CPU usage, but it is consistent in my testing every single time.
Quote from: mimugmail on September 03, 2020, 12:36:34 PM
Do you still test with this hardware?
Dell T20 (Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz (4 cores))
No, every single test, with the exception of that one test I did on the Dell T20 to see if more MHz helped, has been on a Dell R430. I have several R430s that are like-for-like, and I even ran different software on each one with consistent results, to rule out a bad X520 NIC or something similar. The results followed the OS/kernel installed regardless of which R430 I ran it on, so I am fairly confident in my hardware.
Quote from: mimugmail on September 03, 2020, 06:15:29 AM
My first thought was maybe shared forwarding, but you have this with pfsense 2.5 too, correct?
I tried this with the recent build of pfSense 2.5 Development (built 9/2/2020) and was able to get around 2.0 Gbits/sec using the same test scenario that I posted about yesterday. So it is still lower throughput than pfSense 2.4.x running on FreeBSD 11.2 in the same test scenario; however, it's still higher than what we're seeing with the OPNsense 20.7 series running the 12.x kernel.
Just for the record: I am also experiencing degraded throughput. LAN routing between different VLANs, with only the firewall enabled (no IPS etc.), is around 550 Mbit/s. The setup is switch -> 1 Gbit trunk -> switch -> 1 Gbit trunk -> OPNsense FW. Low overall traffic.
Overall usage core-wise when loading the FW.
16 cores, but only a few are used. It's like multicore usage in either IDS or pf is limited.
Quote from: Supermule on September 04, 2020, 11:55:34 AM
Overall usage core-wise when loading the FW.
16 cores, but only a few are used. It's like multicore usage in either IDS or pf is limited.
One stream can only be handled by one core; this was the case in 20.1 and still is in 20.7 :)
A quick follow-up. I am routing about 20 VLANs. I read a lot about performance tuning, and in one post the captive portal's performance impact was mentioned. I recently changed my WiFi setup and at some point had tried the captive portal function for a guest VLAN. So I gave it a shot and disabled the captive portal (it was active for one VLAN). I could not believe my eyes when I tested the throughput again.
captive portal enabled for one vlan:
530 Mbit/s
captive portal disabled:
910 Mbit/s
The captive portal uses shared forwarding, which sends every packet to ipfw; I'd guess 20.1 has the same problem.
Uh no, features decrease throughput. Where have I seen this before? Maybe in every industry firewall spec sheet... ;)
This thread is slowly degrading and losing focus. I can't say there aren't any penalties in using the software, but if we only focus on how much better others are we run the risk of not having an objective discussion: is your OPNsense too slow? The easiest fix is to get the hardware that performs well enough. There's already money saved from the lack of licensing.
Performance will likely increase over time in the default releases if we can identify the actual differences in configuration.
Cheers,
Franco
First of all, I like OPNsense and I am an absolute supporter; my comment was meant to be absolutely constructive... I personally wasn't aware that a rather simple-looking feature could have a nearly 50% performance impact, and I have a feeling I can't be the only one, so I just wanted to share the information.
Shaper and captive portal require enabling the second firewall (ipfw) in tandem with the normal one (pf). Both are nice firewalls, but most features come from pf historically, while others are better suited for ipfw or are only available there.
I just think we should talk about raw throughput here with minimum configuration to make results comparable between operating systems. The more configuration and features come into play it becomes less and less possible to derive meaningful results.
Cheers,
Franco
I stumbled across this thread after having the same issues as the OP with 20.7 and I'd done much of the same types of troubleshooting. Unless I missed it, I didn't see any kind of conclusion. I've read various things about issues with some nic drivers using iflib, but I haven't been able to nail anything down. For example, this post about a new netmap kernel: https://forum.opnsense.org/index.php?topic=19175.0
Though I don't know if that would even apply here since I'm not using Sensei or Suricata. I am using the vmxnet3 driver on ESXi 7 and can't get more than 1 Gb/sec through a new install of OPNsense. No traffic shaping or anything, and all test VMs (and OPNsense) are on the same vSwitch. Even just going between a test VM and an OPNsense LAN interface I am stuck at 1 Gb. I can at least get 4 Gb/sec using pfSense 2.4.x. I haven't tried older versions of OPNsense.
The OPNsense roadmap says "Fix stability and reliability issues with regard to vmx(4), vtnet(4), ixl(4), ix(4) and em(4) ethernet drivers." I guess I'm trying to find out if there are specific bugs or issues called out that this refers to. If the issue I'm seeing is already identified, great.
It's under investigation; 20.7.4 may bring an already fixed kernel.
Very interesting discussion here regarding degraded performance with OPNsense. Roughly one month ago I noticed degraded performance in SMB transfers between my own server and clients. At first I suspected the server itself as the performance bottleneck, due to a kernel upgrade a short time before. I am not sure whether this issue also correlates with an update of the OPNsense firewall that I performed in the meantime. I investigated a little and found some discussions regarding issues with the server network card (Intel I219-LM) and Linux. But after buying a low-priced USB network adapter (Realtek chipset) for testing, I got the same poor performance results.
My next steps are to investigate the whole network and OPNsense this upcoming weekend (if the weather permits ⛈ 🌩 ...). So this discussion is a very interesting starting point for me and my investigation.
Here some details regarding my Opnsense (20.7.3):
- Mainboard: Supermicro A2SDi-4C-HLN4F (Link to specs (https://www.supermicro.com/en/products/motherboard/A2SDi-4C-HLN4F))
- RAM: 8GB
- Network performance (past): around 900MBit/s (SMB transfer across two subnets)
- Network performance (now): around 200MBit/s (SMB transfer across two subnets)
I tried re-running these tests with OPNsense 20.7.3 and also tried the netmap kernel. For my particular case, this did not result in a change in throughput.
I'll recap my environment:
HP Server ML10v2/Xeon E3 1220v3/32GB of RAM
VM configurations:
Each pfSense and OPNsense VM has 2vCPU/4GB RAM/VMX3 NICs
Each pfSense and OPNsense VM has default settings and all hardware offloading disabled
The OPNsense netmap kernel was tested by doing the following:
opnsense-update -kr 20.7.3-netmap
reboot
When running these iperf3 tests, each test was run for 60 seconds; all tests were run twice and the last result is recorded here, to allow some of the firewalls time to "warm up" to the throughput load. All tests were performed on the same host, and two VMs were used to simulate a WAN/LAN configuration with separate vSwitches. This allows us to push traffic through the firewall, instead of using the firewall as an iperf3 client.
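For reference, a sketch of the client invocation implied by that methodology (addresses are placeholders; -t 60 matches the 60-second runs):

# on the VM behind the WAN interface
iperf3 -s
# on the VM behind the LAN interface, repeated twice per firewall under test
iperf3 -c 203.0.113.10 -t 60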
Below are my results from today:
pfSense 2.5.0Build_10-16-20 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 14.8 GBytes 2.12 Gbits/sec 550 sender
[ 5] 0.00-60.00 sec 14.8 GBytes 2.12 Gbits/sec receiver
pfSense 2.4.5p1 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 29.4 GBytes 4.21 Gbits/sec 12054 sender
[ 5] 0.00-60.00 sec 29.4 GBytes 4.21 Gbits/sec receiver
OpenWRT 19.07.3 1500MTU receiving from WAN, vmx3 NICs, default ruleset
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 44.1 GBytes 6.31 Gbits/sec 40490 sender
[ 5] 0.00-60.00 sec 44.1 GBytes 6.31 Gbits/sec receiver
OPNsense 20.7.3 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 5.39 GBytes 771 Mbits/sec 362 sender
[ 5] 0.00-60.00 sec 5.39 GBytes 771 Mbits/sec receiver
OPNsense 20.7.3(netflow disabled) 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 6.66 GBytes 953 Mbits/sec 561 sender
[ 5] 0.00-60.00 sec 6.66 GBytes 953 Mbits/sec receiver
OPNsense 20.7.3(netmap kernel) 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 5.35 GBytes 766 Mbits/sec 434 sender
[ 5] 0.00-60.00 sec 5.35 GBytes 766 Mbits/sec receiver
OPNsense 20.7.3(netmap kernel, netflow disabled) 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 6.55 GBytes 937 Mbits/sec 399 sender
[ 5] 0.00-60.00 sec 6.55 GBytes 937 Mbits/sec receiver
It's actually quite interesting to see the performance degradation from pfSense 2.4 to 2.5.
One would think that things were moving forward instead of backwards.
And could it be on purpose, now that TNSR is launched, which is somehow able to route significantly more?
I know it's kernel-dependent, but it's really annoying that the new FreeBSD releases actually perform worse than 10.3 and the OSes that depend on it.
Given the right MTUs you can easily push 7+ Gbit/s on a FW.
I probably should have clarified that. I tested both *sense-based distros just to show that they both take a hit with the FreeBSD 12.x kernel. I don't think this is out of malicious intent from either side, just teething issues due to the new way that the 12.x kernel pushes packets. I'm NOT trying to compare OPNsense to pfSense, I merely wanted to show that they both see a hit moving to 12.x.
There is an upside to all of this. I'm running OPNsense 20.7.3 on bare metal at home with the stock kernel. With the FreeBSD 12.x implementations I no longer need to leave FQ_Codel shaping enabled to get A+ scores on my 500/500 Fiber connection. It seems the way that FreeBSD 12.x handles transfer queues is much more efficient. I'm sure as time moves forward this will all get worked out. I'm posting here mainly just to show what I am seeing, and hopefully we can see the numbers get better as newer kernels are integrated.
Yes, it needs a bigger user base to test and diagnose. I'm sure if pfSense switched there would be faster progress. Currently it's up to the Sensei guys and the 12.1 community.
Do they need a sponsor to make it happen sooner?
No idea, just ask mb via PM
Although we haven't experienced performance issues on the equipment we sell ourselves, quite a bit of the feedback in this thread seems to be related to virtual setups.
Since we had a setup available from the webinar last Thursday, I thought to replicate the simple vmxnet3 test on our end.
Small disclaimer upfront, I'm not a frequent VMWare ESXi user, so I just followed the obvious steps.
Our test machine is really small, not extremely fast, but usable for the purpose (a random desktop which was available).
Machine specs:
Lenovo 10T700AHMH desktop
6 CPUs x Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz
8GB Memory
|- OPNsense vm, 2 vcores
|- kali1, 1 vcore
|- kali2, 1 vcore
While going through the VMWare setup, for some reason I wasn't allowed to select VMXNET3, so I edited the .vmx file manually
to make sure all attached interfaces used the correct driver.
ethernetX.virtualDev = "vmxnet3"
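For clarity, a hypothetical excerpt of what the edited .vmx ends up containing for a VM with two adapters (ethernet0/ethernet1 are assumptions; the line is repeated once per attached NIC):

ethernet0.virtualDev = "vmxnet3"
ethernet1.virtualDev = "vmxnet3"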
The clients attached are simple Kali Linux installs, both using their own vSwitch, so traffic is measured from kali 1 to kali 2
using iperf3 (which doesn't really tell a lot about real-world performance, but I didn't have the time or spirit available to set up trex and proper test sets).
[kali1, client] --- vswitch1 --- [OPNsense] --- vswitch2 --- [kali2, server]
192.168.1.100/24 - 192.168.1.1/24,192.168.2.1/24 - 192.168.2.100/24
Before testing, let's establish a baseline: move both Kali Linux machines into the same network and iperf between them.
# iperf3 -c 192.168.2.100 -t 10000
Connecting to host 192.168.2.100, port 5201
[ 5] local 192.168.2.101 port 55240 connected to 192.168.2.100 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 3.34 GBytes 28.7 Gbits/sec 0 1.91 MBytes
[ 5] 1.00-2.00 sec 5.03 GBytes 43.2 Gbits/sec 0 2.93 MBytes
[ 5] 2.00-3.00 sec 5.24 GBytes 45.0 Gbits/sec 0 3.08 MBytes
[ 5] 3.00-4.00 sec 5.18 GBytes 44.5 Gbits/sec 0 3.08 MBytes
[ 5] 4.00-5.00 sec 5.23 GBytes 45.0 Gbits/sec 0 3.08 MBytes
That is the absolute maximum my setup could reach, using Linux with all defaults set... but since we don't use
any offloading features (https://wiki.freebsd.org/10gFreeBSD/Router), it's fairer to check what the performance would be with offloading disabled on
the same setup.
So we disable all offloading, assuming our router/firewall won't use it either.
# ethtool -K eth0 lro off
# ethtool -K eth0 tso off
# ethtool -K eth0 rx off
# ethtool -K eth0 tx off
# ethtool -K eth0 sg off
And test again:
# iperf3 -c 192.168.2.100 -t 10000
Connecting to host 192.168.2.100, port 5201
[ 5] local 192.168.2.101 port 55274 connected to 192.168.2.100 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.20 GBytes 10.3 Gbits/sec 0 458 KBytes
[ 5] 1.00-2.00 sec 1.30 GBytes 11.2 Gbits/sec 0 1007 KBytes
[ 5] 2.00-3.00 sec 1.30 GBytes 11.1 Gbits/sec 0 1.18 MBytes
[ 5] 3.00-4.00 sec 1.29 GBytes 11.1 Gbits/sec 0 1.24 MBytes
[ 5] 4.00-5.00 sec 1.30 GBytes 11.2 Gbits/sec 0 1.37 MBytes
[ 5] 5.00-6.00 sec 1.31 GBytes 11.2 Gbits/sec 0 1.43 MBytes
[ 5] 6.00-7.00 sec 1.30 GBytes 11.2 Gbits/sec 0 1.51 MBytes
That keeps about 25% of our original throughput; VMware seems to be very efficient when hardware tasks are pushed back
to the hypervisor.
Now reconnect the kali machines back into their own networks, with OPNsense (20.7.3+new netmap kernel) in between.
The firewall policy is simple, just accept anything, no other features used.
# iperf3 -c 192.168.2.100 -t 10000
Connecting to host 192.168.2.100, port 5201
[ 5] local 192.168.1.100 port 54870 connected to 192.168.2.100 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 280 MBytes 2.35 Gbits/sec 59 393 KBytes
[ 5] 1.00-2.00 sec 281 MBytes 2.35 Gbits/sec 33 383 KBytes
[ 5] 2.00-3.00 sec 279 MBytes 2.34 Gbits/sec 60 379 KBytes
[ 5] 3.00-4.00 sec 275 MBytes 2.31 Gbits/sec 46 380 KBytes
[ 5] 4.00-5.00 sec 276 MBytes 2.32 Gbits/sec 31 387 KBytes
The next step is to check the man page of the vmx driver (man vmx), which shows quite a few sysctl tunables that
don't seem to work anymore on 12.x, probably due to the switch to iflib. One comment, however, seems quite relevant:
Quote
The vmx driver supports multiple transmit and receive queues. Multiple
queues are only supported by certain VMware products, such as ESXi. The
number of queues allocated depends on the presence of MSI-X, the number
of configured CPUs, and the tunables listed below. FreeBSD does not
enable MSI-X support on VMware by default. The
hw.pci.honor_msi_blacklist tunable must be disabled to enable MSI-X
support.
So we go to tunables, disable hw.pci.honor_msi_blacklist (set it to 0) and reboot our machine.
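For anyone following along from the console, a sketch of the same change (assumption: a manual entry in /boot/loader.conf.local works as an alternative to the System > Settings > Tunables page; it is a loader tunable, so a reboot is required):

echo 'hw.pci.honor_msi_blacklist="0"' >> /boot/loader.conf.local
reboot
# after the reboot, verify the tunable and that vmx picked up MSI-X vectors
sysctl hw.pci.honor_msi_blacklist
dmesg | grep -i msi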
Time to test again:
# iperf3 -c 192.168.2.100
Connecting to host 192.168.2.100, port 5201
[ 5] local 192.168.1.100 port 54878 connected to 192.168.2.100 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 350 MBytes 2.93 Gbits/sec 589 304 KBytes
[ 5] 1.00-2.00 sec 342 MBytes 2.87 Gbits/sec 378 337 KBytes
[ 5] 2.00-3.00 sec 342 MBytes 2.87 Gbits/sec 324 298 KBytes
[ 5] 3.00-4.00 sec 343 MBytes 2.88 Gbits/sec 292 301 KBytes
[ 5] 4.00-5.00 sec 345 MBytes 2.89 Gbits/sec 337 307 KBytes
[ 5] 5.00-6.00 sec 341 MBytes 2.86 Gbits/sec 266 301 KBytes
[ 5] 6.00-7.00 sec 341 MBytes 2.86 Gbits/sec 301 311 KBytes
Single flow performance is often a challenge, so to be sure, let's try to push 2 sessions through iperf3
# iperf3 -c 192.168.2.100 -P 2 -t 10000
Connecting to host 192.168.2.100, port 5201
[ 5] local 192.168.1.100 port 54952 connected to 192.168.2.100 port 5201
[ 7] local 192.168.1.100 port 54954 connected to 192.168.2.100 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 261 MBytes 2.19 Gbits/sec 176 281 KBytes
[ 7] 0.00-1.00 sec 245 MBytes 2.05 Gbits/sec 136 342 KBytes
[SUM] 0.00-1.00 sec 506 MBytes 4.24 Gbits/sec 312
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 1.00-2.00 sec 302 MBytes 2.54 Gbits/sec 57 281 KBytes
[ 7] 1.00-2.00 sec 208 MBytes 1.74 Gbits/sec 25 375 KBytes
[SUM] 1.00-2.00 sec 510 MBytes 4.28 Gbits/sec 82
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 2.00-3.00 sec 304 MBytes 2.55 Gbits/sec 45 284 KBytes
[ 7] 2.00-3.00 sec 210 MBytes 1.76 Gbits/sec 9 392 KBytes
[SUM] 2.00-3.00 sec 514 MBytes 4.31 Gbits/sec 54
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 3.00-4.00 sec 304 MBytes 2.55 Gbits/sec 39 386 KBytes
[ 7] 3.00-4.00 sec 209 MBytes 1.75 Gbits/sec 15 331 KBytes
[SUM] 3.00-4.00 sec 512 MBytes 4.30 Gbits/sec 54
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 4.00-4.95 sec 288 MBytes 2.54 Gbits/sec 39 287 KBytes
[ 7] 4.00-4.95 sec 198 MBytes 1.74 Gbits/sec 23 325 KBytes
[SUM] 4.00-4.95 sec 485 MBytes 4.28 Gbits/sec 62
Which is already way better. More sessions don't seem to impact my setup as far as I could see, but that could also
be caused by the number of queues configured (2, see dmesg | grep vmx). In the new iflib world I wasn't able to
increase that number, so I'll leave it at that.
Just for fun, I disabled pf (pfctl -d) to get a bit of insight into how the firewall impacts our performance;
the details of that test are shown below (just for reference).
# iperf3 -c 192.168.2.100 -P 2 -t 10000
Connecting to host 192.168.2.100, port 5201
[ 5] local 192.168.1.100 port 55038 connected to 192.168.2.100 port 5201
[ 7] local 192.168.1.100 port 55040 connected to 192.168.2.100 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 300 MBytes 2.51 Gbits/sec 0 888 KBytes
[ 7] 0.00-1.00 sec 302 MBytes 2.53 Gbits/sec 69 2.18 MBytes
[SUM] 0.00-1.00 sec 601 MBytes 5.04 Gbits/sec 69
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 1.00-2.00 sec 335 MBytes 2.81 Gbits/sec 167 904 KBytes
[ 7] 1.00-2.00 sec 342 MBytes 2.87 Gbits/sec 536 1.67 MBytes
[SUM] 1.00-2.00 sec 678 MBytes 5.68 Gbits/sec 703
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 2.00-3.00 sec 335 MBytes 2.81 Gbits/sec 0 1.12 MBytes
[ 7] 2.00-3.00 sec 342 MBytes 2.87 Gbits/sec 0 1.81 MBytes
[SUM] 2.00-3.00 sec 678 MBytes 5.68 Gbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 3.00-4.00 sec 332 MBytes 2.79 Gbits/sec 280 1.04 MBytes
[ 7] 3.00-4.00 sec 344 MBytes 2.88 Gbits/sec 482 1.44 MBytes
[SUM] 3.00-4.00 sec 676 MBytes 5.67 Gbits/sec 762
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 4.00-5.00 sec 332 MBytes 2.79 Gbits/sec 206 1017 KBytes
[ 7] 4.00-5.00 sec 338 MBytes 2.83 Gbits/sec 292 1.22 MBytes
[SUM] 4.00-5.00 sec 670 MBytes 5.62 Gbits/sec 498
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 5.00-6.00 sec 331 MBytes 2.78 Gbits/sec 0 1.21 MBytes
[ 7] 5.00-6.00 sec 339 MBytes 2.84 Gbits/sec 0 1.40 MBytes
[SUM] 5.00-6.00 sec 670 MBytes 5.62 Gbits/sec 0
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 6.00-6.60 sec 199 MBytes 2.78 Gbits/sec 0 1.32 MBytes
[ 7] 6.00-6.60 sec 202 MBytes 2.83 Gbits/sec 0 1.50 MBytes
[SUM] 6.00-6.60 sec 401 MBytes 5.61 Gbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
On physical setups I've seen better numbers, but driver performance and settings may impact the situation (a lot).
While looking into the sysctl settings, I stumbled on this https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237166
as well.
It explains how to set the receive and send descriptors; for my test it didn't change a lot. Some other iflib setting might,
but I haven't tried.
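For completeness, the knobs that report refers to appear to be the iflib descriptor overrides; a sketch is below. The tunable names come from iflib(4), but the values are examples only and the exact format (a single value vs. a comma-separated list per ring) is driver-dependent, so treat this as something to experiment with rather than a fix:

# set via loader.conf / the Tunables page, then reboot
dev.vmx.0.iflib.override_ntxds="4096"
dev.vmx.0.iflib.override_nrxds="2048"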
Since we haven't seen huge performance degradations on our physical setups,
there's the possibility that default settings have changed in vmx (I haven't looked into that, nor plan to).
Driver quality might have been better pre-iflib, which is always a bit of a risk on FreeBSD after major upgrades, to be honest.
In our experience (on Intel) the situation isn't bad at all after switching to FreeBSD 12.1,
but that's just my personal opinion (based on measurements on our equipment some months ago).
Best regards,
Ad
Nice write-up :)
But 45 gbit/s???
Quote from: Supermule on October 18, 2020, 04:36:04 PM
But 45 gbit/s???
Quote from: AdSchellevis on October 17, 2020, 04:17:10 PM
The clients attached are simple Kali Linux installs, both using their own vSwitch, so traffic is measured from kali 1 to kali 2
:)
They still need drivers and networking as the VSwitch is attached to a network adapter.
Quote from: Gauss23 on October 18, 2020, 04:37:56 PM
Quote from: Supermule on October 18, 2020, 04:36:04 PM
But 45 gbit/s???
Quote from: AdSchellevis on October 17, 2020, 04:17:10 PM
The clients attached are simple Kali Linux installs, both using their own vSwitch, so traffic is measured from kali 1 to kali 2
:)
Your point is? Just for clarification, the 45 Gbps is measured between two Linux (Kali) machines on the same network (vSwitch) using all default optimisations, which is the baseline (the maximum achievable without anything in between) in my case.
I did some tests and noticed that my network suffers from two different problems which interfere with each other. The I219-LM NIC in my server has autonegotiation problems which caused a performance degradation of around 80 percent. I solved this issue by forcing the NIC to 1 Gbit/s full duplex. Now performance tests with iperf3 reach around 980 Mbit/s in direct transfers between client and server, which looks fine.
After I integrated the OPNsense into my setup again, so that the firewall routes the traffic between my server and client subnets, the throughput degraded from 980 Mbit/s to ca. 245 Mbit/s. I should mention that my OPNsense (v20.7.3) runs on bare metal, so a virtualization impact is impossible.
Next steps will be some resource monitoring during the iperf3 tests.
How did you set your NIC to do that?
Quote from: FlightService on October 19, 2020, 10:42:26 AM
How did you set your NIC to do that?
I should clarify that my server is a Linux installation (Debian Buster) which runs on dedicated hardware. There are several discussions regarding issues with Intel NICs and Linux. It doesn't matter whether the server is connected directly to a client or through a switch: the NIC driver often reports a 10 Mbit/s link to the system, although the real throughput was higher than the reported speed. On the Linux machine I used "ethtool <dev> speed 1000" to disable autonegotiation.
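As a side note, a hedged example of pinning the link on the Linux side (modern ethtool uses the -s form for changing settings; eth0 is a placeholder for the real interface name):

ethtool -s eth0 speed 1000 duplex full autoneg off
# confirm what the link is actually running at
ethtool eth0 | grep -E 'Speed|Duplex|Auto-negotiation'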
Quote from: mimugmail on October 13, 2020, 07:20:17 AM
It's under investigation; 20.7.4 may bring an already fixed kernel.
Just to add more info on this topic: vmxnet3 can't handle more than 1 Gbps when traffic-testing OPNsense to Windows (and with reverse-mode iperf) on the same VLAN. It's a big hit since our users frequently access file server and PDM (AutoCAD-like) data that are on different VLANs (and thus all that traffic is forwarded by OPNsense). The whole network is 10 Gbps, including user workstations and ESXi 6.7u3 servers.
We have noticed a big hit on transfer speeds after changing our firewall vendor to OPNsense at that location, and we believe it relates to this vmxnet3 case.
OPNSense VM Specs:
- 4 vCPU - Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
- 8 GB RAM
- vmxnet3 attached to a vSwitch with 2 10Gbp/s - QLogic Corporation NetXtreme II BCM57800(broadcom Dell OEM).
- 10 vlans
OPNSense and Windows server, same vlan, opnsense as gateway of this server vlan:
OPNSENSE to WINDOWS:
iperf3 -c 10.254.win.ip -P 8 -w 128k 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.02 sec 118 MBytes 98.8 Mbits/sec 0 sender
[ 5] 0.00-10.02 sec 118 MBytes 98.8 Mbits/sec receiver
[ 7] 0.00-10.02 sec 116 MBytes 96.8 Mbits/sec 0 sender
[ 7] 0.00-10.02 sec 116 MBytes 96.8 Mbits/sec receiver
[ 9] 0.00-10.02 sec 113 MBytes 94.5 Mbits/sec 0 sender
[ 9] 0.00-10.02 sec 113 MBytes 94.5 Mbits/sec receiver
[ 11] 0.00-10.02 sec 109 MBytes 91.5 Mbits/sec 0 sender
[ 11] 0.00-10.02 sec 109 MBytes 91.5 Mbits/sec receiver
[ 13] 0.00-10.02 sec 107 MBytes 89.7 Mbits/sec 0 sender
[ 13] 0.00-10.02 sec 107 MBytes 89.7 Mbits/sec receiver
[ 15] 0.00-10.02 sec 99.8 MBytes 83.5 Mbits/sec 0 sender
[ 15] 0.00-10.02 sec 99.8 MBytes 83.5 Mbits/sec receiver
[ 17] 0.00-10.02 sec 82.0 MBytes 68.7 Mbits/sec 0 sender
[ 17] 0.00-10.02 sec 82.0 MBytes 68.7 Mbits/sec receiver
[ 19] 0.00-10.02 sec 71.2 MBytes 59.6 Mbits/sec 0 sender
[ 19] 0.00-10.02 sec 71.2 MBytes 59.6 Mbits/sec receiver
[SUM] 0.00-10.02 sec 816 MBytes 683 Mbits/sec 0 sender
[SUM] 0.00-10.02 sec 816 MBytes 683 Mbits/sec receiver
OPNSENSE to WINDOWS (iperf3 reverse mode):
iperf3 -c 10.254.win.ip -P 8 -R -w 128k 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 88.4 MBytes 74.1 Mbits/sec sender
[ 5] 0.00-10.00 sec 88.2 MBytes 74.0 Mbits/sec receiver
[ 7] 0.00-10.00 sec 118 MBytes 98.7 Mbits/sec sender
[ 7] 0.00-10.00 sec 117 MBytes 98.5 Mbits/sec receiver
[ 9] 0.00-10.00 sec 91.9 MBytes 77.1 Mbits/sec sender
[ 9] 0.00-10.00 sec 91.7 MBytes 76.9 Mbits/sec receiver
[ 11] 0.00-10.00 sec 91.6 MBytes 76.9 Mbits/sec sender
[ 11] 0.00-10.00 sec 91.5 MBytes 76.7 Mbits/sec receiver
[ 13] 0.00-10.00 sec 92.6 MBytes 77.7 Mbits/sec sender
[ 13] 0.00-10.00 sec 92.4 MBytes 77.5 Mbits/sec receiver
[ 15] 0.00-10.00 sec 94.4 MBytes 79.2 Mbits/sec sender
[ 15] 0.00-10.00 sec 94.2 MBytes 79.0 Mbits/sec receiver
[ 17] 0.00-10.00 sec 100 MBytes 84.3 Mbits/sec sender
[ 17] 0.00-10.00 sec 100 MBytes 84.1 Mbits/sec receiver
[ 19] 0.00-10.00 sec 99.9 MBytes 83.8 Mbits/sec sender
[ 19] 0.00-10.00 sec 99.6 MBytes 83.6 Mbits/sec receiver
[SUM] 0.00-10.00 sec 777 MBytes 652 Mbits/sec sender
[SUM] 0.00-10.00 sec 775 MBytes 650 Mbits/sec receiver
Linux VM Specs:
- 1 vCPU - Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
- 4 GB RAM
- vmxnet3 attached to a vSwitch with 2 10Gbp/s - QLogic Corporation NetXtreme II BCM57800(broadcom Dell OEM).
- vmnet attached to the vm(no visibility on vlan tags)
Linux server and Windows server, same vlan cause they are designated on the "servers vlan":
LINUX TO WINDOWS:
iperf3 -c 10.254.win.ip -P 8 -w 128k 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.17 GBytes 1.00 Gbits/sec 128 sender
[ 4] 0.00-10.00 sec 1.17 GBytes 1.00 Gbits/sec receiver
[ 6] 0.00-10.00 sec 275 MBytes 231 Mbits/sec 69 sender
[ 6] 0.00-10.00 sec 275 MBytes 231 Mbits/sec receiver
[ 8] 0.00-10.00 sec 1.12 GBytes 961 Mbits/sec 150 sender
[ 8] 0.00-10.00 sec 1.12 GBytes 961 Mbits/sec receiver
[ 10] 0.00-10.00 sec 1.13 GBytes 972 Mbits/sec 98 sender
[ 10] 0.00-10.00 sec 1.13 GBytes 972 Mbits/sec receiver
[ 12] 0.00-10.00 sec 264 MBytes 222 Mbits/sec 37 sender
[ 12] 0.00-10.00 sec 264 MBytes 222 Mbits/sec receiver
[ 14] 0.00-10.00 sec 1.13 GBytes 973 Mbits/sec 109 sender
[ 14] 0.00-10.00 sec 1.13 GBytes 973 Mbits/sec receiver
[ 16] 0.00-10.00 sec 280 MBytes 235 Mbits/sec 34 sender
[ 16] 0.00-10.00 sec 280 MBytes 235 Mbits/sec receiver
[ 18] 0.00-10.00 sec 246 MBytes 206 Mbits/sec 64 sender
[ 18] 0.00-10.00 sec 246 MBytes 206 Mbits/sec receiver
[SUM] 0.00-10.00 sec 5.59 GBytes 4.81 Gbits/sec 689 sender
[SUM] 0.00-10.00 sec 5.59 GBytes 4.80 Gbits/sec receiver
LINUX TO WINDOWS (reverse-mode iperf): this is where iperf and vmxnet reach their full potential.
iperf3 -c 10.254.win.ip -P 8 -R -w 128k 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 3.17 GBytes 2.72 Gbits/sec sender
[ 4] 0.00-10.00 sec 3.17 GBytes 2.72 Gbits/sec receiver
[ 6] 0.00-10.00 sec 3.10 GBytes 2.66 Gbits/sec sender
[ 6] 0.00-10.00 sec 3.10 GBytes 2.66 Gbits/sec receiver
[ 8] 0.00-10.00 sec 2.91 GBytes 2.50 Gbits/sec sender
[ 8] 0.00-10.00 sec 2.91 GBytes 2.50 Gbits/sec receiver
[ 10] 0.00-10.00 sec 3.00 GBytes 2.58 Gbits/sec sender
[ 10] 0.00-10.00 sec 3.00 GBytes 2.58 Gbits/sec receiver
[ 12] 0.00-10.00 sec 2.78 GBytes 2.39 Gbits/sec sender
[ 12] 0.00-10.00 sec 2.78 GBytes 2.39 Gbits/sec receiver
[ 14] 0.00-10.00 sec 2.85 GBytes 2.45 Gbits/sec sender
[ 14] 0.00-10.00 sec 2.85 GBytes 2.45 Gbits/sec receiver
[ 16] 0.00-10.00 sec 2.68 GBytes 2.31 Gbits/sec sender
[ 16] 0.00-10.00 sec 2.68 GBytes 2.31 Gbits/sec receiver
[ 18] 0.00-10.00 sec 2.63 GBytes 2.26 Gbits/sec sender
[ 18] 0.00-10.00 sec 2.63 GBytes 2.26 Gbits/sec receiver
[SUM] 0.00-10.00 sec 23.1 GBytes 19.9 Gbits/sec sender
[SUM] 0.00-10.00 sec 23.1 GBytes 19.9 Gbits/sec receiver
I have customers pushing 6Gbit over vmxnet driver.
Quote from: mimugmail on October 19, 2020, 07:38:33 PM
I have customers pushing 6Gbit over vmxnet driver.
OK, and what am I supposed to do with this information? Not trying to be rude, but there are plenty of reports on this topic that go against your scenario.
Do you have any idea what I could tune to achieve better performance, then?
Quote from: nwildner on October 19, 2020, 08:20:36 PM
Quote from: mimugmail on October 19, 2020, 07:38:33 PM
I have customers pushing 6Gbit over vmxnet driver.
OK, and what am I supposed to do with this information? Not trying to be rude, but there are plenty of reports on this topic that go against your scenario.
Do you have any idea what I could tune to achieve better performance, then?
What about this idea?
https://xenomorph.net/freebsd/performance-esxi/
Quote from: Gauss23 on October 19, 2020, 08:37:01 PM
What about this idea?
https://xenomorph.net/freebsd/performance-esxi/
I'll try as soon as our users stop doing transfers at that remote office :)
Nice catch.
Where do you manually edit the rc.conf??
Quote from: Supermule on October 19, 2020, 09:42:48 PM
Where do you manually edit the rc.conf??
There is an option inside the web administration:
Interface > Settings > Hardware LRO > Uncheck it to enable LRO
Quote from: Gauss23 on October 19, 2020, 08:37:01 PM
What about this idea?
https://xenomorph.net/freebsd/performance-esxi/
Well, only enabling LRO didn't change much. The guy who wrote this tutorial is using the same NIC series I'm using, so it was worth trying to enable lro, tso and vlan_hwfilter, and after that things got a lot better.
Still not reaching 10 Gbps, but I could get almost 5 Gbps, which is pretty good:
Only enabling lro:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.17 sec 118 MBytes 97.5 Mbits/sec 0 sender
[ 5] 0.00-10.17 sec 118 MBytes 97.5 Mbits/sec receiver
[ 7] 0.00-10.17 sec 120 MBytes 98.9 Mbits/sec 0 sender
[ 7] 0.00-10.17 sec 120 MBytes 98.9 Mbits/sec receiver
[ 9] 0.00-10.17 sec 120 MBytes 98.8 Mbits/sec 0 sender
[ 9] 0.00-10.17 sec 120 MBytes 98.8 Mbits/sec receiver
[ 11] 0.00-10.17 sec 117 MBytes 96.8 Mbits/sec 0 sender
[ 11] 0.00-10.17 sec 117 MBytes 96.8 Mbits/sec receiver
[ 13] 0.00-10.17 sec 118 MBytes 97.4 Mbits/sec 0 sender
[ 13] 0.00-10.17 sec 118 MBytes 97.4 Mbits/sec receiver
[ 15] 0.00-10.17 sec 119 MBytes 98.0 Mbits/sec 0 sender
[ 15] 0.00-10.17 sec 119 MBytes 98.0 Mbits/sec receiver
[ 17] 0.00-10.17 sec 90.8 MBytes 74.9 Mbits/sec 0 sender
[ 17] 0.00-10.17 sec 90.8 MBytes 74.9 Mbits/sec receiver
[ 19] 0.00-10.17 sec 72.2 MBytes 59.6 Mbits/sec 0 sender
[ 19] 0.00-10.17 sec 72.2 MBytes 59.6 Mbits/sec receiver
[SUM] 0.00-10.17 sec 875 MBytes 722 Mbits/sec 0 sender
[SUM] 0.00-10.17 sec 875 MBytes 722 Mbits/sec receiver
iperf Done.
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=800428<VLAN_MTU,JUMBO_MTU,LRO>
ether 00:50:56:a5:d3:68
inet6 fe80::250:56ff:fea5:d368%vmx0 prefixlen 64 scopeid 0x1
media: Ethernet autoselect
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL
lro, tso and vlan_hwfilter enabled:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.01 sec 1.08 GBytes 929 Mbits/sec 0 sender
[ 5] 0.00-10.01 sec 1.08 GBytes 929 Mbits/sec receiver
[ 7] 0.00-10.01 sec 510 MBytes 427 Mbits/sec 0 sender
[ 7] 0.00-10.01 sec 510 MBytes 427 Mbits/sec receiver
[ 9] 0.00-10.01 sec 1.05 GBytes 903 Mbits/sec 0 sender
[ 9] 0.00-10.01 sec 1.05 GBytes 903 Mbits/sec receiver
[ 11] 0.00-10.01 sec 953 MBytes 799 Mbits/sec 0 sender
[ 11] 0.00-10.01 sec 953 MBytes 799 Mbits/sec receiver
[ 13] 0.00-10.01 sec 447 MBytes 375 Mbits/sec 0 sender
[ 13] 0.00-10.01 sec 447 MBytes 375 Mbits/sec receiver
[ 15] 0.00-10.01 sec 409 MBytes 342 Mbits/sec 0 sender
[ 15] 0.00-10.01 sec 409 MBytes 342 Mbits/sec receiver
[ 17] 0.00-10.01 sec 379 MBytes 318 Mbits/sec 0 sender
[ 17] 0.00-10.01 sec 379 MBytes 318 Mbits/sec receiver
[ 19] 0.00-10.01 sec 825 MBytes 691 Mbits/sec 0 sender
[ 19] 0.00-10.01 sec 825 MBytes 691 Mbits/sec receiver
[SUM] 0.00-10.01 sec 5.57 GBytes 4.78 Gbits/sec 0 sender
[SUM] 0.00-10.01 sec 5.57 GBytes 4.78 Gbits/sec receiver
iperf Done.
root@fw01adb:~ # ifconfig vmx0
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8507b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
ether 00:50:56:a5:d3:68
inet6 fe80::250:56ff:fea5:d368%vmx0 prefixlen 64 scopeid 0x1
media: Ethernet autoselect
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
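For reference, the same offload flags can also be toggled temporarily from the OPNsense console; a sketch is below (vmx0 is the interface shown above, the flags are standard ifconfig(8) capability names, and the change does not survive a reboot or an interface reconfiguration, so the web UI settings remain the persistent place to change this):

# enable the three capabilities
ifconfig vmx0 lro tso vlanhwfilter
# and turn them back off
ifconfig vmx0 -lro -tso -vlanhwfilter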
But that's not 5 Gbit/s...
I got better results disabling LRO on the ESXi host.
Quote from: nwildner on October 19, 2020, 08:20:36 PM
Quote from: mimugmail on October 19, 2020, 07:38:33 PM
I have customers pushing 6Gbit over vmxnet driver.
OK, and what am I supposed to do with this information? Not trying to be rude, but there are plenty of reports on this topic that go against your scenario.
Do you have any idea what I could tune to achieve better performance, then?
You wrote that vmxnet can't handle more than one Gb, which is not true. Now when someone googles a similar problem they might think it's a general limitation. I have no idea about hypervisors, but I don't want wrong facts going around.
Quote from: mimugmail on October 20, 2020, 06:03:29 AM
You wrote that vmxnet can't handle more than one Gb, which is not true. Now when someone googles a similar problem they might think it's a general limitation. I have no idea about hypervisors, but I don't want wrong facts going around.
Just read my reports again.
vmxnet3 is not handling more than 1 Gbps on FreeBSD (maybe due to OPNsense-specific patches). I never said vmxnet3 is garbage, and as you can see, Linux is handling the traffic fine. I have other physical machines in different offices and vmxnet3 is just fine with Linux and Windows.
And if you google for solutions, you will find plenty of information (and that also means misinformation). Bugs and other fixes (maybe iflib/vmx related) that
COULD work:
- https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242070
- https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236999
UPDATE REPORT: I had to disable lro, tso and vlan_hwfilter again since they made traffic entering that interface horribly slow (7 Mbps max), and that is a regression we could not live with.
Better to have an interface doing 1 Gbps than one that does 4.5 Gbps only one way.
@nwilder: would you be so kind not to keep spreading inaccurate / false information around. We don't use any modifications on the vmx driver, which can do more than 1Gbps at ease on a stock FreeBSD 12.1. LRO shouldn't be used on a router for obvious reasons (also pointed at in my earlier post https://forum.opnsense.org/index.php?topic=18754.msg90576#msg90576).
Quote from: AdSchellevis on October 20, 2020, 12:53:16 PM
@nwilder: would you be so kind not to keep spreading inaccurate / false information around. We don't use any modifications on the vmx driver, which can do more than 1Gbps at ease on a stock FreeBSD 12.1. LRO shouldn't be used on a router for obvious reasons (also pointed at in my earlier post https://forum.opnsense.org/index.php?topic=18754.msg90576#msg90576).
All right. I've tried lro/tso/vlan_hwfilter because I'm running out of options here. I tried all those sysctls from that FreeBSD bug report, and no noticeable performance increase was seen after tuning the tx/rx descriptors. Still the same 800 Mbps limit on transfers whenever OPNsense tries to contact another host.
Other tests I've made:
1 - iperf from one VLAN interface to another, same parent interface (vmx0): I ran another test by putting iperf3 to listen on one VLAN interface (parent vmx0) while binding the client to another VLAN interface (parent also vmx0) on this OPNsense box, and got pretty good forwarding rates:
iperf3 -c 10.254.117.ip -B 10.254.110.ip -P 8 -w 128k 5201
[SUM] 0.00-10.00 sec 8.86 GBytes 7.61 Gbits/sec 0 sender
[SUM] 0.00-10.16 sec 8.86 GBytes 7.49 Gbits/sec receiver
I was just trying to test internal forwarding.
2 - Disabling IPsec and its passthrough-related configs: thinking that IPsec could be throttling the connection through its passthrough tunnels for traffic that comes in/out of the VLAN interfaces, I disabled all IPsec configs, and iperf still got 800 Mbps max from the firewall to the Windows/Linux servers.
3 - Disabling pf: after disabling the IPsec tunnels I disabled pf entirely, did a fresh boot and put OPNsense in router mode. No luck (still the same iperf performance).
4 - Adding VLAN 117 to a new physical vmx interface, letting the hypervisor tag it: I presented a new interface with VLAN 117 tagged by the hypervisor and changed the assignment inside OPNsense ONLY for this specific servers network. The iperf tests kept getting the same speed.
Additional logs: bug id=237166 threw some light on this issue, and I found that MSI-X vectors aren't being handled correctly by VMware (assuming that the MSI-X related issues were resolved on FreeBSD). I'm looking for any documentation that could help me with this case. I'll try to tinker with hw.pci.honor_msi_blacklist=0 in loader.conf to see if I get better performance.
vmx0: <VMware VMXNET3 Ethernet Adapter> port 0x5000-0x500f mem 0xfd4fc000-0xfd4fcfff,0xfd4fd000-0xfd4fdfff,0xfd4fe000-0xfd4fffff irq 19 at device 0.0 on pci4
vmx0: Using 4096 TX descriptors and 2048 RX descriptors
vmx0: Using 4 RX queues 4 TX queues
vmx0: failed to allocate 5 MSI-X vectors, err: 6
vmx0: Using an MSI interrupt
vmx0: Ethernet address: 00:50:56:a5:d3:68
vmx0: netmap queues/slots: TX 1/4096, RX 1/4096
Edit: "hw.pci.honor_msi_blacklist: 0" removed the error form the log, but transfer rates remain the same:
vmx0: <VMware VMXNET3 Ethernet Adapter> port 0x5000-0x500f mem 0xfd4fc000-0xfd4fcfff,0xfd4fd000-0xfd4fdfff,0xfd4fe000-0xfd4fffff irq 19 at device 0.0 on pci4
vmx0: Using 4096 TX descriptors and 2048 RX descriptors
vmx0: Using 4 RX queues 4 TX queues
vmx0: Using MSI-X interrupts with 5 vectors
vmx0: Ethernet address: 00:50:56:a5:d3:68
vmx0: netmap queues/slots: TX 4/4096, RX 4/4096
root@fw01adb:~ # sysctl -a | grep blacklis
vm.page_blacklist:
hw.pci.honor_msi_blacklist: 0
Hope that some of my tests can shed light on this issue. Removing the MSI blacklist option allocated 4 netmap TX/RX queues :)
For those interested: I started a FreeBSD 13-CURRENT VM (2020-oct-08) with a vmxnet3 interface, created one 802.1Q VLAN, and ran some iperf between this guy and a Linux VM and, BOOM! Full performance with 4 parallel streams configured:
[ ID] Interval Transfer Bandwidth Retr
[ 5] 0.00-10.23 sec 2.34 GBytes 1.96 Gbits/sec 0 sender
[ 5] 0.00-10.23 sec 2.34 GBytes 1.96 Gbits/sec receiver
[ 7] 0.00-10.23 sec 2.09 GBytes 1.75 Gbits/sec 0 sender
[ 7] 0.00-10.23 sec 2.09 GBytes 1.75 Gbits/sec receiver
[ 9] 0.00-10.23 sec 1.67 GBytes 1.40 Gbits/sec 0 sender
[ 9] 0.00-10.23 sec 1.67 GBytes 1.40 Gbits/sec receiver
[ 11] 0.00-10.23 sec 1.65 GBytes 1.39 Gbits/sec 0 sender
[ 11] 0.00-10.23 sec 1.65 GBytes 1.39 Gbits/sec receiver
[SUM] 0.00-10.23 sec 7.75 GBytes 6.50 Gbits/sec 0 sender
[SUM] 0.00-10.23 sec 7.75 GBytes 6.50 Gbits/sec receiver
Maybe this is some regression on 12.1.
> How did you do that? [force 1Gbps NIC]
Turn off auto negotiation and set the nic's IF to 1gbps (?)
You guys got me interested in this subject. I have tested plenty of iperf3 against my VMs in my little 3-host homelab, my 10GbE is just a couple DACs connected between the 10Gbe "backbone" IFs of my Dell Powerconnect 7048P, which is really more of a gigabit switch.
Usually the VMs will peg right up to ~9.4Gbps with little fluctuation if nothing else is happening, but I'm recording 3 720p video streams and 6 high-MP (4MP & 8MP) IP cameras right now, and have no interest in stopping any of it for testing right now.
I could have sworn I'd iperfed my OPNsense VM and gotten somewhere around 2.9Gbps vs the 9.4Gbps I got on my Linux, OmniOS or FreeBSD VMs (don't think I tested Windows, iperf3 is compiled weird in Win32 and doesn't yield predictable results). So I expected it to be a bit slower, but not THIS much slower:
OPNsense 20.7.3 to OmniOS r151034
(on separate hosts)
This is a VM w/ 4 vCPU and 8GB ram, run on an E3-1230 v2 home-built Supermicro X9SPU-F host running ESXi 6.7U3. The LAN vNIC is vmxnet3, running open-vm-tools.
root@gateway:/ # uname -a
FreeBSD gateway.webtool.space 12.1-RELEASE-p10-HBSD FreeBSD 12.1-RELEASE-p10-HBSD #0 517e44a00df(stable/20.7)-dirty: Mon Sep 21 16:21:17 CEST 2020 root@sensey64:/usr/obj/usr/src/amd64.amd64/sys/SMP amd64
root@gateway:/ # iperf3 -c 192.168.1.56
Connecting to host 192.168.1.56, port 5201
[ 5] local 192.168.1.1 port 13640 connected to 192.168.1.56 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 125 MBytes 1.05 Gbits/sec 0 2.00 MBytes
[ 5] 1.00-2.00 sec 126 MBytes 1.06 Gbits/sec 0 2.00 MBytes
[ 5] 2.00-3.00 sec 132 MBytes 1.11 Gbits/sec 0 2.00 MBytes
[ 5] 3.00-4.00 sec 131 MBytes 1.10 Gbits/sec 0 2.00 MBytes
[ 5] 4.00-5.00 sec 132 MBytes 1.11 Gbits/sec 0 2.00 MBytes
[ 5] 5.00-6.00 sec 135 MBytes 1.13 Gbits/sec 0 2.00 MBytes
[ 5] 6.00-7.00 sec 138 MBytes 1.16 Gbits/sec 0 2.00 MBytes
[ 5] 7.00-8.00 sec 137 MBytes 1.15 Gbits/sec 0 2.00 MBytes
[ 5] 8.00-9.00 sec 133 MBytes 1.12 Gbits/sec 0 2.00 MBytes
[ 5] 9.00-10.00 sec 131 MBytes 1.10 Gbits/sec 0 2.00 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.29 GBytes 1.11 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.29 GBytes 1.11 Gbits/sec receiver
iperf Done.
That is abysmal. Compare that to this Bullseye VM going to same OmniOS VM (also on separate hosts)
Debian Bullseye to OmniOS r151034
avery@debbox:~$ uname -a
Linux debbox 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 (2020-02-13) x86_64 GNU/Linux
avery@debbox:~$ iperf3 -c 192.168.1.56
Connecting to host 192.168.1.56, port 5201
[ 5] local 192.168.1.39 port 58064 connected to 192.168.1.56 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 688 MBytes 5.77 Gbits/sec 0 2.00 MBytes
[ 5] 1.00-2.00 sec 852 MBytes 7.15 Gbits/sec 0 2.00 MBytes
[ 5] 2.00-3.00 sec 801 MBytes 6.72 Gbits/sec 1825 730 KBytes
[ 5] 3.00-4.00 sec 779 MBytes 6.53 Gbits/sec 33 1.13 MBytes
[ 5] 4.00-5.00 sec 788 MBytes 6.61 Gbits/sec 266 1.33 MBytes
[ 5] 5.00-6.00 sec 828 MBytes 6.94 Gbits/sec 392 1.43 MBytes
[ 5] 6.00-7.00 sec 830 MBytes 6.96 Gbits/sec 477 1.49 MBytes
[ 5] 7.00-8.00 sec 826 MBytes 6.93 Gbits/sec 1286 749 KBytes
[ 5] 8.00-9.00 sec 826 MBytes 6.93 Gbits/sec 0 1.26 MBytes
[ 5] 9.00-10.00 sec 775 MBytes 6.50 Gbits/sec 278 1.38 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 7.81 GBytes 6.71 Gbits/sec 4557 sender
[ 5] 0.00-10.00 sec 7.80 GBytes 6.70 Gbits/sec receiver
iperf Done.
So much better throughput. Even while that OmniOS VM is recording 8-9 streams of video over the network.
I'm going to install a FreeBSD kernel and see what happens. Will be back with more benchmarks.
It is odd that so many of us seem to find an artificial ~1 Gbps limit when testing OPNsense 20.7 on VMware ESXi with vmxnet3 adapters. It looks like there are at least 3 of us able to reproduce these results now?
I've disabled the hardware blacklist and did not see a difference in my test results from what I had posted here prior. The only way I can get a little better throughput is to add more vCPUs to the OPNsense VM; however, this does not scale well. For instance, if I go from 2 vCPU to 4 vCPU, I can start to get between 1.5 Gbps and 2.2 Gbps depending on how much parallelism I select on my iperf clients.
Quote from: opnfwb on October 22, 2020, 05:03:05 AM
It is odd that so many of us seem to find an artificial ~1 Gbps limit when testing OPNsense 20.7 on VMware ESXi with vmxnet3 adapters. It looks like there are at least 3 of us able to reproduce these results now?
I've disabled the hardware blacklist and did not see a difference in my test results from what I had posted here prior. The only way I can get a little better throughput is to add more vCPUs to the OPNsense VM; however, this does not scale well. For instance, if I go from 2 vCPU to 4 vCPU, I can start to get between 1.5 Gbps and 2.2 Gbps depending on how much parallelism I select on my iperf clients.
I don't think it's related to the "hardware" (even though in this case, it's virtual). I think it's the upstream regression mentioned on page 1 - since I used to get better speeds than this before I upgraded. I think I did my last LAN-side iperf3 tests around v18 or 19, and they were at least twice that. In fact, I'm fairly certain I doubled my vCPUs and ram since because I was testing Sensei and never re-configured it for 2 vCPU/4GB after I uninstalled it.
Quote from: opnfwb on October 22, 2020, 05:03:05 AM
It is odd that so many of us seem to find an artificial ~1 Gbps limit when testing OPNsense 20.7 on VMware ESXi with vmxnet3 adapters. It looks like there are at least 3 of us able to reproduce these results now?
I've disabled the hardware blacklist and did not see a difference in my test results from what I had posted here prior. The only way I can get a little better throughput is to add more vCPUs to the OPNsense VM; however, this does not scale well. For instance, if I go from 2 vCPU to 4 vCPU, I can start to get between 1.5 Gbps and 2.2 Gbps depending on how much parallelism I select on my iperf clients.
Be honest with yourself: would you buy a piece of hardware with only 2 cores if you have a requirement for 10G? The smallest hardware with 10G interfaces has 4 cores minimum.
Quote from: mimugmail on October 22, 2020, 07:27:38 AM
Be honest with yourself: would you buy a piece of hardware with only 2 cores if you have a requirement for 10G? The smallest hardware with 10G interfaces has 4 cores minimum.
I think we may be talking past each other here. I'm not talking about purchasing hardware. I'm discussing a lack of throughput that now exists after an upgrade on hardware that performs at a much higher rate with just a software change. That's why we're running tests on multiple VMs, all with the same specs. There's obviously some bottleneck occurring here that isn't just explained away by core count (or lack thereof).
Quote from: mimugmail on October 19, 2020, 07:38:33 PM
I have customers pushing 6Gbit over vmxnet driver.
I'm more interested in trying to understand what is different in my environment that is causing these issues on VMs. Is this claimed 6Gbit going through a virtualized OPNsense install? Do you have any additional details that we can check? I've even tried to change the CPU core assignment (change the number of sockets to 1 and add cores) to see if there was some weird NUMA scaling issue impacting OPNsense. So far everything I have tried has had no impact on throughput; even switching to the beta netmap kernel that is supposed to resolve some of this did not seem to work yet.
Quote from: AveryFreeman on October 22, 2020, 04:36:49 AM
You guys got me interested in this subject. I have tested plenty of iperf3 against my VMs in my little 3-host homelab, my 10GbE is just a couple DACs connected between the 10Gbe "backbone" IFs of my Dell Powerconnect 7048P, which is really more of a gigabit switch.
The infrastructure I have at that remote office I was reporting on so far:
- PowerEdge R630 (2 servers)
- 2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz with 12 cores each (24 cores per server)
- 3x NetXtreme II BCM57800 10 Gigabit Ethernet (dual-port NICs), meaning 6 physical adapters distributed into 3 virtual switches (2 NICs VM, 2 NICs vMotion, 2 NICs vmkernel)
- 512GB RAM per server
- Plenty of storage on an external SAS 12Gbps (2x 6Gbps active + 2x 6Gbps passive paths) MD3xxx Dell storage with round-robin paths
- 2x Dell N4032F as core/backbone switches with 10Gbps ports, stacked with 2x 40Gbps ports
- 6-port trunks for each server, 3 ports per trunk per stacking member, so each vSwitch NIC touches one stack member
- Stack members on the Dell N series are treated as a unit, so LACP can be configured across stack members (no MLAG involved)
Even when transferring data between VMs that are not registered on the same physical hardware I can achieve 8 Gbps easily, except with the vmxnet3 driver on FreeBSD 12.1.
Quote from: mimugmail on October 22, 2020, 07:27:38 AM
Be honest with yourself: would you buy a piece of hardware with only 2 cores if you have a requirement for 10G? The smallest hardware with 10G interfaces has 4 cores minimum.
What is not honest is to pretend that a VM can't push more than 1 Gbps or achieve decent throughput rates with only 1 vCPU configured, because that is simply not true. On the contrary, with virtualization you should always configure resources in a way that avoids CPU oversubscription. Having, for example, a 4-vCPU VM that is mostly idle and does not run CPU-intensive operations will create problems for other VMs on the same pool/share/physical hardware. For simple iperf3 and network transfer tests, FreeBSD 13 with 1 vCPU did fine, while OPNsense (FreeBSD 12.1) with 4 vCPUs and high CPU shares, as the only VM with that share configuration, crawled during transfers.
vmxnet3 on FreeBSD 12.1 is garbage. It seems that the port to iflib created regressions related to MSI-X, tx/rx queues, iflib leaking MSI-X messages, non-power-of-2 tx/rx queue configs and others. I could even find some LRO regressions in the commits that could explain the retransmissions and the abysmal lack of performance that I reported here (https://forum.opnsense.org/index.php?topic=18754.msg90766#msg90766) on a previous page while trying to enable LRO as a workaround for that performance issue. https://svnweb.freebsd.org/base/head/sys/dev/vmware/vmxnet3/if_vmx.c?view=log
In the test I made above with FreeBSD 13-CURRENT I was only using 1 vCPU, 4GB RAM, pvscsi and vmxnet3, and the system performed greatly compared with the state of the vmxnet3 driver in FreeBSD 12.1-RELEASE.
With Proxmox using the vnet adapter the speed is fine, and pfSense, based on FreeBSD 11, works fine with vmxnet3 too.
So the issue is with HBSD and the vmxnet adapter. I don't understand why OPNsense is based on a half-dead OS; HBSD has been abandoned by most of its devs. Just drop it and use standard FreeBSD again.
Quote from: Archanfel80 on October 26, 2020, 10:27:47 AM
With Proxmox using the vnet adapter the speed is fine, and pfSense, based on FreeBSD 11, works fine with vmxnet3 too.
So the issue is with HBSD and the vmxnet adapter. I don't understand why OPNsense is based on a half-dead OS; HBSD has been abandoned by most of its devs. Just drop it and use standard FreeBSD again.
FreeBSD 12.1 has the same issues ..
Quote from: mimugmail on October 26, 2020, 12:02:55 PM
Quote from: Archanfel80 on October 26, 2020, 10:27:47 AM
With Proxmox using the vnet adapter the speed is fine, and pfSense, based on FreeBSD 11, works fine with vmxnet3 too.
So the issue is with HBSD and the vmxnet adapter. I don't understand why OPNsense is based on a half-dead OS; HBSD has been abandoned by most of its devs. Just drop it and use standard FreeBSD again.
FreeBSD 12.1 has the same issues ..
Yes, but the pfSense current stable branch is still using FreeBSD 11.x, not 12. I think they are on point there. It's not a good idea to switch to a newer base OS if it still has many issues. Now I have to roll back to OPNsense 20.1 everywhere I upgraded to 20.7. And the issue is not just with vmxnet: after the upgrade to 20.7, one of my hardware firewalls with EFI boot no longer boots; it freezes during the EFI boot. That is also a FreeBSD 12 related issue, as I already figured out.
And Sophos is using a 3.12 kernel; why upgrade to a newer one ...
If no one takes the first step there won't be any progress. Usually mission-critical systems shouldn't be updated to a major release when it's not yet on .3 or .4; I'd even wait until a .6.
The whole discussion is way too off-topic and only fed with frustrated content.
It should talk about this, so maybe off-topic, but still.
Half-year release model, and I only updated recently: 20.7 is almost half a year old now, and we are close to 21.1, when 20.7 will be obsolete too. You're right that critical system software should wait before adopting new releases. So even the 21.x series should stay on FreeBSD 11 and wait with the upgrade to 12 until it is stable. A firewall is not a good place for experimenting and making the first step.
But I can say something that is not off-topic.
Disabling net.inet.ip.redirect and net.inet.ip6.redirect, and increasing net.inet.tcp.recvspace and net.inet.tcp.sendspace as well as kern.ipc.maxsockbuf and kern.ipc.somaxconn, helps a little. There is still a performance loss, but not as bad.
I attached my tunables-related config.
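Since the attachment isn't reproduced in the thread text, here is a sketch of those tunables with example values only (set via System > Settings > Tunables or sysctl.conf; sensible values depend on RAM and link speed):

net.inet.ip.redirect=0
net.inet.ip6.redirect=0
net.inet.tcp.recvspace=262144
net.inet.tcp.sendspace=262144
kern.ipc.maxsockbuf=16777216
kern.ipc.somaxconn=2048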
Just keep using 20.1 with all the security related caveats and missing features. I really don't see the point in complaining about user choices.
Cheers,
Franco
Quote from: franco on October 26, 2020, 02:08:45 PM
Just keep using 20.1 with all the security related caveats and missing features. I really don't see the point in complaining about user choices.
Cheers,
Franco
I did the rollback and everything is fine. The network speed is around 800 Mbit again (gigabit internet); with 20.7 this was just 500-600 Mbit. Speed is important here, and I don't care about missing features since I don't use any. I'm not sure about the security caveats; FreeBSD 11 is no less secure. Until this issue is fixed I'll stay with 20.1.x. These servers are used in a production environment, and I don't have the time or opportunity to use them as a playground. This was exactly the reason why I abandoned pfSense: they import untested kernels and features, the core system becomes unstable, and after an upgrade I fear what will go wrong. OPNsense has done it right so far, and I hope the devs fix this or at least give us some workaround. The speed is not the only issue: I also have to disable IPS/IDS and Sensei because they cause system freezes, so I have basically neglected my firewalls. I know this is still in a testing phase, but 20.7 is 4, almost 5 months old now and I'm still unable to use these features properly. And we paid for Sensei, which is unusable now. That is not acceptable. So yes, I take the "risk" and roll back wherever I can...
Would it be possible to install a stock FreeBSD 13 kernel? Maybe they fixed the regressions. I'm wondering if it has something to do with HBSD compile flags for security.
Quote from: AveryFreeman on October 26, 2020, 08:52:55 PM
Would it be possible to install a stock FreeBSD 13 kernel? Maybe they fixed the regressions. I'm wondering if it has something to do with HBSD compile flags for security.
Unfortunately this is not so easy. You can't use a precompiled kernel from another system; it wouldn't boot.
You have to compile from source, but a newer kernel means newer headers and libraries as dependencies, and the compilation process could fail at some point. The only solution that could work is to cherry-pick just the fix, apply it to the original kernel source tree and compile, but that needs work too.
I was an Android kernel developer many years back, so I know that experimenting with the kernel is always risky.
Quote from: Archanfel80 on October 27, 2020, 08:53:09 AM
Quote from: AveryFreeman on October 26, 2020, 08:52:55 PM
Would it be possible to install a stock FreeBSD 13 kernel? Maybe they fixed the regressions. I'm wondering if it has something to do with HBSD compile flags for security.
Unfortunately this is not so easy. You can't use a precompiled kernel from another system; it wouldn't boot.
You have to compile from source, but a newer kernel means newer headers and libraries as dependencies, and the compilation process could fail at some point. The only solution that could work is to cherry-pick just the fix, apply it to the original kernel source tree and compile, but that needs work too.
I was an Android kernel developer many years back, so I know that experimenting with the kernel is always risky.
Wouldn't it be easier to do it the other way round?
Make the OS work with FreeBSD 13, to eliminate any remnants of bad plugin code?
Quote from: Supermule on October 27, 2020, 10:01:12 AM
Quote from: Archanfel80 on October 27, 2020, 08:53:09 AM
Quote from: AveryFreeman on October 26, 2020, 08:52:55 PM
Would it be possible to install a stock FreeBSD 13 kernel? Maybe they fixed the regressions. I'm wondering if it has something to do with HBSD compile flags for security.
Unfortunately this is not so easy. You can't use a precompiled kernel from another system; it wouldn't boot.
You have to compile from source, but a newer kernel means newer headers and libraries as dependencies, and the compilation process could fail at some point. The only solution that could work is to cherry-pick just the fix, apply it to the original kernel source tree and compile, but that needs work too.
I was an Android kernel developer many years back, so I know that experimenting with the kernel is always risky.
Wouldn't it be easier to do it the other way round?
Make the OS work with FreeBSD 13, to eliminate any remnants of bad plugin code?
They just switched to FreeBSD 12; I don't think FreeBSD 13 will be adopted soon. But you have a point.
What I found out is that when OPNsense is used in a virtualized environment it only uses one core; the HW socket detection is faulty in that case.
net.isr.maxthreads and net.isr.numthreads always return 1.
But this can be changed in the tunables too.
It also requires changing net.isr.dispatch from "direct" to "deferred".
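A sketch of those tunables as they would be entered (assumptions: net.isr.maxthreads is a boot-time tunable where -1 is commonly used to mean one netisr thread per core, and net.isr.numthreads simply reflects the result after a reboot):

# System > Settings > Tunables, then reboot
net.isr.maxthreads=-1
# switches packet processing from direct to queued dispatch
net.isr.dispatch=deferred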
This gives me a massive performance boost on a gigabit connection, but it's still not perfect; the boost comes with overhead too. And only on FreeBSD 12: with 20.1, which is still based on FreeBSD 11, it's lightning fast :)
Attached: 20.7.4 with and without sysctl tuning, and 20.1 with tuning.
With 20.7.x nothing helps; the speed is capped and around 20-30 percent is lost because of the overhead. With 20.1, you see the difference :)
I'm also experiencing poor throughput with OPNsense 20.7. Maybe some of you have seen my thread in the general forum (https://forum.opnsense.org/index.php?topic=19426.0).
I did some testing and want to share the results with you.
Measure: In a first step, I disabled all packet filtering on the OPNsense device.
Result: No improvement.
Measure: In a second step and in order to rule out sources of error, I have removed the LAGG/LACP configuration in my setup.
Result: No improvement.
In the next step, I made some performance comparisons. I did tests with the following two setups:
a) Client (Ubuntu 20.04.1 LTS) <--> OPNsense (20.7.4) <--> File Server (Debian 10.6)
b) Client (Ubuntu 20.04.1 LTS) <--> Ubuntu (20.04.1 LTS) <--> File Server (Debian 10.6)
In both setups the client is a member of VLAN 70 and the file server is a member of VLAN 10. In setup b) I have enabled packet forwarding for IPv4.
The test results were as follows:
Samba transfer speeds (MB/sec)
Routing device | Client --> Server | Server --> Client
a) OPNsense    | 67.3              | 71.2
b) Ubuntu      | 108.7             | 113.8
iPerf3 UDP transfer speeds (MBit/sec)
Routing device | Client --> Server | Server --> Client
(UDP values not included in the post text)
Packet loss leads to approx. 25% reduced throughput on the receiving device.
Back with some more test results.
I did a rollback to OPNsense 20.1 for testing purposes.
Samba transfer speeds (MB/sec)
Routing device | Client --> Server | Server --> Client
OPNsense 20.1  | 109.3             | 102.6
iPerf3 UDP transfer speeds (MBit/sec)
Routing device | Client --> Server | Server --> Client
(UDP values not included in the post text)
As you can see OPNsense 20.1 gives me full wire speed.
Quote from: Supermule on October 27, 2020, 10:01:12 AM
Quote from: Archanfel80 on October 27, 2020, 08:53:09 AM
Quote from: AveryFreeman on October 26, 2020, 08:52:55 PM
Would it be possible to install a stock FreeBSD 13 kernel? Maybe they fixed the regressions. I'm wondering if it has something to do with HBSD compile flags for security.
Unfortunately this is not so easy. You can't use a precompiled kernel from another system; it wouldn't boot.
You have to compile from source, but a newer kernel means newer headers and library dependencies, so the compilation could fail at some point. The only approach that could work is to cherry-pick just the fix into the original kernel source tree and compile that, but this needs work too.
I was an Android kernel developer many years back, so I know that experimenting with the kernel is always risky.
Wouldn't it be easier to do it the other way round?
Make OPNsense work with FreeBSD 13, to eliminate any remnant of bad plugin code?
It does work, and it's fairly easy. Just install OPNsense using opnsense-bootstrap over a FreeBSD installation. You have to change the script if you want to install over a different version of FreeBSD (e.g. 13), but if you install 12.x you can just run the script. Then boot from kernel.old or copy the kernel back to /boot/kernel, kldxref, etc.
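A rough sketch of that procedure, under the assumption that the bootstrap script is fetched from the opnsense/update repository (the exact URL, script options and release handling may differ, so check that repository's README first):
# on a fresh FreeBSD 12.x installation
pkg install ca_root_nss                      # so fetch can verify the HTTPS certificate
fetch <URL of opnsense-bootstrap.sh.in from the opnsense/update repo>
sh ./opnsense-bootstrap.sh.in                # converts the FreeBSD install into OPNsense, then reboot
# afterwards, to keep the stock FreeBSD kernel instead of the OPNsense one:
mv /boot/kernel /boot/kernel.opnsense        # or simply pick kernel.old at the loader prompt
cp -a /boot/kernel.old /boot/kernel
kldxref /boot/kernel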
I can't vouch for the helpfulness of that, as my FreeBSD understanding is limited and I don't know much about kernel tuning. Your identification of net.isr.maxthreads and net.isr.numthreads always returning 1 seems more helpful than arbitrarily changing the kernel.
How would you recommend tuning the kernel for multi-threading? Is turning off hyperthreading a good idea?
By the way, I didn't see much of a speed increase installing OPNsense 20.7 over 13-CURRENT and I'm suspicious of its reliability, but there is a slight increase in speed installing OPNsense over 12.1-RELEASE and keeping the FreeBSD kernel: https://forum.opnsense.org/index.php?topic=19789.msg91356#msg91356
It would probably be more noticeable on 10G but I haven't done any benchmarking w/ it yet.
Looking into what "if_io_tqg" is, and why it eats up quite a bit of a core when doing (not even line rate) transfers on my apu2 board, I found this thread.
Has any conclusion been reached yet? Is there anything we can test/do?
Not yet, no
I had throughput issues with 20.7.4 on Proxmox (latest 6.2).
They were related to the hardware offload features being enabled (I know, I'm stupid).
Once disabled, everything is OK, maxing out the link.
This problem seems to be getting worse - I upgraded to 20.7.5 and my iperf3 speeds have dropped from ~2Gb/s to hovering around 1Gb/s, with VM->VM speeds at ~650Mbps :o :-\
CentOS 8 VM on the same machine gets around 9.4Gbps
I'll upload some speeds when I get a chance.
Has anyone rerun the tests with opnsense 21.1?
Here are my latest results.
Recap of my environment:
Server is HP ML10v2 ESXi 6.7 running build 17167734
Xeon E3-1220 v3 CPU
32GB of RAM
SSD/HDD backed datastore (vSAN enabled)
All firewalls are tested with their "out of the box" ruleset; no customizations were made besides configuring the WAN/LAN adapters to work for these tests. All firewalls have their version of VM Tools installed from the package manager.
The iperf3 client/server are both Fedora Desktop v33. The server sits behind the WAN interface, the client sits behind the LAN interface to simulate traffic through the firewall. No transfer tests are performed hosting iperf3 on the firewall itself.
OPNSense 21.1.1 VM Specs:
VM hardware version 14
2 vCPU
4GB RAM
2x vmx3 NICs
pfSense 2.5.0-RC VM Specs:
VM hardware version 14
2 vCPU
4GB RAM
2x vmx3 NICs
OpenWRT VM Specs:
VM hardware version 14
2 vCPU
1GB RAM
2x vmx3 NICs
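The exact iperf3 invocations are not listed; based on the 60-second single-stream and four-stream results below, they were presumably along the lines of:
iperf3 -c <server-behind-WAN> -t 60          # single stream ("p1")
iperf3 -c <server-behind-WAN> -t 60 -P 4     # four parallel streams ("p4")
(where <server-behind-WAN> stands for the Fedora iperf3 server's address)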
OPNsense 21.1.1 (netflow disabled) 1500MTU receiving from WAN, vmx3 NICs, all hardware offload disabled, single thread (p1)
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 8.10 GBytes 1.16 Gbits/sec 219 sender
[ 5] 0.00-60.00 sec 8.10 GBytes 1.16 Gbits/sec receiver
OPNsense 21.1.1 (netflow disabled) 1500MTU receiving from WAN, vmx3 NICs, all hardware offload disabled, four thread (p4)
[ ID] Interval Transfer Bitrate Retr
[SUM] 0.00-60.00 sec 13.4 GBytes 1.91 Gbits/sec 2752 sender
[SUM] 0.00-60.00 sec 13.3 GBytes 1.91 Gbits/sec receiver
OPNsense 21.1.1 (netflow disabled) 1500MTU receiving from WAN, vmx3 NICs, all hardware offload enabled, single thread (p1)
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 251 MBytes 35.0 Mbits/sec 56410 sender
[ 5] 0.00-60.00 sec 250 MBytes 35.0 Mbits/sec receiver
pfSense 2.5.0-RC 1500MTU receiving from WAN, vmx3 NICs, all hardware offload disabled, single thread (p1)
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 15.1 GBytes 2.15 Gbits/sec 1029 sender
[ 5] 0.00-60.00 sec 15.0 GBytes 2.15 Gbits/sec receiver
pfSense 2.5.0-RC 1500MTU receiving from WAN, vmx3 NICs, all hardware offload disabled, four thread (p4)
[ ID] Interval Transfer Bitrate Retr
[SUM] 0.00-60.00 sec 15.3 GBytes 2.19 Gbits/sec 12807 sender
[SUM] 0.00-60.00 sec 15.3 GBytes 2.18 Gbits/sec receiver
pfSense 2.5.0-RC 1500MTU receiving from WAN, vmx3 NICs, all hardware offload enabled, single thread (p1)
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 316 MBytes 44.2 Mbits/sec 48082 sender
[ 5] 0.00-60.00 sec 316 MBytes 44.2 Mbits/sec receiver
OpenWRT v19.07.6 1500MTU receiving from WAN, vmx3 NICs, no UI offload settings (using defaults), single thread (p1)
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 34.1 GBytes 4.88 Gbits/sec 21455 sender
[ 5] 0.00-60.00 sec 34.1 GBytes 4.88 Gbits/sec receiver
OpenWRT v19.07.6 1500MTU receiving from WAN, vmx3 NICs, no UI offload settings (using defaults), four thread (p4)
[ ID] Interval Transfer Bitrate Retr
[SUM] 0.00-60.00 sec 43.2 GBytes 6.18 Gbits/sec 79765 sender
[SUM] 0.00-60.00 sec 43.2 GBytes 6.18 Gbits/sec receiver
host CPU usage during the transfer was as follows:
OPNsense 97% host CPU used
pfSense 84% host CPU used
OpenWRT 63% host CPU used for p1, 76% host CPU used for p4
In this case, my environment is CPU constrained. However, the purpose of these transfers is to use a best case scenario (all 1500MTU packets) and see how much we can push through the firewall with the given CPU power available. I think we're still dealing with inherent bottlenecks within FreeBSD 12. Both of the BSDs here hit high host CPU usage regardless of the thread count during the transfer. Only the Linux system scaled with more threads and still did not max the host CPU during transfers.
I personally use OPNsense and it's a great firewall. Running on bare metal with IGB NICs and a modern processor made within the last 5 years or so, it will be plenty to cover gigabit speeds for most people. However, if we are virtualizing, all of the BSDs seem to want a lot of CPU power to scale beyond a steady 1Gb/s. Perhaps FreeBSD 13 will give us more efficient virtualization throughput?
I am curious if I am seeing this kernel problem on my bare-metal install. I have a passively cooled mini PC with 4 Intel NICs and a J1900 CPU at 2.00GHz and 4 GB of RAM. I know this CPU is fairly old, but the hardware sizing guide says I should be able to do 350-750 Mbit/s throughput. When I have no firewall rules enabled and the default IPS settings I get about 370-380 Mbit/s of my 400 Mbit/s inbound speed. If I enable firewall rules to set up fq_codel, then it drops my throughput to 320-340 Mbit/s. In both of these scenarios I see my CPU going up to 90+% on one thread. I do understand that my throughput will go down with different options like IPS and firewall rules, but I would think that with no other options running this hardware should be able to do better than 380 Mbit/s tops.
Quote from: DiHydro on February 11, 2021, 09:40:20 PM
I am curious if I am seeing this kernel problem on my bare-metal install. I have a passively cooled mini PC with 4 Intel NICs and a J1900 CPU at 2.00GHz and 4 GB of RAM. I know this CPU is fairly old, but the hardware sizing guide says I should be able to do 350-750 Mbit/s throughput. When I have no firewall rules enabled and the default IPS settings I get about 370-380 Mbit/s of my 400 Mbit/s inbound speed. If I enable firewall rules to set up fq_codel, then it drops my throughput to 320-340 Mbit/s. In both of these scenarios I see my CPU going up to 90+% on one thread. I do understand that my throughput will go down with different options like IPS and firewall rules, but I would think that with no other options running this hardware should be able to do better than 380 Mbit/s tops.
Using FQ_Codel or IPS is somewhat secondary to the overall discussion here. Both of these consume a large number of CPU cycles and won't illustrate the true throughput capabilities of the firewall due to their own inherent overhead.
I run a J3455 with a quad port Intel I340 NIC, and can easily push 1gigabit with the stock ruleset and have plenty of CPU overhead remaining. This unit can also enable FQ_Codel on WAN and still push 1gigabit, although CPU usage does increase around 20% at 1gigabit speeds.
I don't personally run any of the IPS components so I don't have any direct feedback on that. It's worth noting that both of these tests are done on a traditional DHCP WAN connection. If you're using PPPoE, that will be single thread bound and will limit your throughput to the maximum speed of a single core.
What most of the transfer speed tests are illustrating here are that FreeBSD seems to have very poor scaling when using 10gbit virtualized NICs and forwarding packets. This isn't an OPNsense induced issue, more of an issue that OPNsense gets stuck with due to the poor upstream support from FreeBSD. For the vast majority of users on 1gigabit or lower connections, this won't be a cause for concern in the near future.
Quote from: opnfwb on February 11, 2021, 10:27:08 PM
Quote from: DiHydro on February 11, 2021, 09:40:20 PM
I am curious if I am seeing this kernel problem on my bare-metal install. I have a passively cooled mini PC with 4 Intel NICs and a J1900 CPU at 2.00GHz and 4 GB of RAM. I know this CPU is fairly old, but the hardware sizing guide says I should be able to do 350-750 Mbit/s throughput. When I have no firewall rules enabled and the default IPS settings I get about 370-380 Mbit/s of my 400 Mbit/s inbound speed. If I enable firewall rules to set up fq_codel, then it drops my throughput to 320-340 Mbit/s. In both of these scenarios I see my CPU going up to 90+% on one thread. I do understand that my throughput will go down with different options like IPS and firewall rules, but I would think that with no other options running this hardware should be able to do better than 380 Mbit/s tops.
Using FQ_Codel or IPS is somewhat secondary to the overall discussion here. Both of these consume a large number of CPU cycles and won't illustrate the true throughput capabilities of the firewall due to their own inherent overhead.
I run a J3455 with a quad port Intel I340 NIC, and can easily push 1gigabit with the stock ruleset and have plenty of CPU overhead remaining. This unit can also enable FQ_Codel on WAN and still push 1gigabit, although CPU usage does increase around 20% at 1gigabit speeds.
I don't personally run any of the IPS components so I don't have any direct feedback on that. It's worth noting that both of these tests are done on a traditional DHCP WAN connection. If you're using PPPoE, that will be single thread bound and will limit your throughput to the maximum speed of a single core.
What most of the transfer speed tests are illustrating here are that FreeBSD seems to have very poor scaling when using 10gbit virtualized NICs and forwarding packets. This isn't an OPNsense induced issue, more of an issue that OPNsense gets stuck with due to the poor upstream support from FreeBSD. For the vast majority of users on 1gigabit or lower connections, this won't be a cause for concern in the near future.
It sounds like I may need to reset to the stock configuration and try this again. I thought that in some of my testing I had disabled all options and was running the device as a pure router and was still seeing the single-core limitation. Maybe I was mistaken and still had some option enabled that caused significant CPU usage. My cable modem gives a DHCP lease to my OPNsense box, so I am not running PPPoE. When directly connected to the modem I get 390-430 Mbit/s. That is what led me to look at the firewall itself as a throttle point.
Quote from: DiHydro on February 11, 2021, 09:40:20 PM
I am curious if I am seeing this kernel problem on my bare-metal install. I have a passively cooled mini PC with 4 Intel NICs and a J1900 CPU at 2.00GHz and 4 GB of RAM. I know this CPU is fairly old, but the hardware sizing guide says I should be able to do 350-750 Mbit/s throughput. When I have no firewall rules enabled and the default IPS settings I get about 370-380 Mbit/s of my 400 Mbit/s inbound speed. If I enable firewall rules to set up fq_codel, then it drops my throughput to 320-340 Mbit/s. In both of these scenarios I see my CPU going up to 90+% on one thread. I do understand that my throughput will go down with different options like IPS and firewall rules, but I would think that with no other options running this hardware should be able to do better than 380 Mbit/s tops.
I wonder what throughput you would get with a Linux-based firewall, just to see what the hardware is capable of. My experience with the current OPNsense 21.1 release is that it gives me only ~50% throughput even after performance tuning in a virtualized environment. A quick test with a virtualized OpenWrt gave me full gigabit wire speed without any optimization needed. I know that's comparing apples and oranges, but it's difficult to say what a hardware platform is capable of if you don't try different things.
Quote from: spi39492 on February 12, 2021, 04:19:49 PM
Quote from: DiHydro on February 11, 2021, 09:40:20 PM
I am curious if I am seeing this kernel problem on my bare-metal install. I have a passively cooled mini PC with 4 Intel NICs and a J1900 CPU at 2.00GHz and 4 GB of RAM. I know this CPU is fairly old, but the hardware sizing guide says I should be able to do 350-750 Mbit/s throughput. When I have no firewall rules enabled and the default IPS settings I get about 370-380 Mbit/s of my 400 Mbit/s inbound speed. If I enable firewall rules to set up fq_codel, then it drops my throughput to 320-340 Mbit/s. In both of these scenarios I see my CPU going up to 90+% on one thread. I do understand that my throughput will go down with different options like IPS and firewall rules, but I would think that with no other options running this hardware should be able to do better than 380 Mbit/s tops.
I wonder what throughput you would get with a Linux-based firewall, just to see what the hardware is capable of. My experience with the current OPNsense 21.1 release is that it gives me only ~50% throughput even after performance tuning in a virtualized environment. A quick test with a virtualized OpenWrt gave me full gigabit wire speed without any optimization needed. I know that's comparing apples and oranges, but it's difficult to say what a hardware platform is capable of if you don't try different things.
I am going to try this in a day or two. IPfire is my choice right now, unless someone has a different suggestion. I will probably come back to OPNsense either way as I like this community and the project.
Quote from: DiHydro on February 12, 2021, 10:49:11 PM
I am going to try this in a day or two. IPfire is my choice right now, unless someone has a different suggestion. I will probably come back to OPNsense either way as I like this community and the project.
Yeah, I like OPNsense as well. That's why it is so painful that in my setup the throughput is so limited. I did the tests with Debian and iptables on the one hand and with OpenWrt on the other, as it's available for many platforms and pretty simple to install on bare metal and in virtual environments.
So I put OPNsense on a PC that has an Intel PRO/1000 4-port NIC and an i7 2600, and with a default install I get my 450 Mbit/s. Once I put a firewall rule in to enable fq_codel, it drops to 360-380 Mbit/s. I can't believe that an i7 at 3.4 GHz with an Intel NIC can't handle these rules at full speed. What is wrong, what can I look at, and how can I help make this better?
Quote from: DiHydro on February 16, 2021, 01:05:39 AM
So I put OPNsense on a PC that has an Intel PRO/1000 4-port NIC and an i7 2600, and with a default install I get my 450 Mbit/s. Once I put a firewall rule in to enable fq_codel, it drops to 360-380 Mbit/s. I can't believe that an i7 at 3.4 GHz with an Intel NIC can't handle these rules at full speed. What is wrong, what can I look at, and how can I help make this better?
You can check with some of the performance setting tips laid out here: https://forum.opnsense.org/index.php?topic=9264.msg93315#msg93315
I have exactly the same problem. Apparently there are problems with the vmxnet3 vNIC here. It's sad, but I can't get higher than 1.4 Gbps. Please don't come at me about hardware. Sorry folks, it's 2021; 10 Gbps is what every firewall should be able to do by default. OPNsense is a wonderful product, but I think you are betting on a dead horse. Why not use Linux as the OS? FreeBSD slept through the virtualization era (see the s... vmxnet3 support and bugs). Now I've vented my frustration and will go back to work :).
Quote from: mm-5221 on February 21, 2021, 06:58:42 PM
I have exactly the same problem. Apparently there are problems with the vmxnet3 vNIC here. It's sad, but I can't get higher than 1.4 Gbps. Please don't come at me about hardware. Sorry folks, it's 2021; 10 Gbps is what every firewall should be able to do by default. OPNsense is a wonderful product, but I think you are betting on a dead horse. Why not use Linux as the OS? FreeBSD slept through the virtualization era (see the s... vmxnet3 support and bugs). Now I've vented my frustration and will go back to work :).
So there's always an option to use IPFire for this use-case? :)
No, I switched from Sophos UTM to OPNsense some time ago and I do not want another migration. With the exception of WAF and the fact that firewall aliases are not connected to DHCP, I find that OPNsense is a great product.
I have now solved my performance problem with the tunable hw.pci.honor_msi_blacklist=0. With iperf3 and -P 10 (parallel streams) I get between 8-9 Gbps without IPS. With IPS, unfortunately, only 1.7 Gbps (CPU only 30% utilized). I am still missing performance tuning of the IPS parameters in the UI. I think I could get 5-6 Gbps with about 8 cores, and with 12 cores it should be 8-9 Gbps. Currently IPS/Suricata seems to be artificially throttled somewhere in the configuration.
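For anyone wanting to try the same fix: hw.pci.honor_msi_blacklist is a boot-time tunable, so (assuming the usual workflow) it has to be added under System > Settings > Tunables and needs a reboot, roughly:
# System > Settings > Tunables -> add, then reboot:
hw.pci.honor_msi_blacklist = 0    # don't honor the PCI MSI blacklist, so MSI/MSI-X interrupts are used (relevant on some hypervisors)
# after rebooting, verify from a shell:
sysctl hw.pci.honor_msi_blacklist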
Do we have any solution here?
I have an R620 (Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz - 8 cores) under ESXi 7 and I get 700 Mbps between OPNsense <> Ubuntu VM on the same host, while two Ubuntu VMs can do 7 Gbps, 10 times faster.
Is there any news regarding this topic? Throughput is still slow on OPNsense 21.1.5 :'(
I've found a similar issue regarding slow transfers with iflib in TrueNAS which has been solved.
Maybe we're facing the same issue here in OPNsense.
Please have a look at the following links/commits:
- https://jira.ixsystems.com/browse/NAS-107593
- https://reviews.freebsd.org/D27683
- https://reviews.freebsd.org/R10:81be655266fac2b333e25f386f32c9bcd17f523d
Maybe there is an expert here who can review the code snippets.
I really hope this issue can be solved soon.
Related Github ticket: #119 (https://github.com/opnsense/src/issues/119)
Hello everyone
Unfortunately I have the same performance problem on ESXi 6.7 with vmxnet3 network adapters. The physical adapters behind them are as follows:
WAN: AQtion AQN-107 (10 Gbps)
LAN: Intel 10 Gigabit Ethernet Controller 82599 (10 Gbps)
DMZ: Intel 10 Gigabit Ethernet Controller 82599 (10 Gbps)
ISP: 10/10 Gbps (XGS-PON)
The speed on OPNsense (also on pfSense) is approximately as follows:
down: 7-10 Mbps
up: 2.5-3 Gbps
On any Linux firewall (e.g. IPFire and Untangle) I get the following values:
down & up: 5-6 Gbps
I have tried all possible tunables on the OPNsense, which unfortunately didn't help.
But now I just noticed something strange:
When I have performance monitoring active during a speed test (the Performance graph in the WebUI, or top via SSH), the speed is suddenly not even that bad:
down & up: 3-4 Gbps
If I deactivate the performance monitoring again, the values are as low as at the beginning.
Unfortunately I don't know exactly what triggers this phenomenon, but maybe someone of you has also noticed this?
Try WAN and LAN on the Intel card and don't use the other card.
Thank you for the answer.
I previously had an Intel X550-T2 purely for the WAN connection. But after testing I found that the onboard AQtion AQN-107 with the current driver from Marvell* is just as fast (so I could save one PCI-E slot).
On both Linux firewalls, I was able to max out the bandwidth of the ISP with both configurations (Intel or AQtion).
P.S. the problem was the same with the configuration with the Intel NIC
(*sorry, the driver is not from Broadcom, it's from Marvell)
Interfaces LAN MSS ... set to 1300
Thanks for the hint, but I had already adjusted this value before - unfortunately without success...
What is really strange is that the speed is normal (like on the Linux Firewalls) as soon as I have "top" open in the background.
(no matter if OPNsense is tuned or on factory settings).
As if (figuratively speaking) "top" keeps the floodgates open for the network packets to flow faster.
Can anyone perhaps verify this with the same problem (vmxnet3)?
I have recorded the phenomenon below:
https://ibb.co/rv8r4fn (https://ibb.co/rv8r4fn)
Just to chip in and offer a possibly "standard" hardware data point.
I'm using Deciso's "own" hardware, which should help with replicating/reproducing the issue.
Deciso DEC 840 with OPNsense 21.4.2-amd64, FreeBSD 12.1-RELEASE-p19-HBSD
I have one main VLAN routing to untagged (main LAN). I upgraded my main switch to 10 Gbps and changed my LAN+VLAN interface from a GbE port to an SFP+ port at 10 GbE.
Everything else works well, but VLAN <=> LAN routing causes massive lag on completely unrelated routing (like 400-1000 ms spikes); the extreme case was a CPU spike up to 80%+ which caused several seconds of 1000-1300 ms spikes on separate routes under light traffic.
I will reconfigure (likely today) the VLAN parts onto a separate GbE interface and see if that solves the issue; the next step will be restoring the whole network to GbE ports (as it was before).
I did install a new switch in the network, so it might play a part in this, but based on the behaviour it seems unlikely.
Do you use Sensei or IPS?
If you meant me: no, I don't have Sensei, and I believe I don't have IPS enabled (at least not on purpose; I can't even find the setting right now).
We do use traffic shaping policies for 2x WANs, but that's about it. All the rest is just basic (rule-limited) routing between LAN/VLANs.
I didn't touch anything in the recent change, except that I moved the LAN (plus the VLANs associated with it) from the igb0 interface to ax0.
I'll configure it back soonish (hopefully today), as the 10 GbE wasn't really utilized yet and the issue is really easy to spot right now. So I'll have more info about my scenario soon.
Ok, that was nice and clean to confirm.
To clarify the terms below Deciso 840 has 4x GbE ports (igb0,1,2,3) and 2x 10GbE SFP+ ports (ax0,ax1).
The issue with the Deciso 840 is the 10 GbE SFP+ ports routing VLAN traffic. In my case it was supposed to route that traffic alongside untagged LAN traffic, so this is the scenario I can confirm.
1. Before changes - VLAN routing worked
Before using the SFP+ ports I had LAN + VLAN routed on the igb0 interface. Everything worked well, no issues.
2. After changes - VLAN routing broken (affecting other routing too)
After moving LAN + VLAN over the SFP+ port (ax0), the issues started. When VLAN traffic was routed, there were heavy lag spikes on non-VLAN traffic as well. I don't have performance numbers, but the traffic wasn't heavy - yet it heavily affected the whole physical interface.
3. Fixed by moving VLAN to igb0 while keeping LAN on ax0
As I knew the "everything on igb0" setup worked, I wanted to try whether it was enough to move just the VLANs to igb0 and keep the LAN on ax0. It required some careful "tag denial" on the switch routes so as not to "loop" either the untagged traffic or the VLANs, but the solution worked.
EDIT: Of course this workaround/fix was only feasible because my VLAN networks didn't need the 10 GbE in the first place.
As I need to change 2x managed switches and be very careful not to make my OPNsense inaccessible, I'm hesitant to try it "the other way around" (moving VLANs to SFP+ and LAN to igb0) just to test whether all VLAN routing is broken, or whether the issue only appears when LAN/VLAN traffic is "routing back" through the same physical interface.
I also didn't test the 10 GbE speeds (no sensible way to test it through OPNsense right now), but the lag/latency issue was so clear that something was obviously not working.
@Kallex Can you try to update to 21.4.3? the axgbe driver from AMD had an issue with larger packets in vlans, which lead to a lot of spam in dmesg (and reduced performance). If you do suffer from the same issue, I expect quite some kernel messages (..Big packet...) when larger packets are being processed.
The release notes for 21.4.3 are available here https://docs.opnsense.org/releases/BE_21.4.html#august-11-2021
o src: axgbe: remove unneccesary packet length check (https://github.com/opnsense/src/commit/bee1ba0981190dabcd045b6c8debfc8b8820016c)
Best regards,
Ad
I can try; we're in a production environment, so the earliest I can try it is the weekend.
I guess that's not the "Stable Business Branch" release; can I easily roll back to the last stable one after checking that version out?
I'll report back regardless of whether I could test it or not.
EDIT: Realized it's indeed a business release. I'll test it on the weekend at the latest and report back.
Quote from: Kallex on August 24, 2021, 11:15:50 PM
I can try; we're in a production environment, so the earliest I can try it is the weekend.
I guess that's not the "Stable Business Branch" release; can I easily roll back to the last stable one after checking that version out?
I'll report back regardless of whether I could test it or not.
EDIT: Realized it's indeed a business release. I'll test it on the weekend at the latest and report back.
I got to test it now. My issue does not replicate anymore with this newest version, thank you :-).
So initially I had performance issues on routing VLAN <=> LAN through ax0 (10 GbE) on Deciso DEC 840. After this patch the issue is clearly gone.
I don't have any real performance numbers between VLANs, but the clear "laggy issue" is entirely gone now.
I also did some testing after I noticed at a customer site that even on a 10G uplink I would max out at 600 Mbps. Since then I have roughly tested this on all other sites where we run OPNsense and the result is the same everywhere. OPNsense runs everywhere on either ESXi or Proxmox, on Thomas Krenn servers with the following specs:
Supermicro mainboard X10SDV-TP8F
Intel Xeon D-1518
16 GB ECC DDR4 2666 RAM
I have now tested with 3 VMs, 2 running Debian Bullseye and 1 running OPNsense (latest 20.1 and latest 21.7). The results are quite poor.
Debian -> Debian
> 14Gbps
Debian -> OPNsense 20.1 -> Debian
< 700Mbps
Debian -> OPNsense 21.7 -> Debian
< 900Mbps
Both OPNsense installs are using default settings, hardware offloading disabled and updated to latest version.
I tried setting the following tunables:
net.isr.maxthreads=-1
I also noticed that net.isr.maxthreads always returns 1, but when set to -1 it reports the correct thread count. However, the network throughput does not change.
hw.ibrs_disable=1
This made a significant impact and throughput increased to 2.6 Gbps, which is still too low but a lot better than before.
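If you want to check whether your own install shows the same single-thread behaviour, the relevant values can be read from a shell; a quick check could look like this:
sysctl net.isr.maxthreads net.isr.numthreads   # 1 / 1 means all netisr work funnels through one thread
sysctl hw.ncpu                                  # compare with the number of vCPUs the VM actually has
sysctl hw.ibrs_disable                          # 0 = IBRS mitigation active (the costly default)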
@alh, in the case of ESXi the most relevant details are likely already documented in https://forum.opnsense.org/index.php?topic=18754.msg90576#msg90576. The 14 Gbps was probably measured with default settings; the D-1518 isn't a very fast machine, so that would be reasonable with all hardware-accelerated offloading settings enabled.
For virtualised environments it helps to look into SR-IOV.
Supermicro M11SDV-8C-LN4F with Intel X710-DA2 running Proxmox 7 with SR-IOV VFs configured for OPNsense LAN and WAN on separate SFP+ slots.
Running
iperf3 -c192.168.178.8 -R -P3 -t30
through the firewalls.
OPNsense 21.7.3_1 with Sensei
[SUM] 0.00-30.00 sec 10.5 GBytes 3.00 Gbits/sec 3117 sender
[SUM] 0.00-30.00 sec 10.5 GBytes 3.00 Gbits/sec receiver
OPNsense 21.7.3_1 without Sensei
[SUM] 0.00-30.00 sec 23.8 GBytes 6.82 Gbits/sec 514 sender
[SUM] 0.00-30.00 sec 23.8 GBytes 6.82 Gbits/sec receiver
Blindtest, Linux based firewall hardware:
[SUM] 0.00-30.00 sec 29.3 GBytes 8.40 Gbits/sec 0 sender
[SUM] 0.00-30.00 sec 29.3 GBytes 8.40 Gbits/sec receiver
@athurdent
Do you think SR-IOV also helps if the host (the virtualization platform) uses vSwitches?
I work with ESXi hosts where a NIC goes directly to a vSwitch, so the NIC doesn't seem to be "sliced" for the VM guests.
Thanks for the benchmarks btw.
T.
Quote from: testo_cz on September 28, 2021, 09:24:52 PM
@athurdent
Do you think SR-IOV also helps if the host (the virtualization platform) uses vSwitches?
I work with ESXi hosts where a NIC goes directly to a vSwitch, so the NIC doesn't seem to be "sliced" for the VM guests.
Thanks for the benchmarks btw.
T.
Hi, not sure about the ESXi implementation, they seem to have documentation on it though. https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.networking.doc/GUID-CC021803-30EA-444D-BCBE-618E0D836B9F.html
The card itself definitely has integrated switching capabilities. If I use a VLAN only on the card for 2 VMs to communicate (VLAN is not configured or allowed on the hardware switch the card is connected to), then I get around 18G throughput, which is done on the card internally.
Quote from: athurdent on September 29, 2021, 05:28:19 AM
Quote from: testo_cz on September 28, 2021, 09:24:52 PM
@athurdent
Do you think SR-IOV also helps if the host (the virtualization platform) uses vSwitches?
I work with ESXi hosts where a NIC goes directly to a vSwitch, so the NIC doesn't seem to be "sliced" for the VM guests.
Thanks for the benchmarks btw.
T.
Hi, not sure about the ESXi implementation, they seem to have documentation on it though. https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.networking.doc/GUID-CC021803-30EA-444D-BCBE-618E0D836B9F.html
The card itself definitely has integrated switching capabilities. If I use a VLAN only on the card for 2 VMs to communicate (VLAN is not configured or allowed on the hardware switch the card is connected to), then I get around 18G throughput, which is done on the card internally.
That's interesting information -- the SR-IOV card's VFs just switch between each other. It also makes sense; I can imagine how this would improve smaller setups, no matter whether it's ESXi or something else.
The ESXi docs say that Direct I/O enables HW acceleration too, regardless of the vSwitch, but only in some scenarios. I assume it's a combination of their VMXNET3 paravirtualized driver magic and the Physical Function of the NIC. From what I've seen it's the default for large ESXi setups.
18G means the traffic went through PCIe only, cool.
Thanks. T.
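In case someone wants to reproduce the SR-IOV setup on Proxmox, a very rough sketch follows; the interface name, PCI address and VM ID are placeholders, and the host needs IOMMU enabled first:
# on the Proxmox host: create two virtual functions on one X710 port
echo 2 > /sys/class/net/enp1s0f0/device/sriov_numvfs
lspci | grep -i 'virtual function'        # note the PCI addresses of the new VFs
# pass one VF straight into the OPNsense VM (VM ID 100 here)
qm set 100 -hostpci0 0000:01:02.0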
I made a new thread about this very same issue but with Proxmox guests in the mix:
https://forum.opnsense.org/index.php?topic=25410.msg122060#msg122060
I don't want to blame OPNsense 100% before I rule out OVS problems, but OVS has not had issues for me in the past :(
I'm chiming in to say I have seen similar issues. Running on Proxmox, I can only route about 600 Mbps in OPNsense using virtio/vtnet. A related kernel process in OPNsense shows 100% CPU usage and the underlying vhost process on the Proxmox host is pegged as well.
Trying a Linux VM on the same segment (i.e. not routing through the OPNsense) saturates the 1-gig NIC on my desktop with only 25% CPU usage on the associated vhost process for the VM's NIC.
I know some blame has been put on CPU speed/etc., but I think there is some sort of performance issue with the vtnet drivers. Even users of pfsense have had similar complaints. I also tried the new opnsense development build (freebsd 13) with no improvement.
I passed my nic through to the opnsense VM and reconfigured the interfaces and can route 1gbps no sweat. This is with the em driver (which supports my nic).
Note: I can get 1gbps with multiple queues set on the vtnet adapters for the opnsense VM. However, this still doesn't fix the performance issue with a single "stream."
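For reference, multiqueue on a Proxmox virtio NIC is a per-device option; something like the following should match what is described above (the VM ID, bridge name and queue count are just examples):
qm set 100 --net0 virtio,bridge=vmbr0,queues=4   # give the OPNsense LAN vNIC four queues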
Hello,
I'm joining this thread too. We have:
* 4 x DEC-3850
* OPNsense 21.10.2-amd64 (Business edition)
Since we started using OPNsense we have had this throughput problem. In the beginning we had a SuperMicro X11-SSH doing ~5 Gb/s and then switched to the appliances. We never reach more than 2-3 Gb/s (iperf3, without any special options) and it seems the problem is the VPN stack: if you have an IPsec tunnel, all traffic slows down, even traffic that does not go through the tunnel.
We tested:
* VM -> VM, same hypervisor (Proxmox), same VLAN = ~16 Gb/s
* VM -> VM, different hypervisor (Proxmox), same VLAN = ~10 Gb/s
* VM -> VM, different hypervisor (Proxmox), different VLAN = 1.5 - ~3 Gb/s
So if it goes via OPNsense, the network slows down.
https://www.mayrhofer.eu.org/post/firewall-throughput-opnsense-openwrt/
Quote: "When IPsec is active - even if the relevant traffic is not part of the IPsec policy - throughput is decreased by nearly 1/3. This seems like a real performance issue / bug in the FreeBSD/HardenedBSD kernel. I will need to try with VTI based IPsec routing to see if the in-kernel policy matching is a problem."
What makes us very sad is that, if this is the real issue, it is not easy to test by disabling VPN, but we will try to build a test scenario...
Pretty sad stuff...
Did you also test 22.1?
@linuxmail would you mind stopping random cross-posting, thanks
Is there a way I could test this with a bare-metal OPNsense installation? How would I proceed here?
EDIT - Resolved - see next post
Original post:
Quote from: iamperson347 on December 05, 2021, 07:48:25 PM
I'm chiming in to say I have seen similar issues. Running on proxmox, I can only route about 600 mbps in opnsense using virtio/vtnet. A related kernel process in opnsense shows 100% cpu usage and the underlying vhost process on the proxmox host is pegged as well.
I'm seeing throughput all over the place on a similar setup (i.e. in a Proxmox VM):
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 97.0 MBytes 814 Mbits/sec
[ 5] 1.00-2.00 sec 109 MBytes 911 Mbits/sec
[ 5] 2.00-3.00 sec 111 MBytes 934 Mbits/sec
[ 5] 3.00-4.00 sec 103 MBytes 867 Mbits/sec
[ 5] 4.00-5.00 sec 100 MBytes 843 Mbits/sec
[ 5] 5.00-6.00 sec 112 MBytes 937 Mbits/sec
[ 5] 6.00-7.00 sec 109 MBytes 911 Mbits/sec
[ 5] 7.00-8.00 sec 75.7 MBytes 635 Mbits/sec
[ 5] 8.00-9.00 sec 68.9 MBytes 578 Mbits/sec
[ 5] 9.00-10.00 sec 96.6 MBytes 810 Mbits/sec
[ 5] 10.00-11.00 sec 112 MBytes 936 Mbits/sec
And while that's happening, I see the virtio_pci process maxing out:
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
12 root -92 - 0B 400K CPU0 0 21:42 94.37% [intr{irq29: virtio_pci1}]
51666 root 4 0 17M 6600K RUN 1 0:18 68.65% iperf3 -s
11 root 155 ki31 0B 32K RUN 1 20.4H 13.40% [idle{idle: cpu1}]
11 root 155 ki31 0B 32K RUN 0 20.5H 3.61% [idle{idle: cpu0}]
Are there any settings that could help with this please?
I'm on 22.1.6
Further to my previous post, I actually fixed this just by turning on all the hardware acceleration options in "Interface -> Settings"
That includes CRC, TSO, and LRO. I removed the 'disabled' check and rebooted.
Now get rock solid iperf3 result:
[ 5] 166.00-167.00 sec 112 MBytes 941 Mbits/sec
[ 5] 167.00-168.00 sec 112 MBytes 941 Mbits/sec
[ 5] 168.00-169.00 sec 112 MBytes 941 Mbits/sec
[ 5] 169.00-170.00 sec 112 MBytes 941 Mbits/sec
And NIC processing load dropped to just 25% or so:
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0B 32K RUN 1 3:14 77.39% [idle{idle: cpu1}]
11 root 155 ki31 0B 32K RUN 0 3:06 71.26% [idle{idle: cpu0}]
12 root -92 - 0B 400K WAIT 0 0:55 28.35% [intr{irq29: virtio_pci1}]
91430 root 4 0 17M 6008K RUN 0 0:43 21.94% iperf3 -s
What confused me was:
1) The acceleration is disabled by default (not sure why?)
2) I didn't think it would apply to virtio devices, but clearly they implement the right things to support it.
EDIT
Arghh - perhaps not. While this fixed the LAN side, the WAN side throughput suddenly plummeted.
This is strange because it's using the same virtio driver to a separate NIC of exactly the same type.
We also have a performance issue: we have a Scope7 5510 with 10G SFP+ and only get 1.2 Gbit/s, but it should be >9 Gbit/s.
Any ideas why this happens and how to fix it?
Quote from: linuxmail on February 02, 2022, 12:54:49 PM
https://www.mayrhofer.eu.org/post/firewall-throughput-opnsense-openwrt/
Quote: "When IPsec is active - even if the relevant traffic is not part of the IPsec policy - throughput is decreased by nearly 1/3. This seems like a real performance issue / bug in the FreeBSD/HardenedBSD kernel. I will need to try with VTI based IPsec routing to see if the in-kernel policy matching is a problem."
Well spotted! Exactly the same negative observation here on my end with IPsec policy based VPN.
Here is a first estimate of how IPsec affects my routing speed in the LAN:
Direction | IPsec enabled | IPsec disabled |
Server -> OPNsense -> Client | 48.1 MB/s | 74.2 MB/s |
Server <- OPNsense <- Client | 49.9 MB/s | 61.1 MB/s |
Overall, the routing speed remains very disappointing. Especially considering I had full routing performance up until OPNsense 20.1.
During my testing, I noticed that OPNsense doesn't seem to be utilizing all NIC queues. Two out of four NIC queues process almost no traffic and are bored.
dev.ix.2.queue3.rx_packets: 2959840
dev.ix.2.queue2.rx_packets: 2158082
dev.ix.2.queue1.rx_packets: 9861
dev.ix.2.queue0.rx_packets: 4387
dev.ix.2.queue3.tx_packets: 2967255
dev.ix.2.queue2.tx_packets: 2160888
dev.ix.2.queue1.tx_packets: 15955
dev.ix.2.queue0.tx_packets: 8725
Any take on this?
interrupt | total | rate |
irq51: ix2:rxq0 | 5136 | 11 |
irq52: ix2:rxq1 | 2176474 | 4708 |
irq53: ix2:rxq2 | 7203 | 16 |
irq54: ix2:rxq3 | 3299471 | 7138 |
irq55: ix2:aq | 1 | 0 |
This is really crap!
Update: I don't know if others have made the same mistake, but do a traceroute from your iperf client to iperf server and make sure it looks right. Do a netstat -rn on your opnsense box too and make sure the routing table looks sane. In my testing I was putting the wan side into my normal network, and the lan side into an isolated proxmox bridge with no physical port attached. For some reason, OPNsense is routing the traffic all the way to WAN's upstream gateway, which is a physical 1gb router outside of my proxmox environment. I'm not sure why yet, but here's what's happening:
vtnet0 WAN 10.0.0.1->WAN gateway (physical router) 10.0.0.254
vtnet1 LAN 10.0.1.1
iperf client on lan side: 10.0.1.1
iperf server on wan network: 10.0.0.100
traceroute from iperf client to iperf server (through opnsense):
1 10.0.1.1
2 10.0.0.254
3 10.0.0.100
traceroute from iperf client to iperf server (through pfsense):
1 10.0.1.1
2 10.0.0.100
Deleted the route entry from system->routes->status and it works as expected now, but how did that entry get there in the first place? I have a second opnsense test instance that did the same thing.
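A quick way to run the same sanity check, using the addresses from the example above:
# on the OPNsense shell: look for an unexpected route covering the iperf server's subnet
netstat -rn | grep '^10.0.0'
# on the iperf client: the path should be a single hop through the firewall, not via 10.0.0.254
traceroute 10.0.0.100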
Original post:
Anyone have any updates on this? Is this now considered a known bug? I saw early in the thread a link to a Github PR that was merged, but it looks like it's been included in 22.7. I setup two identical VMs in Proxmox, one Pfsense 2.6.0, one OPNsense 22.7. VMs have 12 E5-2620 cores (vm cpu set to "host"), 4GB of ram, and two virtio nics. Nothing was changed other than setting a static lan IP for each instance. Traffic is tested as such with all VMs on the same proxmox host (including the iperf client and server):
iperf client -> iperf server: 10 Gb/s
iperf client -> pfSense LAN -> pfSense WAN -> iperf server: 2.5 Gb/s
iperf client -> OPNsense LAN -> OPNsense WAN -> iperf server: 0.743 Gb/s
I then set hw.ibrs_disable=1 (note if CPU is set to the default KVM64, this isn't needed and performance is the same)
iperf client -> OPNsense LAN -> OPNsense WAN -> iperf server: 0.933 Gb/s
Also tested with multiple iperf streams (-P 20) and got the same speeds.
CPU usage was high when testing, but then I enabled multiqueue on the Proxmox NICs (six queues on each NIC) and CPU usage dropped to basically nothing, and then I topped out right at 940 Mb/s, exactly the max TCP throughput on a gigabit link. I find it pretty suspicious and it makes me think something in the chain is being limited to gigabit Ethernet. It does show the NICs as "10gbaseT <full duplex>" in the UI, and again my iperf client VM and iperf server VM both have 10G interfaces and pull a full 10 Gb/s when connected directly to each other.
I have multi-gigabit Internet and recently decided to transition to an OPNsense server running inside of a Proxmox VM with Virtio network adapters as my main router at home, not realizing at the time that so many performance issues existed....
I read through this entire thread and combed through numerous other resources online. It seems like a lot of people are hung up on this issue and definitive answers are in short supply.
I went through the journey and experienced pretty much everything mentioned in this thread, even marcosscriven's uncanny post about how hardware acceleration caused the LAN side performance to improve and WAN throughput to plummet.
I'm posting here now because I solved this issue for my setup. My OPNsense running in a Proxmox KVM virtual machine is now able to keep up with my 6 gig Internet.
(speed test result screenshot: https://binaryimpulse.com/wp-content/uploads/2022/11/st-normal-6g.png)
I made a lot of changes, and I'm not sure whether they all helped (I'm quite sure a large number of them had no immediately noticeable effect), but I decided to leave most of them in place because the things I'd read about them along the way made sense to me, and increasing the values seemed logical in many cases even where there was no noticeable performance improvement.
You can read my entire writeup on my blog, where I go through the whole journey in detail if you want: https://binaryimpulse.com/2022/11/opnsense-performance-tuning-for-multi-gigabit-internet/
In a nutshell my solution came down to leaving all of the hardware offloading disabled and configuring a bunch of sysctl values compiled from like 5 different sources which eventually led to my desired performance. I made some minor changes to the Proxmox VM too like enabling multiqueue on the network adapter, but I'm skeptical whether any of those changes really mattered.
The sysctl values that worked for me (and I think sysctl tuning overall did the most to solve the problem - along with disabling hardware offloading) were the following:
hw.ibrs_disable=1
net.isr.maxthreads=-1
net.isr.bindthreads = 1
net.isr.dispatch = deferred
net.inet.rss.enabled = 1
net.inet.rss.bits = 6
kern.ipc.maxsockbuf = 614400000
net.inet.tcp.recvbuf_max=4194304
net.inet.tcp.recvspace=65536
net.inet.tcp.sendbuf_inc=65536
net.inet.tcp.sendbuf_max=4194304
net.inet.tcp.sendspace=65536
net.inet.tcp.soreceive_stream = 1
net.pf.source_nodes_hashsize = 1048576
net.inet.tcp.mssdflt=1240
net.inet.tcp.abc_l_var=52
net.inet.tcp.minmss = 536
kern.random.fortuna.minpoolsize=128
net.isr.defaultqlimit=2048
If you want my sources and reasoning for the changes and how I arrived at them, I went into a lot of detail in my blog article (https://binaryimpulse.com/2022/11/opnsense-performance-tuning-for-multi-gigabit-internet/).
Just wanted to add my 2 cents to this very useful thread, which did start me off in the right direction toward solving the issue for my setup. Hopefully these details are helpful to someone else.
Cheers,
Kirk
@Kirk: How do I set these tweaks? I can't find these options in the Web GUI, except for the first one mentioned.
Quote from: Porfavor on November 22, 2022, 12:26:38 AM
@Kirk: How do I set these tweaks? I can't find these options in the Web GUI, except for the first one mentioned.
@Porfavor these settings are in System > Settings > Tunables. Some of the tunables will not be listed on that page. You can click the + icon to add the tunable you want to tweak.
For example once you hit + you would put a tunable like "net.inet.rss.enabled" in the tunable box, leave the description blank (it will autofill it with a description it has already from somewhere), and then copy the value, like 1, into the value box.
Keep in mind some of these tunables will not be applied until the system is rebooted.
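If you want to experiment before committing everything via the GUI, the runtime-settable ones can be tried from a shell first. Note that these changes are lost on reboot, and boot-only tunables such as net.isr.maxthreads and the net.inet.rss.* entries still have to go through the Tunables page plus a reboot; a few examples using the values from the list above:
sysctl hw.ibrs_disable=1
sysctl net.isr.dispatch=deferred
sysctl net.inet.tcp.sendbuf_max=4194304 net.inet.tcp.recvbuf_max=4194304
sysctl net.inet.tcp.mssdflt=1240 net.inet.tcp.abc_l_var=52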
I tried these settings to no avail. :(
I'm experiencing the exact same issue. I read that the blame was also being put on the CPU, so in order to test this I took my same appliance:
J4125
8GB RAM
Intel 225 (4 ports)
and installed Untangle on it. Once I loaded everything to match my OPNsense config, my speeds were normal. In fact the impact of IPS was minimal. I went from 1.1 Gbps (no IPS; 800-900 Mbps with IPS) to 1.4 Gbps, which is the speed I pay for from my provider.
This can't be a CPU issue. I'm going to try the tweaks above in OPNsense now and see if it makes any changes.
OPNsense DEC 840, which is supposed to be able to handle passing ~15 Gbit/s of traffic.
Speedtest from the firewall:
# speedtest --server-id=47746
Speedtest by Ookla
Server: AT&T - Miami, FL (id: 47746)
ISP: AT&T Internet
Idle Latency: 3.53 ms (jitter: 0.50ms, low: 3.06ms, high: 4.12ms)
Download: 2327.36 Mbps (data used: 2.6 GB)
5.18 ms (jitter: 1.65ms, low: 2.79ms, high: 26.40ms)
Upload: 378.54 Mbps (data used: 685.6 MB)
3.01 ms (jitter: 1.79ms, low: 2.03ms, high: 55.43ms)
Packet Loss: 0.0%
Result URL: https://www.speedtest.net/result/c/bbd0ee99-ad99-4e32-b3c9-ad05daf8bd84
Speedtest through the firewall (notice slow upload)
# speedtest --server-id=47746
Speedtest by Ookla
Server: AT&T - Miami, FL (id: 47746)
ISP: AT&T Internet
Idle Latency: 4.17 ms (jitter: 0.94ms, low: 3.06ms, high: 6.49ms)
Download: 2295.81 Mbps (data used: 1.5 GB)
5.08 ms (jitter: 2.15ms, low: 2.79ms, high: 53.90ms)
Upload: 329.78 Mbps (data used: 362.9 MB)
4.05 ms (jitter: 1.37ms, low: 3.12ms, high: 16.97ms)
Packet Loss: 0.0%
Result URL: https://www.speedtest.net/result/c/2f29bb86-def6-4379-ad30-7292ad3e1926
iperf3 from the same machine *to* the Opnsense firewall, normal and reverse
root@dev:/ # iperf3 -c gw
Connecting to host gw, port 5201
[ 5] local 10.27.3.230 port 31205 connected to 10.27.3.254 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 272 MBytes 2.28 Gbits/sec 413 472 KBytes
[ 5] 1.00-2.00 sec 287 MBytes 2.41 Gbits/sec 2 614 KBytes
[ 5] 2.00-3.00 sec 255 MBytes 2.14 Gbits/sec 61 593 KBytes
[ 5] 3.00-4.00 sec 280 MBytes 2.35 Gbits/sec 23 17.0 KBytes
[ 5] 4.00-5.00 sec 261 MBytes 2.19 Gbits/sec 82 257 KBytes
[ 5] 5.00-6.00 sec 257 MBytes 2.15 Gbits/sec 14 133 KBytes
[ 5] 6.00-7.00 sec 254 MBytes 2.13 Gbits/sec 20 737 KBytes
[ 5] 7.00-8.00 sec 260 MBytes 2.18 Gbits/sec 70 512 KBytes
[ 5] 8.00-9.00 sec 268 MBytes 2.25 Gbits/sec 140 737 KBytes
[ 5] 9.00-10.00 sec 266 MBytes 2.23 Gbits/sec 116 714 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.60 GBytes 2.23 Gbits/sec 941 sender
[ 5] 0.00-10.00 sec 2.60 GBytes 2.23 Gbits/sec receiver
iperf Done.
root@dev:/ # iperf3 -R -c gw
Connecting to host gw, port 5201
Reverse mode, remote host gw is sending
[ 5] local 10.27.3.230 port 12997 connected to 10.27.3.254 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 254 MBytes 2.13 Gbits/sec
[ 5] 1.00-2.02 sec 262 MBytes 2.16 Gbits/sec
[ 5] 2.02-3.00 sec 257 MBytes 2.19 Gbits/sec
[ 5] 3.00-4.00 sec 250 MBytes 2.10 Gbits/sec
[ 5] 4.00-5.00 sec 234 MBytes 1.97 Gbits/sec
[ 5] 5.00-6.00 sec 244 MBytes 2.05 Gbits/sec
[ 5] 6.00-7.00 sec 251 MBytes 2.11 Gbits/sec
[ 5] 7.00-8.00 sec 229 MBytes 1.92 Gbits/sec
[ 5] 8.00-9.00 sec 248 MBytes 2.08 Gbits/sec
[ 5] 9.00-10.00 sec 238 MBytes 1.99 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.41 GBytes 2.07 Gbits/sec 14 sender
[ 5] 0.00-10.00 sec 2.41 GBytes 2.07 Gbits/sec receiver
iperf Done.
I actually expected more than this. With a loopback to my own server through my switch I can do 9 Gbit/s with a single stream. If I do multiple streams to the OPNsense firewall I can hit 4.2 Gbit/s max.
So where is this mysterious bottleneck coming from? I did have ipsec.ko loaded from an old setup, but I had no policies; the module is completely gone now. No amount of tuning or interface settings changes seems to matter.
How do I get this thing to actually push line rate? I've even swapped from 10GBase-T to fiber in case it was something odd with the media, but same results.
edit: I set up another test scenario where I do a speed test over WiFi from my laptop to my server using Librespeed (https://github.com/librespeed/speedtest-go), and when I hit it directly through my AP on the same switch connected to the server I can do 300/300, but when I force my traffic to go through the firewall (same segment, same VLAN) the download speed (my server's upload) can't break 100.
There is something very peculiar going on
CPU during the test? Fragmentation? iperf from A to B through the firewall? Any drops at the switch? A screenshot of the services/dashboard, please.
The majority of the issue was net.isr.dispatch=direct, which should be net.isr.dispatch=deferred so that multiple CPU cores are used. I can hit ~7 Gbit/s with iperf to the firewall and I've been able to get my full 2 Gbit/s through it.
I don't know why this isn't the default value in OPNsense. I understand why it's not in FreeBSD, but a networking appliance should be tuned out of the box for maximum networking performance. I hope to see this and more auto-tuning improvements in the future.
I also would have expected OPNsense to automatically recognize this hardware and apply specific tuning for it. It is one of their flagship products after all.
The inability to get a full 10 Gbit/s iperf to the firewall when the DEC840 spec sheet specifically states "14.4Gbps firewall throughput" and "Firewall Port to Port Throughput: 9Gbps" makes me wonder whether the OPNsense team has ever actually hit those numbers with this hardware or whether they're just advertising a theoretical max.
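To check what a given box is currently using, and to try the change before making it permanent (net.isr.dispatch can be flipped at runtime; add it under System > Settings > Tunables to keep it across reboots):
sysctl net.isr.dispatch              # shows the current mode, e.g. "direct"
sysctl net.isr.dispatch=deferred     # switch at runtime for testing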
First of all thanks for this tip.
I tested it. I am running OPNsense on an APU2D2.
Test setup:
- RPI4B+
- Win10 PC
- InterVlan setup
- Both hosts in separate VLans
net.isr.dispatch set to direct
-P 10
[SUM] 0.00-10.00 sec 788 MBytes 661 Mbits/sec sender
[SUM] 0.00-10.00 sec 785 MBytes 659 Mbits/sec receiver
net.isr.dispatch set to deferred
-P 10
[SUM] 0.00-10.00 sec 1.02 GBytes 878 Mbits/sec sender
[SUM] 0.00-10.00 sec 1.02 GBytes 877 Mbits/sec receiver
net.isr.dispatch set to deferred - running for 300s 10 streams
-P 10
[SUM] 0.00-300.00 sec 31.4 GBytes 899 Mbits/sec sender
[SUM] 0.00-300.00 sec 31.4 GBytes 899 Mbits/sec receiver
So there is definitely something to it. I was not able to get near 1G before; now I can after changing the value to "deferred".