OPNsense Forum

English Forums => Hardware and Performance => Topic started by: hax0rwax0r on August 25, 2020, 08:31:25 pm

Title: Poor Throughput (Even On Same Network Segment)
Post by: hax0rwax0r on August 25, 2020, 08:31:25 pm
I originally posted on Reddit but figured I might get more traction here with this.

I have an OPNsense 20.7.1 server running on a Dell R430 with 16 GB DDR4 RAM, an Intel Xeon E5-2620 v3 (6 cores/12 threads @ 2.40GHz) CPU and an Intel X520-SR2 10GbE NIC.

My network has several VLANs and network subnets with my OPNsense router functioning as a router on a stick doing all the traffic firewalling and routing between each network segment.

I recently upgraded my OPNsense to 20.7.1 and on a whim decided to run an iperf3 test between two VMs on different network segments to see what kind of throughput I was getting. I am certain, at least at some point, this very same hardware pushed over 6 Gbps on the same iperf3 test. Today it was getting around 850 Mbps every single time.

I started iperf3 as a server on my QNAP NAS device which is also attached to the same 10 Gbps switch and ran iperf3 as a client from OPNsense on the same network segment and got the same 850 Mbps throughput.

To make sure I wasn't limited by the QNAP NAS device, I ran the same iperf3 test with my other QNAP NAS device as a client to the first QNAP NAS device and it pushed 8.6 Gbps across the same network segment (no OPNsense involved) so both the QNAP and the switch can push it.

My question is: what do I have going wrong here? Even on the same network segment, OPNsense can't do more than 850 Mbps of throughput. I have no idea if this was happening before the upgrade to 20.7.1, but I know for sure it is happening now. I would assume an iperf3 test run from the OPNsense server itself on the same network segment would surely rule out firewalling, routing, etc.

The interface shows 10 Gbps link speed, too, both from ifconfig and the switch itself.

My current MBUF Usage is 1 % (17726/1010734).

IDS/IPS package is installed but disabled.

I had "Hardware CRC", "Hardware TSO", "Hardware LRO" and "VLAN Hardware Filtering" all enabled. I have since set them all to disabled and rebooted. I can confirm they are disabled by looking at the interface flags in ifconfig:

Pre-reboot:
options=e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>

Post-reboot:
options=803828<VLAN_MTU,JUMBO_MTU,WOL_UCAST,WOL_MCAST,WOL_MAGIC>

I ran top and saw a process (kernel{if_io_tqg_2}) using nearly 100% of a CPU core during this iperf3 test:

# top -aSH

last pid: 22772;  load averages:  1.23,  0.94,  0.79                                                                                                                                                                      up 5+23:48:52  14:24:22
233 threads:   15 running, 193 sleeping, 25 waiting
CPU:  1.0% user,  0.0% nice, 16.1% system,  0.5% interrupt, 82.4% idle
Mem: 1485M Active, 297M Inact, 1657M Wired, 935M Buf, 12G Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    0 root        -76    -      0   848K CPU2     2 279:51  99.77% [kernel{if_io_tqg_2}]
   11 root        155 ki31      0   192K CPU3     3 130.8H  98.78% [idle{idle: cpu3}]
   11 root        155 ki31      0   192K CPU9     9 131.3H  98.75% [idle{idle: cpu9}]
   11 root        155 ki31      0   192K CPU1     1 129.7H  98.68% [idle{idle: cpu1}]
   11 root        155 ki31      0   192K CPU10   10 138.1H  98.33% [idle{idle: cpu10}]
   11 root        155 ki31      0   192K CPU5     5 130.5H  97.51% [idle{idle: cpu5}]
   11 root        155 ki31      0   192K CPU0     0 138.3H  95.78% [idle{idle: cpu0}]
   11 root        155 ki31      0   192K CPU8     8 137.7H  95.25% [idle{idle: cpu8}]
   11 root        155 ki31      0   192K CPU6     6 138.7H  95.20% [idle{idle: cpu6}]
   11 root        155 ki31      0   192K CPU4     4 138.4H  94.26% [idle{idle: cpu4}]
22772 root         82    0    15M  6772K CPU7     7   0:04  93.83% iperf3 -c 192.168.1.31
   11 root        155 ki31      0   192K RUN      7 129.4H  68.75% [idle{idle: cpu7}]
   11 root        155 ki31      0   192K RUN     11 126.8H  46.12% [idle{idle: cpu11}]
    0 root        -76    -      0   848K -        4 277:00   5.12% [kernel{if_io_tqg_4}]
   12 root        -60    -      0   400K WAIT    11 449:21   5.02% [intr{swi4: clock (0)}]
    0 root        -76    -      0   848K -        8 317:40   3.81% [kernel{if_io_tqg_8}]
    0 root        -76    -      0   848K -        0 272:13   2.71% [kernel{if_io_tqg_0}]

I occasionally see flowd_aggregate.py spike to 100%, but it doesn't seem consistent with, or relevant to, when iperf3 is running:

# top -aSH

last pid: 99781;  load averages:  1.15,  0.90,  0.77                                                                                                                                                                      up 5+23:47:27  14:22:57
232 threads:   14 running, 193 sleeping, 25 waiting
CPU:  8.5% user,  0.0% nice,  1.6% system,  0.4% interrupt, 89.5% idle
Mem: 1481M Active, 299M Inact, 1656M Wired, 935M Buf, 12G Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
43465 root         90    0    33M    25M CPU7     7   7:11  99.82% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.7)
   11 root        155 ki31      0   192K CPU9     9 131.3H  99.80% [idle{idle: cpu9}]
   11 root        155 ki31      0   192K CPU3     3 130.8H  99.68% [idle{idle: cpu3}]
   11 root        155 ki31      0   192K CPU10   10 138.1H  99.50% [idle{idle: cpu10}]
   11 root        155 ki31      0   192K CPU6     6 138.7H  98.53% [idle{idle: cpu6}]
   11 root        155 ki31      0   192K RUN      5 130.5H  98.20% [idle{idle: cpu5}]
   11 root        155 ki31      0   192K CPU1     1 129.7H  97.97% [idle{idle: cpu1}]
   11 root        155 ki31      0   192K CPU11   11 126.8H  96.52% [idle{idle: cpu11}]
   11 root        155 ki31      0   192K CPU0     0 138.3H  96.43% [idle{idle: cpu0}]
   11 root        155 ki31      0   192K CPU8     8 137.7H  95.95% [idle{idle: cpu8}]
   11 root        155 ki31      0   192K CPU2     2 138.3H  95.81% [idle{idle: cpu2}]
   11 root        155 ki31      0   192K CPU4     4 138.4H  93.94% [idle{idle: cpu4}]
   12 root        -60    -      0   400K WAIT     4 449:17   5.10% [intr{swi4: clock (0)}]
    0 root        -76    -      0   848K -        4 276:55   4.95% [kernel{if_io_tqg_4}]


What is going on here?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: hax0rwax0r on August 26, 2020, 08:17:40 am
To add to this, I re-configured all my VLANs on bge0 (the onboard NIC), moved all my interfaces over to their respective bge0_vlanX interfaces, and re-ran my iperf3 tests.

On my first test, I got the same throughput as with my Intel X520-SR2 NIC:

# iperf3 -c 192.168.1.31
Connecting to host 192.168.1.31, port 5201
[  5] local 192.168.1.1 port 42455 connected to 192.168.1.31 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  92.0 MBytes   772 Mbits/sec   91   5.70 KBytes
[  5]   1.00-2.00   sec  91.1 MBytes   764 Mbits/sec   88    145 KBytes
[  5]   2.00-3.00   sec  86.1 MBytes   722 Mbits/sec   86    836 KBytes
[  5]   3.00-4.00   sec  92.5 MBytes   776 Mbits/sec   76    589 KBytes
[  5]   4.00-5.00   sec   107 MBytes   894 Mbits/sec    0    803 KBytes
[  5]   5.00-6.00   sec   107 MBytes   898 Mbits/sec    2    731 KBytes
[  5]   6.00-7.00   sec   109 MBytes   914 Mbits/sec    1    658 KBytes
[  5]   7.00-8.00   sec   110 MBytes   926 Mbits/sec    0    863 KBytes
[  5]   8.00-9.00   sec   107 MBytes   898 Mbits/sec    2    748 KBytes
[  5]   9.00-10.00  sec   109 MBytes   918 Mbits/sec    1    663 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1011 MBytes   848 Mbits/sec  347             sender
[  5]   0.00-10.32  sec  1010 MBytes   821 Mbits/sec                  receiver


For reference, I just tested my MacBook Pro against the same iperf3 server and was able to push 926 Mbps, and re-tested the QNAP-to-QNAP transfer at 9.39 Gbps, to completely rule out the iperf3 server as the bottleneck.

For the sake of testing, because why not, I re-ran iperf3 from my OPNsense server once more and got near-gigabit throughput:

# iperf3 -c 192.168.1.31
Connecting to host 192.168.1.31, port 5201
[  5] local 192.168.1.1 port 8283 connected to 192.168.1.31 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   108 MBytes   906 Mbits/sec    0    792 KBytes
[  5]   1.00-2.00   sec   111 MBytes   932 Mbits/sec    2    698 KBytes
[  5]   2.00-3.00   sec   111 MBytes   930 Mbits/sec    1    638 KBytes
[  5]   3.00-4.00   sec   108 MBytes   905 Mbits/sec    1    585 KBytes
[  5]   4.00-5.00   sec   111 MBytes   929 Mbits/sec    0    816 KBytes
[  5]   5.00-6.00   sec   111 MBytes   929 Mbits/sec    1    776 KBytes
[  5]   6.00-7.00   sec   111 MBytes   928 Mbits/sec    1    725 KBytes
[  5]   7.00-8.00   sec   108 MBytes   906 Mbits/sec    2    663 KBytes
[  5]   8.00-9.00   sec   111 MBytes   928 Mbits/sec    2    616 KBytes
[  5]   9.00-10.00  sec   111 MBytes   928 Mbits/sec    0    837 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.07 GBytes   922 Mbits/sec   10             sender
[  5]   0.00-10.32  sec  1.07 GBytes   892 Mbits/sec                  receiver


One thing I noticed between the first and second iperf3 tests was the "Retr" column: 347 vs 10.  I researched what that means for iperf3 and found this: "It's the number of TCP segments retransmitted. This can happen if TCP segments are lost in the network due to congestion or corruption."

I also noticed during my second iperf3 test that a kernel process was now using 99.81% CPU:

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        155 ki31      0   192K CPU3     3   9:02 100.00% [idle{idle: cpu3}]
    0 root        -92    -      0   848K CPU2     2   0:30  99.81% [kernel{bge0 taskq}]


Additionally, I am not sure "Retr" by itself is a smoking gun, as the QNAP-to-QNAP test that yielded 9.39 Gbps had 2218 retransmits.
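Since raw Retr counts aren't comparable between tests that moved very different amounts of data, it can help to normalize to retransmits per gigabyte transferred. A quick sketch using the approximate figures from this thread (the 1011 MB / 347-retransmit bge0 run and the 10-second 9.39 Gbps QNAP run):

```python
# Normalize iperf3 retransmit counts to retransmits per GB transferred,
# using the approximate numbers reported in this thread.

def retr_per_gb(retransmits, gbytes_transferred):
    """Retransmissions per gigabyte of payload moved."""
    return retransmits / gbytes_transferred

# OPNsense on bge0: 347 retransmits over ~1011 MB (~0.99 GB)
opnsense = retr_per_gb(347, 1011 / 1024)

# QNAP-to-QNAP at 9.39 Gbps for 10 s: roughly 9.39 * 10 / 8 = ~11.7 GB
qnap = retr_per_gb(2218, 9.39 * 10 / 8)

print(f"OPNsense: {opnsense:.0f} retr/GB, QNAP: {qnap:.0f} retr/GB")
```

Per gigabyte moved, the "fast" QNAP test actually retransmitted at a similar or lower rate, which supports the idea that Retr alone isn't the smoking gun here.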

The search continues.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on August 26, 2020, 08:45:48 am
I know that bge driver has problems with OPNsense but X520 should deliver fine performance.
I tested these cards with 20.7rc1 and got full wire speed.

I can run these tests again with latest 20.7.1 but I need to finish some other stuff first.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: hax0rwax0r on August 26, 2020, 09:42:16 am
I know the Broadcom drivers aren't the best, but I figured it was worth a test.  That said, I just swapped the Intel X520-SR2 for a Chelsio T540-CR, which seems to have excellent FreeBSD support; that family of NICs seems to be frequently recommended.

Here's the results from the Chelsio T540-CR:

# iperf3 -c 192.168.1.31
Connecting to host 192.168.1.31, port 5201
[  5] local 192.168.1.1 port 19465 connected to 192.168.1.31 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   112 MBytes   943 Mbits/sec    0   8.00 MBytes
[  5]   1.00-2.00   sec   110 MBytes   924 Mbits/sec    0   8.00 MBytes
[  5]   2.00-3.00   sec   112 MBytes   939 Mbits/sec    0   8.00 MBytes
[  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec    0   8.00 MBytes
[  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec    0   8.00 MBytes
[  5]   5.00-6.00   sec   112 MBytes   939 Mbits/sec    0   8.00 MBytes
[  5]   6.00-7.00   sec   112 MBytes   940 Mbits/sec    0   8.00 MBytes
[  5]   7.00-8.00   sec   112 MBytes   938 Mbits/sec    0   8.00 MBytes
[  5]   8.00-9.00   sec   112 MBytes   940 Mbits/sec    0   8.00 MBytes
[  5]   9.00-10.00  sec   112 MBytes   940 Mbits/sec    0   8.00 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.09 GBytes   939 Mbits/sec    0             sender
[  5]   0.00-10.32  sec  1.09 GBytes   909 Mbits/sec                  receiver


Also thought it was interesting there were zero retransmits on the test.

I swapped the optic on the NIC when I swapped the NIC itself.  Tomorrow I will swap the optic on the switch side and maybe try a different switch port and fiber patch cable, though I doubt those are the issue.

Unfortunately, it appears that the issue was not my Intel X520-SR2 NIC as the Chelsio T540-CR exhibits the same behavior.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: hax0rwax0r on August 27, 2020, 03:41:43 am
Just a status update:

Swapped the optic on the switch side (both ends have now been swapped) and swapped in a new fiber patch cable.  Same results.  I also re-enabled "Hardware CRC" and "VLAN Hardware Filtering" but left "Hardware TSO" and "Hardware LRO" disabled, as I've read those functions are broken in most drivers.

I also added this to /boot/loader.conf.local and rebooted:

hw.cxgbe.toecaps_allowed=0
hw.cxgbe.rdmacaps_allowed=0
hw.cxgbe.iscsicaps_allowed=0
hw.cxgbe.fcoecaps_allowed=0


Absolutely zero impact on performance.  Tomorrow I think I'll unbox my other PowerEdge R430, put the original Intel X520-SR2 NIC in it, and see if I can duplicate the problem.

I am at a total loss of what is going on here.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: hax0rwax0r on August 28, 2020, 04:08:35 am
OK, at the risk of seeming like I am only talking to myself at this point, I think I found the commonality in the poor performance -- it's OPNsense.

I built a fresh, updated OPNsense 20.7.1 VM on VMware ESXi 6.7U3, imported the configuration backup from my physical server, re-mapped all the interfaces to the new vmx0_vlanX names, and things are working, albeit even slower than on the physical hardware:

root@opnsense1:~ # iperf3 -c 192.168.1.31
...
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec   705 MBytes   591 Mbits/sec    0             sender
[  5]   0.00-10.41  sec   705 MBytes   568 Mbits/sec                  receiver


Seems pretty awful.  So I decided to create two new OPNsense 20.7.1 VMs and configure one as a VLAN trunk and the other as non-trunk, to test whether the problem lay within the VLAN implementation itself:

OPNsense 20.7.1 (amd64)

VLAN and pf Enabled:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   949 MBytes   796 Mbits/sec    0             sender
[  5]   0.00-10.40  sec   949 MBytes   766 Mbits/sec                  receiver


VLAN and pf Disabled (pfctl -d):
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec  1.22 GBytes  1.05 Gbits/sec    0             sender
[  5]   0.00-10.41  sec  1.22 GBytes  1.01 Gbits/sec                  receiver


Non-VLAN and pf Enabled:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   854 MBytes   716 Mbits/sec    0             sender
[  5]   0.00-10.40  sec   854 MBytes   688 Mbits/sec                  receiver


Non-VLAN and pf Disabled (pfctl -d):
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   983 MBytes   825 Mbits/sec    0             sender
[  5]   0.00-10.40  sec   983 MBytes   793 Mbits/sec                  receiver


As you can see, the VLAN-trunked VM actually had slightly better performance.  Perhaps environmental factors caused the differences, as I would expect the two configurations to be nearly identical; even so, the differences I'm seeing are mostly negligible given the link is 10 gigabit.  I also tested without pf to see whether the difference was measurable.  Both tests show throughput is in fact better without pf, though it's kind of pointless to run a network perimeter firewall without it...
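For what it's worth, pf's cost in the four runs above can be quantified directly from the sender bitrates already posted (a rough consolidation of those numbers, nothing new is measured):

```python
# Compute pf's throughput overhead from the iperf3 sender bitrates above
# (values in Mbits/sec, taken from the four OPNsense 20.7.1 VM tests).

def pf_overhead_pct(with_pf, without_pf):
    """Percent of throughput lost with pf enabled vs. disabled."""
    return 100.0 * (1 - with_pf / without_pf)

vlan = pf_overhead_pct(796, 1050)      # VLAN trunk:  796 vs 1050 Mbps
non_vlan = pf_overhead_pct(716, 825)   # non-VLAN:    716 vs  825 Mbps

print(f"VLAN: {vlan:.1f}% overhead, non-VLAN: {non_vlan:.1f}% overhead")
```

So pf costs roughly 13-24% here, which still doesn't come close to explaining a 10 Gbps link topping out under 1 Gbps.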

Next I thought maybe this is just a fluke and all three OPNsense servers just suck on VMware ESXi and dislike the hardware or configuration or maybe my ESX host just can't push traffic.  I had a CentOS 8.2.2004 VM already deployed and configured on the same network segment I had been testing on so I loaded up iperf3 on it to see if it was an ESX host/network problem.

CentOS 8.2.2004 (x86_64)

Non-VLAN and firewalld Enabled:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.7 GBytes  9.17 Gbits/sec   11             sender
[  5]   0.00-10.04  sec  10.7 GBytes  9.14 Gbits/sec                  receiver

Non-VLAN and firewalld Disabled:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.8 GBytes  9.32 Gbits/sec    1             sender
[  5]   0.00-10.04  sec  10.8 GBytes  9.28 Gbits/sec                  receiver


Tested with the firewall on and off just for fun, to see how much iptables slowed the Linux test down.  As you can see, 9.14 Gbps to 9.32 Gbps on this test.  The problem isn't my ESX host or my network.

I then thought it might be a BSD problem.  Perhaps something with running inside VMware or the vmxnet3 driver is problematic.  I tried to figure out how to install HardenedBSD, but it seemed too difficult, as my quick search for an ISO didn't yield much.  So I used FreeBSD instead.  Hopefully it's close enough!

FreeBSD 12.1 (amd64)

VLAN and pf Disabled (not configured):
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.9 GBytes  9.35 Gbits/sec    0             sender
[  5]   0.00-10.42  sec  10.9 GBytes  8.97 Gbits/sec                  receiver


Non-VLAN and pf Disabled (not configured):
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.9 GBytes  9.36 Gbits/sec   13             sender
[  5]   0.00-10.21  sec  10.9 GBytes  9.17 Gbits/sec                  receiver


I figured I hadn't spent enough time dorking around with this already, so why not configure one test VM with VLAN trunking and one without to see if there were any differences.  As you can see, FreeBSD 12.1 pushed the packets, fast, regardless of VLANs.  The problem doesn't seem to be vmxnet3/ESXi and FreeBSD related.

Finally, I came to the conclusion that maybe OPNsense 20.7 is just broken.  So I loaded up an OPNsense 19.7 test VM and gave it a go.

OPNsense 19.7.10_1 (amd64)

Non-VLAN and pf Enabled:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.75 GBytes  1.50 Gbits/sec    0             sender
[  5]   0.00-10.44  sec  1.75 GBytes  1.44 Gbits/sec                  receiver

Non-VLAN and pf Disabled (pfctl -d):
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.57 GBytes  2.21 Gbits/sec    0             sender
[  5]   0.00-10.48  sec  2.57 GBytes  2.11 Gbits/sec                  receiver


Not good.  You can see the results of 1.50 to 2.21 Gbits/sec are measurably better than my OPNsense 20.7 tests but nowhere near stellar.  I was very much over testing at this point, so I opted not to do a VLAN versus non-VLAN configuration.  That said, based on the earlier results, I am sure the difference would have been negligible.

To add to this, as a general observation: whenever an iperf3 test is running on OPNsense, a constant ping to the firewall starts dropping packets like it's being choked out and can't keep up.  I did not experience this at all when testing CentOS or FreeBSD.

Why is OPNsense so bad at throughput in my tests?  If it's not, what am I doing wrong?  The commonality among these tests seems to be OPNsense, regardless of whether it's 19.7 or 20.7, though the former is better than the latter.

Edit:  Because why not at this point.  Let's test pfSense!

pfSense 2.4.5 (amd64)

Non-VLAN and pf Enabled:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  3.80 GBytes  3.26 Gbits/sec   67             sender
[  5]   0.00-10.26  sec  3.80 GBytes  3.18 Gbits/sec                  receiver


Non-VLAN and pf Disabled (pfctl -d):
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.66 GBytes  4.86 Gbits/sec  109             sender
[  5]   0.00-10.22  sec  5.66 GBytes  4.76 Gbits/sec                  receiver


pfSense is not stellar, especially considering it is based on FreeBSD 12.1 and I tested FreeBSD 12.1 and got very different (better) results.  That being said, both results are much, much faster than any OPNsense test I could push regardless if physical or virtual.

Edit 2:  Fixed a typo in my comments where I erroneously used 20.1 instead of 20.7 when referring to editions of OPNsense.

TL;DR:  OPNsense seems to be dog slow compared to FreeBSD 12.1 and CentOS 8.2 at raw network throughput.  What gives?  What am I doing wrong that it can be this huge of a performance gap?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: biomatrix on August 28, 2020, 04:39:55 am
Your testing is amazing.
I have nothing to add (there are actually two other threads with this same subject matter; various reasons, but we're slow).

I am posting to let you know there are others and you aren't just talking to yourself.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on August 28, 2020, 07:06:03 am
  What am I doing wrong that it can be this huge of a performance gap?

The problem is, you are not testing traffic *through* the firewall; you are measuring *against* the firewall.
iperf3 on OPNsense itself performs really badly. Can you test with sender and receiver on different interfaces?

Again, I'm doing regular performance tests with hardware details and I'm always near wirespeed:
https://www.routerperformance.net/opnsense/opnsense-performance-20-1-8/
https://www.routerperformance.net/routers/nexcom-nsa/fujitsu-rx1330/
https://www.routerperformance.net/routers/nexcom-nsa/thomas-krenn-ri1102d/
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on August 28, 2020, 07:23:32 am
OK, I upgraded my lab now:

Client1: Ubuntu
FW1: 20.7.1 (Intel(R) Xeon(R) CPU E3-1240 v6 @ 3.70GHz (8 cores))
FW2: 20.7
Client2: Ubuntu

They are directly attached via TwinAx cables and a mix of Intel X520 and Mellanox ConnectX-3 NICs.

Client1 is iperf client, Client2 is iperf server:


With IPS enabled, 1 stream:

root@px3:~# iperf3 -p 5000 -f m -V -c 10.2.0.10  -P 1 -t 10 -R
iperf 3.1.3
Linux px3 4.15.18-12-pve #1 SMP PVE 4.15.18-35 (Wed, 13 Mar 2019 08:24:42 +0100) x86_64
Time: Fri, 28 Aug 2020 05:17:13 GMT
Connecting to host 10.2.0.10, port 5000
Reverse mode, remote host 10.2.0.10 is sending
      Cookie: px3.1598591833.837625.6814fda03553a5
      TCP MSS: 1448 (default)
[  4] local 10.1.0.10 port 58842 connected to 10.2.0.10 port 5000
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   159 MBytes  1335 Mbits/sec
[  4]   1.00-2.00   sec   159 MBytes  1335 Mbits/sec
[  4]   2.00-3.00   sec   156 MBytes  1308 Mbits/sec
[  4]   3.00-4.00   sec   156 MBytes  1305 Mbits/sec
[  4]   4.00-5.00   sec   157 MBytes  1313 Mbits/sec
[  4]   5.00-6.00   sec   157 MBytes  1315 Mbits/sec
[  4]   6.00-7.00   sec   156 MBytes  1309 Mbits/sec
[  4]   7.00-8.00   sec   157 MBytes  1319 Mbits/sec
[  4]   8.00-9.00   sec   155 MBytes  1298 Mbits/sec
[  4]   9.00-10.00  sec   155 MBytes  1301 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.53 GBytes  1316 Mbits/sec   39             sender
[  4]   0.00-10.00  sec  1.53 GBytes  1315 Mbits/sec                  receiver
CPU Utilization: local/receiver 63.0% (8.2%u/54.8%s), remote/sender 0.2% (0.0%u/0.2%s)

iperf Done.


Without IPS, 1 stream:

root@px3:~# iperf3 -p 5000 -f m -V -c 10.2.0.10  -P 1 -t 10 -R
iperf 3.1.3
Linux px3 4.15.18-12-pve #1 SMP PVE 4.15.18-35 (Wed, 13 Mar 2019 08:24:42 +0100) x86_64
Time: Fri, 28 Aug 2020 05:18:46 GMT
Connecting to host 10.2.0.10, port 5000
Reverse mode, remote host 10.2.0.10 is sending
      Cookie: px3.1598591926.454562.6f7931ec23f094
      TCP MSS: 1448 (default)
[  4] local 10.1.0.10 port 58846 connected to 10.2.0.10 port 5000
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   800 MBytes  6708 Mbits/sec
[  4]   1.00-2.00   sec   816 MBytes  6844 Mbits/sec
[  4]   2.00-3.00   sec   814 MBytes  6830 Mbits/sec
[  4]   3.00-4.00   sec   814 MBytes  6829 Mbits/sec
[  4]   4.00-5.00   sec   816 MBytes  6844 Mbits/sec
[  4]   5.00-6.00   sec   816 MBytes  6844 Mbits/sec
[  4]   6.00-7.00   sec   815 MBytes  6840 Mbits/sec
[  4]   7.00-8.00   sec   816 MBytes  6840 Mbits/sec
[  4]   8.00-9.00   sec   815 MBytes  6841 Mbits/sec
[  4]   9.00-10.00  sec   816 MBytes  6841 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  7.95 GBytes  6829 Mbits/sec   36             sender
[  4]   0.00-10.00  sec  7.95 GBytes  6826 Mbits/sec                  receiver
CPU Utilization: local/receiver 28.7% (1.2%u/27.5%s), remote/sender 1.2% (0.0%u/1.2%s)

iperf Done.


Without IPS, 10 parallel streams:

[  4]   3.00-3.90   sec   106 MBytes   992 Mbits/sec
[  6]   3.00-3.90   sec   105 MBytes   981 Mbits/sec
[  8]   3.00-3.90   sec  71.7 MBytes   669 Mbits/sec
[ 10]   3.00-3.90   sec  69.8 MBytes   651 Mbits/sec
[ 12]   3.00-3.90   sec  73.6 MBytes   686 Mbits/sec
[ 14]   3.00-3.90   sec  97.8 MBytes   912 Mbits/sec
[ 16]   3.00-3.90   sec   101 MBytes   941 Mbits/sec
[ 18]   3.00-3.90   sec  80.4 MBytes   750 Mbits/sec
[ 20]   3.00-3.90   sec   137 MBytes  1279 Mbits/sec
[ 22]   3.00-3.90   sec   163 MBytes  1523 Mbits/sec
[SUM]   3.00-3.90   sec  1006 MBytes  9383 Mbits/sec
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: hax0rwax0r on August 28, 2020, 07:45:31 am
I mean, of course a parallel test is going to yield better results if the firewall has a multi-core CPU and you are maxing out a single core.

The issue I have is that single-threaded throughput is only about 850 Mbps on my non-virtualized hardware.  That doesn't seem right to me, but I only know my own environment, so I might just be wrong.

And yes, I did test through the firewall before I started running tests from the firewall.  Through the firewall nets me similar single-threaded performance:

[root@client1 ~]# iperf3 -f m -c 192.168.1.31
...
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   973 MBytes   816 Mbits/sec   22             sender
[  4]   0.00-10.00  sec   970 MBytes   814 Mbits/sec                  receiver


And, as expected, increased throughput when running in parallel:

[root@client1 ~]# iperf3 -f m -c 192.168.1.31 -P 10
...
[ ID] Interval           Transfer     Bandwidth       Retr
...
[SUM]   0.00-10.00  sec  3.26 GBytes  2798 Mbits/sec  4464             sender
[SUM]   0.00-10.00  sec  3.23 GBytes  2776 Mbits/sec                  receiver


Can you humor me and run a single threaded test through your hardware and show me the output?

If OPNsense is truly not broken in this release, then I guess my CPU's core speed isn't enough for what I'm trying to do and I need to look on eBay for a faster one.  That said, it appears several others are reporting degraded performance since upgrading, so maybe there is something to my claim.

Edit:  I see your single-threaded non-IPS throughput is 6826 Mbps.  Even your single-threaded test absolutely crushes mine.  I get that your CPU is a v6 at 3.7 GHz, but really, almost 7 Gbps versus less than 1 Gbps for me.  I have a v3 Xeon with a higher clock rate (maybe 3.2 GHz?) that I can try tomorrow to see what results I get.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on August 28, 2020, 08:16:33 am
Can you also test with pfSense 2.5.0-dev?  That one is based on FreeBSD 12, whereas 2.4.5 runs FreeBSD 11.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: hax0rwax0r on August 28, 2020, 09:08:12 am
Fresh install of OPNsense 20.7 on a Dell T20 (Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz (4 cores)):

[root@client1 ~]# iperf3 -c 192.168.1.31
...
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  8.29 GBytes  7.12 Gbits/sec    2             sender
[  4]   0.00-10.00  sec  8.29 GBytes  7.12 Gbits/sec                  receiver


[root@client1 ~]# iperf3 -c 192.168.1.31 -P 10
...
[ ID] Interval           Transfer     Bandwidth       Retr
[SUM]   0.00-10.01  sec  8.77 GBytes  7.53 Gbits/sec  139             sender
[SUM]   0.00-10.01  sec  8.77 GBytes  7.53 Gbits/sec                  receiver


It's just hard to believe that an E3-1225 v3 @ 3.2GHz/3.6GHz versus an E5-2620 v3 @ 2.4GHz/3.2GHz makes that much difference in a single-thread test; however, it's clear the results don't lie.  There's either something wrong with my hardware or my install, or the CPU is just too slow to push single-threaded performance past about 850 Mbps.
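A back-of-the-envelope clock-scaling check, assuming single-stream forwarding scales roughly linearly with base clock (a big simplification that ignores turbo, IPC, cache and NUMA differences), suggests the clock gap alone shouldn't explain it:

```python
# Rough clock-scaling estimate: if the E3-1225 v3 (3.2 GHz base) pushes
# 7.12 Gbps single-stream, a linear-with-clock model predicts what the
# E5-2620 v3 (2.4 GHz base) "should" manage.  Sanity check only: this
# deliberately ignores turbo, IPC, cache and NUMA differences.

e3_clock_ghz, e3_gbps = 3.2, 7.12
e5_clock_ghz = 2.4

predicted_e5_gbps = e3_gbps * (e5_clock_ghz / e3_clock_ghz)
observed_e5_gbps = 0.85  # the ~850 Mbps seen on the R430

print(f"predicted ~{predicted_e5_gbps:.2f} Gbps, observed ~{observed_e5_gbps} Gbps")
```

A naive scaling predicts ~5.3 Gbps, so a ceiling of 850 Mbps looks like far more than a clock-speed difference.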

And you're right about the pfSense version of FreeBSD.  I just double-checked the page (https://docs.netgate.com/pfsense/en/latest/releases/versions-of-pfsense-and-freebsd.html) and, in spite of it being clearly marked 2.5.0 TBD, I didn't even pay attention that it definitely was not the version I installed.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on August 28, 2020, 12:36:42 pm
FreeBSD is known to be less performant than Linux for single streams, especially with PPPoE, but your problem is weird.  Sadly, I have no other hardware to test with.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: hax0rwax0r on August 31, 2020, 08:17:06 am
I built a new OPNsense server on my spare Dell PowerEdge R430 server that has the same CPU in it as my one I am currently using.

I can confirm that the problem appears to be my CPU and/or hardware, since the exact same NIC was moved from the Dell PowerEdge T20, which previously tested at 7.53 Gbps, to this R430, and the test results are much lower:

[root@client1 ~]# iperf3 -c 192.168.1.31
...
[  4]   0.00-10.00  sec  2.13 GBytes  1.83 Gbits/sec   72             sender
[  4]   0.00-10.00  sec  2.13 GBytes  1.83 Gbits/sec                  receiver

[root@client1 ~]# iperf3 -c 192.168.1.31 -P 10
...
[SUM]   0.00-10.00  sec  4.78 GBytes  4.10 Gbits/sec  1143             sender
[SUM]   0.00-10.00  sec  4.75 GBytes  4.08 Gbits/sec                  receiver


One observation: on like-for-like hardware, the new R430 delivers more than double the throughput on the single-thread test and more than a gigabit more on the parallel test compared to the R430 I have been having problems with.  No idea why that is.
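Putting numbers on that like-for-like gap, using the sender figures from this post and the earlier through-the-firewall tests on the problem R430 (a rough comparison only, since the runs weren't back to back):

```python
# Compare the spare R430 against the problem R430 (same CPU model),
# using sender bitrates in Gbits/sec reported in this thread.
problem_r430 = {"single": 0.816, "parallel": 2.798}  # earlier through-firewall test
spare_r430   = {"single": 1.83,  "parallel": 4.10}   # this post's results

for test in ("single", "parallel"):
    ratio = spare_r430[test] / problem_r430[test]
    print(f"{test}: {ratio:.2f}x the problem box")
```

Roughly 2.2x single-threaded and 1.5x parallel on supposedly identical hardware, which is why I'm suspicious of the original box itself.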

I guess I have a decision to make about buying a new CPU or a new server.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: hax0rwax0r on September 02, 2020, 07:34:01 am
OK, back to basics here.  I couldn't leave well enough alone and did more testing tonight, because I just couldn't believe that my CPU couldn't even do single-threaded gigabit.  Here are my test scenarios:

Test Scenario 1 (HardenedBSD router):

Single Threaded:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.00 GBytes   863 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.00 GBytes   860 Mbits/sec                  receiver


6 Parallel Threads:
[ ID] Interval           Transfer     Bandwidth       Retr
[SUM]   0.00-10.00  sec  2.23 GBytes  1.91 Gbits/sec  938             sender
[SUM]   0.00-10.00  sec  2.22 GBytes  1.90 Gbits/sec                  receiver


Notice a common theme here with the ~850 Mbps single-threaded test: it's pretty close to what I get with OPNsense.  Note this is THROUGH the firewall, not from the firewall.  Also note my system had IPv6 addresses from my ISP on each interface, though I was only testing IPv4 traffic.

Test Scenario 2:
  • Physical Linux Server (CentOS 7) on VLAN 2 (iperf3 client)
  • Virtual Linux Server (CentOS 7) on VLAN 24 (iperf3 server)
  • Dell PowerEdge R430 w/Intel X520-SR2 and FreeBSD 12.1-RELEASE

Single Threaded:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  9.75 GBytes  8.38 Gbits/sec  573             sender
[  4]   0.00-10.00  sec  9.75 GBytes  8.38 Gbits/sec                  receiver


6 Parallel Threads:
[ ID] Interval           Transfer     Bandwidth       Retr
[SUM]   0.00-10.00  sec  10.5 GBytes  9.05 Gbits/sec  3607             sender
[SUM]   0.00-10.00  sec  10.5 GBytes  9.04 Gbits/sec                  receiver


I couldn't believe my eyes; I triple-checked that it was in fact pushing 8.38 Gbps THROUGH the FreeBSD 12.1 server and not taking some magical alternate path.  It was, in fact, going through the FreeBSD router.  As you can see, the parallel test is about 1 Gbps shy of wire speed.  Excellent!  Also note my system had IPv6 addresses from my ISP on each interface, though I was only testing IPv4 traffic.

I then enabled pf (via pfctl) on the FreeBSD 12.1 router to see how that affected performance.  I'm not sure how much adding rules impacts throughput, but I did notice a measurable drop on the single-thread test (6.23 Gbps), while the drop on the parallel test was negligible (8.94 Gbps).
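For anyone wanting to reproduce the pf comparison on stock FreeBSD: my exact ruleset isn't included here, but a minimal stateful setup looks roughly like this (the ruleset below is illustrative only, not the one from my tests; run as root on the FreeBSD router):

```shell
# Enable pf at boot (FreeBSD)
sysrc pf_enable=YES

# Minimal illustrative ruleset; NOT the ruleset used in the tests above
cat > /etc/pf.conf <<'EOF'
set skip on lo0
pass all keep state
EOF

# Load the ruleset and enable pf without rebooting
pfctl -f /etc/pf.conf
pfctl -e
```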

As of right now, it seems very strange to me that HardenedBSD exhibits the same low single-threaded throughput as OPNsense, and likewise low parallel-thread throughput, while FreeBSD does not.

I am willing to accept that I am not accounting for something here; however, given near wire-speed throughput on the exact same hardware under FreeBSD, it seems to me something is very different with HardenedBSD.

What are your thoughts?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Supermule on September 02, 2020, 09:04:55 am
I am seeing very slow throughput on pfSense as well, using iperf3 online.

Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
16 CPUs: 2 package(s) x 8 core(s)
AES-NI CPU Crypto: Yes (inactive)

Using Suricata I can't get more than 200 Mbps... pretty annoying.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on September 02, 2020, 10:53:01 am
Ok, so we have an upstream problem with FreeBSD and some chance of getting it fixed in the next months.
The interim options for now are:

a) go back to 20.1
b) disable netmap (IPS/Sensei)
c) accept the lowered performance

I talked to Franco yesterday; there are some promising patches pending and we definitely need testers, so if you're not going back to 20.1, testing those would be fine.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Supermule on September 02, 2020, 10:58:28 am
Wasn't the problem OPN/pfSense rather than FreeBSD? Didn't the 10gbit tests show wire speed on a FreeBSD machine using pf?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on September 02, 2020, 11:00:14 am
No, OPNsense 20.7 and pfSense 2.5 are based on FreeBSD 12.X; OPNsense 20.1 and pfSense 2.4 on FreeBSD 11.X.

With FreeBSD 12 the networking stack was switched to iflib, which has known problems with netmap; people are already working on it.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Supermule on September 02, 2020, 11:12:45 am
@mimugmail

Quote from: hax0rwax0r on September 02, 2020, 07:34:01 am
Test Scenario 1 (physical CentOS 7 client on VLAN 2, virtual CentOS 7 server on VLAN 24, Dell PowerEdge R430 w/Intel X520-SR2 running HardenedBSD 12-STABLE, BUILD-LATEST 2020-08-31): 863 Mbits/sec single threaded, 1.91 Gbits/sec with 6 parallel threads.
Test Scenario 2 (same client and server, R430 running FreeBSD 12.1-RELEASE): 8.38 Gbits/sec single threaded, 9.05 Gbits/sec with 6 parallel threads.
[...]
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: franco on September 02, 2020, 11:57:56 am
@hax0rwax0r

Try to repeat the FreeBSD 12.1-RELEASE test with our kernel instead of the stock one. I don't expect any differences.

https://pkg.opnsense.org/FreeBSD:12:amd64/20.7/sets/kernel-20.7.2-amd64.txz


Cheers,
Franco
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: AdSchellevis on September 02, 2020, 12:08:46 pm
Details matter (a lot) in these cases; we haven't seen huge differences on our end (apart from netmap issues with certain cards, which we don't ship ourselves). That being said, IPS is a feature that really stresses your hardware; quite a few setups can't do more than the 200 Mbps mentioned in this thread.

Please be advised that HardenedBSD 12-STABLE isn't the same as OPNsense 20.7; the differences between OPNsense 20.7 src and FreeBSD are a bit smaller, but if you're convinced your issue lies with HardenedBSD's additions it might be a good starting point (and a plain install has fewer features enabled).

You can always try installing our kernel on the same FreeBSD install that worked without issues (as Franco suggested); it could make the steps easier to reproduce.

If you want to compare HBSD and FBSD anyway, always make sure you're comparing apples with apples: check interface settings, build options and tunables (sysctl -a). Testing between physical interfaces (not VLANs on the same one) is probably easier, so you know for sure traffic only flows through the physical interface once.

In case someone would like to reproduce your test, make sure to document step by step how to do that (including the network segments used).
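For the tunables comparison mentioned above, one mechanical approach is to dump and diff the tunables from both boxes (the file names are just examples):

```shell
# On the FreeBSD box:
sysctl -a | sort > sysctl-fbsd.txt
# On the HardenedBSD box:
sysctl -a | sort > sysctl-hbsd.txt
# Copy both files to one machine, then compare:
diff -u sysctl-fbsd.txt sysctl-hbsd.txt
```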

Best regards,

Ad
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on September 02, 2020, 12:40:01 pm
Quote from: hax0rwax0r on September 02, 2020, 07:34:01 am
[...]

I have the same values with 20.7 on SuperMicro Hardware with Xeon and X520 as posted before. It's something in your hardware
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: hax0rwax0r on September 02, 2020, 05:12:22 pm
I am not super familiar with FreeBSD, so how would I go about swapping your kernel in for the stock FreeBSD 12.1 one I am running?  I searched around on Google and found how to build a custom kernel from source, but the txz file you linked appears to be already compiled, so I don't think that's what I want to do.

I also found references to pkg-static for installing locally downloaded packages, but I wanted some initial guidance before totally hosing this up.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on September 02, 2020, 05:21:00 pm
This should also be the same kernel that gets installed with the latest 20.7.2.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: hax0rwax0r on September 02, 2020, 07:25:58 pm
Oh, I guess I misunderstood franco's instructions: I thought he was asking me to drop the linked 20.7.2 kernel in place on my FreeBSD 12.1 install, which is why I was asking how exactly to do that.

With your clarification and after re-reading the post, I think franco was just asking me to try an install of 20.7.2, which happens to run that kernel, and re-run my tests to see if it improves.

If that's the case, I will try it and report back my findings with OPNsense 20.7.2.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: franco on September 02, 2020, 08:37:59 pm
No, I did mean FreeBSD 12.1 with our kernel. All the networking is in the kernel, so we will see whether this is OPNsense vs. HBSD vs. FBSD or some sort of tweaking effort.

# fetch https://pkg.opnsense.org/FreeBSD:12:amd64/20.7/sets/kernel-20.7.2-amd64.txz
# mv /boot/kernel /boot/kernel.old
# tar -C / -xf kernel-20.7.2-amd64.txz
# kldxref /boot/kernel

It should have a new /boot/kernel now and a reboot should activate it. You can compare build info after the system is back up.

# uname -rv
12.1-RELEASE-p8-HBSD FreeBSD 12.1-RELEASE-p8-HBSD #0  b3665671c4d(stable/20.7)-dirty: Thu Aug 27 05:58:53 CEST 2020     root@sensey64:/usr/obj/usr/src/amd64.amd64/sys/SMP
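And since the stock kernel was kept as /boot/kernel.old, reverting is just the inverse of the steps above (the /boot/kernel.opnsense name is arbitrary):

```shell
# Restore the stock FreeBSD kernel saved earlier
mv /boot/kernel /boot/kernel.opnsense
mv /boot/kernel.old /boot/kernel
kldxref /boot/kernel
reboot
```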


Cheers,
Franco
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: hax0rwax0r on September 02, 2020, 10:40:50 pm
OK here are the test results as you requested:

FreeBSD 12.1 (pf enabled):

[root@fbsd1 ~]# uname -rv
12.1-RELEASE FreeBSD 12.1-RELEASE r354233 GENERIC

[root@fbsd1 ~]# top -aSH
last pid:  2954;  load averages:  0.44,  0.42,  0.41                                                                      up 0+01:38:55  20:13:46
132 threads:   10 running, 104 sleeping, 18 waiting
CPU:  0.0% user,  0.0% nice, 19.7% system,  5.2% interrupt, 75.1% idle
Mem: 10M Active, 6100K Inact, 271M Wired, 21M Buf, 39G Free
Swap: 3968M Total, 3968M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        155 ki31      0    96K RUN      5  94:58  95.25% [idle{idle: cpu5}]
   11 root        155 ki31      0    96K CPU1     1  93:26  83.69% [idle{idle: cpu1}]
   11 root        155 ki31      0    96K RUN      0  94:44  73.68% [idle{idle: cpu0}]
   11 root        155 ki31      0    96K CPU4     4  93:15  72.51% [idle{idle: cpu4}]
   11 root        155 ki31      0    96K CPU3     3  93:36  64.80% [idle{idle: cpu3}]
   11 root        155 ki31      0    96K RUN      2  92:55  62.29% [idle{idle: cpu2}]
    0 root        -76    -      0   480K CPU2     2   0:05  34.76% [kernel{if_io_tqg_2}]
    0 root        -76    -      0   480K CPU3     3   0:14  33.49% [kernel{if_io_tqg_3}]
   12 root        -52    -      0   304K CPU0     0  26:23  29.62% [intr{swi6: task queue}]
    0 root        -76    -      0   480K -        4   0:05  23.31% [kernel{if_io_tqg_4}]
    0 root        -76    -      0   480K -        0   0:05  12.31% [kernel{if_io_tqg_0}]
    0 root        -76    -      0   480K -        1   0:04  10.01% [kernel{if_io_tqg_1}]
   12 root        -88    -      0   304K WAIT     5   3:55   2.28% [intr{irq264: mfi0}]
    0 root        -76    -      0   480K -        5   0:06   1.88% [kernel{if_io_tqg_5}]
 2954 root         20    0    13M  3676K CPU5     5   0:00   0.02% top -aSH
   12 root        -60    -      0   304K WAIT     0   0:01   0.01% [intr{swi4: clock (0)}]
    0 root        -76    -      0   480K -        4   0:02   0.01% [kernel{if_config_tqg_0}]


Single Thread:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  8.45 GBytes  7.26 Gbits/sec  802             sender
[  4]   0.00-10.00  sec  8.45 GBytes  7.26 Gbits/sec                  receiver


10 Threads:
[ ID] Interval           Transfer     Bandwidth       Retr
[SUM]   0.00-10.00  sec  9.85 GBytes  8.46 Gbits/sec  2991             sender
[SUM]   0.00-10.00  sec  9.83 GBytes  8.45 Gbits/sec                  receiver



FreeBSD 12.1 with OPNsense Kernel (pf enabled):

[root@fbsd1 ~]# uname -rv
12.1-RELEASE FreeBSD 12.1-RELEASE r354233 GENERIC

[root@fbsd1 ~]# fetch https://pkg.opnsense.org/FreeBSD:12:amd64/20.7/sets/kernel-20.7.2-amd64.txz
[root@fbsd1 ~]# mv /boot/kernel /boot/kernel.old
[root@fbsd1 ~]# tar -C / -xf kernel-20.7.2-amd64.txz
[root@fbsd1 ~]# kldxref /boot/kernel
[root@fbsd1 ~]# reboot

[root@fbsd1 ~]# uname -rv
12.1-RELEASE-p8-HBSD FreeBSD 12.1-RELEASE-p8-HBSD #0  b3665671c4d(stable/20.7)-dirty: Thu Aug 27 05:58:53 CEST 2020     root@sensey64:/usr/obj/usr/src/amd64.amd64/sys/SMP

[root@fbsd1 ~]# top -aSH
last pid: 43891;  load averages:  0.99,  0.49,  0.20                                                                      up 0+00:04:28  20:29:24
131 threads:   13 running, 100 sleeping, 18 waiting
CPU:  0.0% user,  0.0% nice, 62.5% system,  3.5% interrupt, 33.9% idle
Mem: 14M Active, 1184K Inact, 270M Wired, 21M Buf, 39G Free
Swap: 3968M Total, 3968M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    0 root        -76    -      0   480K CPU3     3   0:08  81.27% [kernel{if_io_tqg_3}]
    0 root        -76    -      0   480K CPU1     1   0:09  74.39% [kernel{if_io_tqg_1}]
    0 root        -76    -      0   480K CPU5     5   0:08  73.20% [kernel{if_io_tqg_5}]
    0 root        -76    -      0   480K CPU0     0   0:21  71.79% [kernel{if_io_tqg_0}]
   11 root        155 ki31      0    96K RUN      4   4:09  54.15% [idle{idle: cpu4}]
   11 root        155 ki31      0    96K RUN      2   4:09  51.30% [idle{idle: cpu2}]
    0 root        -76    -      0   480K CPU2     2   0:05  40.10% [kernel{if_io_tqg_2}]
    0 root        -76    -      0   480K -        4   0:09  37.60% [kernel{if_io_tqg_4}]
   11 root        155 ki31      0    96K RUN      0   4:03  26.48% [idle{idle: cpu0}]
   11 root        155 ki31      0    96K RUN      5   4:14  25.87% [idle{idle: cpu5}]
   11 root        155 ki31      0    96K RUN      1   4:09  24.32% [idle{idle: cpu1}]
   12 root        -52    -      0   304K RUN      2   1:12  20.63% [intr{swi6: task queue}]
   11 root        155 ki31      0    96K CPU3     3   4:00  17.30% [idle{idle: cpu3}]
   12 root        -88    -      0   304K WAIT     5   0:10   1.47% [intr{irq264: mfi0}]
43891 root         20    0    13M  3660K CPU4     4   0:00   0.03% top -aSH
   21 root        -16    -      0    16K -        4   0:00   0.02% [rand_harvestq]
   12 root        -60    -      0   304K WAIT     1   0:00   0.02% [intr{swi4: clock (0)}]


Single Thread:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  2.89 GBytes  2.48 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  2.89 GBytes  2.48 Gbits/sec                  receiver


10 Threads:
[ ID] Interval           Transfer     Bandwidth       Retr
[SUM]   0.00-10.00  sec  8.16 GBytes  7.01 Gbits/sec  4260             sender
[SUM]   0.00-10.00  sec  8.13 GBytes  6.98 Gbits/sec                  receiver


I included the "top -aSH" output again because my main observation between the OPNsense kernel and the stock FreeBSD 12.1 kernel is the "[kernel{if_io_tqg_X}]" thread usage.  Even on an actual OPNsense 20.7.2 installation I see exactly the same behavior: "[kernel{if_io_tqg_X}]" usage is consistently higher and throughput significantly slower, especially on single-threaded tests.  Note that both top outputs were captured during the 10-thread tests only, as I did not think to capture them during the single-threaded tests.

I can't help but think that whatever the high "[kernel{if_io_tqg_X}]" usage on the OPNsense kernel means, it is starving the system of throughput potential.
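To put a number on that: summing the WCPU of the if_io_tqg threads from the two top captures above gives roughly 116% aggregate on the stock kernel versus roughly 378% on the OPNsense kernel. For example, for the stock-kernel capture:

```shell
# Sum the WCPU (next-to-last field, e.g. "34.76%") of the if_io_tqg
# kernel threads; paste any top -aSH capture in place of this sample.
awk '/if_io_tqg/ { gsub(/%/, "", $(NF-1)); sum += $(NF-1) }
     END { printf "%.2f%% aggregate\n", sum }' <<'EOF'
    0 root        -76    -      0   480K CPU2     2   0:05  34.76% [kernel{if_io_tqg_2}]
    0 root        -76    -      0   480K CPU3     3   0:14  33.49% [kernel{if_io_tqg_3}]
    0 root        -76    -      0   480K -        4   0:05  23.31% [kernel{if_io_tqg_4}]
    0 root        -76    -      0   480K -        0   0:05  12.31% [kernel{if_io_tqg_0}]
    0 root        -76    -      0   480K -        1   0:04  10.01% [kernel{if_io_tqg_1}]
    0 root        -76    -      0   480K -        5   0:06   1.88% [kernel{if_io_tqg_5}]
EOF
```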

Thoughts?  Next steps I can run and provide results from?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: opnfwb on September 02, 2020, 11:37:08 pm
Just wanted to post here due to the excellent testing from OP and to corroborate the numbers that OP is seeing.

My testing setup is as follows:
ESXi 6.7u3, host has an E3 1220v3 and 32GB of RAM
All Firewall VMs have 2vCPU. 5GB of RAM allocated to OPNsense.
VMXnet3 NICs negotiated at 10gbps

In pfSense and OPNsense, I disabled all of the hardware offloading features. I am using client and server VMs on the WAN and LAN sides of the firewall VMs. This means I am pushing/pulling traffic through the firewalls, I am not running iperf directly on any of the firewalls themselves. Because I am doing this on a single ESXi host and the traffic is within the same host/vSwitch, the traffic is never routed to my physical network switch and therefore I can test higher throughput.

pfSense and OPNsense were both out-of-the-box installs with their default rulesets. I did not add any packages or make any config changes beyond making sure all hardware offloading was disabled. All iperf3 tests were run with the LAN-side client pulling traffic through the WAN-side interface, to simulate a large download; if I perform upload tests, my throughput results are the same. All iperf3 tests ran for 60 seconds with the default MTU of 1500. The results below show the average of the 60-second runs. I ran each test twice and used the final result, so the firewalls could "warm up" and stabilize their throughput during testing.
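The runs described above correspond to iperf3 invocations like these (the server address is a placeholder; -R reverses direction so the LAN-side client receives, simulating a download; -t 60 sets the 60-second duration):

```shell
# On the WAN-side VM: start the iperf3 server
iperf3 -s

# On the LAN-side VM: 60-second download test through the firewall
iperf3 -c 203.0.113.10 -t 60 -R

# Same test in the upload direction
iperf3 -c 203.0.113.10 -t 60
```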

Code:
pfSense 2.4.5p1 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  31.5 GBytes  4.50 Gbits/sec  11715             sender
[  5]   0.00-60.00  sec  31.5 GBytes  4.50 Gbits/sec                  receiver

OpenWRT 19.07.3 1500MTU receiving from WAN, vmx3 NICs, default ruleset
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  47.5 GBytes  6.81 Gbits/sec  44252             sender
[  5]   0.00-60.00  sec  47.5 GBytes  6.81 Gbits/sec                  receiver

OPNsense 20.7.2 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  6.83 GBytes   977 Mbits/sec  459             sender
[  5]   0.00-60.00  sec  6.82 GBytes   977 Mbits/sec                  receiver

I also notice that while doing a throughput test on OPNsense, one of the vCPUs is completely consumed. I did not see this behavior with Linux or pfSense on my testing, screenshot attached shows the CPU usage I'm seeing while the iperf3 test is running.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: willysaef18 on September 03, 2020, 06:00:35 am
Hi, newbie here.

I also noticed this problem with OPNsense 20.7.2, which was released recently. I get only about 450 Mbps on my LAN when no one besides me is using it (I disconnected every downlink device). I used iperf3 on Windows to check.

Code:
PS E:\Util> .\iperf3.exe -c 192.168.10.8 -p 26574
Connecting to host 192.168.10.8, port 26574
[  4] local 192.168.12.4 port 50173 connected to 192.168.10.8 port 26574
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  49.1 MBytes   412 Mbits/sec
[  4]   1.00-2.00   sec  52.5 MBytes   440 Mbits/sec
[  4]   2.00-3.00   sec  51.8 MBytes   434 Mbits/sec
[  4]   3.00-4.00   sec  52.4 MBytes   439 Mbits/sec
[  4]   4.00-5.00   sec  52.1 MBytes   438 Mbits/sec
[  4]   5.00-6.00   sec  52.6 MBytes   441 Mbits/sec
[  4]   6.00-7.00   sec  52.4 MBytes   440 Mbits/sec
[  4]   7.00-8.00   sec  46.4 MBytes   389 Mbits/sec
[  4]   8.00-9.00   sec  49.0 MBytes   411 Mbits/sec
[  4]   9.00-10.00  sec  51.6 MBytes   433 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   510 MBytes   428 Mbits/sec                  sender
[  4]   0.00-10.00  sec   510 MBytes   428 Mbits/sec                  receiver

My hardware is an AMD Ryzen 7 2700 with 16 GB of RAM and an Intel i350-T2 gigabit Ethernet NIC.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on September 03, 2020, 06:15:29 am
Quote from: hax0rwax0r on September 02, 2020, 10:40:50 pm
[...]

My first thought was maybe shared forwarding, but you have this with pfsense 2.5 too, correct?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on September 03, 2020, 12:26:17 pm
OK, iflib. So it's related to 12.X only, but it's strange that it doesn't happen on vanilla 12.1.

https://forums.freebsd.org/threads/what-is-kernel-if_io_tqg-100-load-of-core.70642/
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on September 03, 2020, 12:36:34 pm
Do you still test with this hardware?
Dell T20 (Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz (4 cores))
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: hax0rwax0r on September 03, 2020, 05:20:39 pm
Quote from: mimugmail on September 03, 2020, 06:15:29 am
My first thought was maybe shared forwarding, but you have this with pfsense 2.5 too, correct?

I have never tested pfSense 2.5.  As you previously pointed out, my test was pfSense 2.4, which is FreeBSD 11.3 based.  I mistakenly looked at the version history page and said it was FreeBSD 12.1, but we determined that statement was incorrect.

Quote from: mimugmail on September 03, 2020, 12:26:17 pm
Ok, iflib, so it's related to 12.X-only, but strange it doesn't happen to vanilla 12.1

https://forums.freebsd.org/threads/what-is-kernel-if_io_tqg-100-load-of-core.70642/

Yeah, I saw that forum post when I was Googling around, too.  I don't know what is different between vanilla FreeBSD 12.1 and the OPNsense 20.7.2 kernel that causes the higher CPU usage, but it is consistent in my testing every single time.

Do you still test with this hardware?
Dell T20 (Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz (4 cores))

No, every single test, with the exception of that single test I did on the Dell T20 to see if more MHz helped, has been on a Dell R430.  I have several R430s that are like-for-like, and I ran different software on each one with consistent results, to rule out a bad X520 NIC or something similar.  The results followed the OS/kernel installed regardless of which R430 I ran it on, so I am fairly confident in my hardware.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: opnfwb on September 03, 2020, 08:05:39 pm
My first thought was maybe shared forwarding, but you have this with pfsense 2.5 too, correct?
I tried this with the recent build of pfSense 2.5 Development (built 9/2/2020) and was able to get around 2.0 Gbits/sec using the same test scenario that I posted about yesterday. So it is still lower throughput than pfSense 2.4.x running on FreeBSD 11.2 in the same scenario, but still higher than what we're seeing with the OPNsense 20.7 series running the 12.x kernel.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: topuli on September 04, 2020, 10:51:00 am
Just for the record: I am also experiencing degraded throughput. LAN routing between different VLANs, with only the firewall enabled (no IPS etc.), is around 550 Mbit/s. The setup is switch -> 1 Gbit trunk -> switch -> 1 Gbit trunk -> OPNsense firewall, with low overall traffic.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Supermule on September 04, 2020, 11:55:34 am
Overall usage core-wise when loading the FW.

16 cores, but only a few are used. It's like multicore usage in either IDS or pf is limited.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on September 04, 2020, 02:09:48 pm
Overall usage core-wise when loading the FW.

16 cores, but only a few are used. It's like multicore usage in either IDS or pf is limited.

One stream can only be handled by one core, this was in 20.1 and is in 20.7 :)
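As a hedged illustration of that point: a single iperf3 stream is bound by what one core can forward, while parallel streams let RSS spread flows across queues and cores (the server address below is a placeholder):

```shell
# Single stream: throughput capped by one core
iperf3 -c 192.0.2.10 -t 30
# Four parallel streams: flows can hash to different queues/cores
iperf3 -c 192.0.2.10 -t 30 -P 4
```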
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: topuli on September 04, 2020, 09:39:01 pm
A quick follow-up: I am routing about 20 VLANs. I read a lot about performance tuning, and in one post the captive portal's performance impact was mentioned. I recently changed my WiFi setup and at some point tried the captive portal function for a guest VLAN. So I gave it a try and disabled the captive portal (it was active for one VLAN). I could not believe my eyes when I tested the throughput again.

captive portal enabled for one vlan:
530 Mbit/s

captive portal disabled:
910 Mbit/s

Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on September 04, 2020, 09:41:35 pm
CP uses shared forwarding which sends every packet to ipfw, I'd guess 20.1 has the same problem
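To check whether this applies on a given box, one can look for the ipfw module from a shell (standard FreeBSD commands; nothing OPNsense-specific assumed):

```shell
# If captive portal or the shaper is enabled, ipfw is loaded and
# every packet traverses its ruleset in addition to pf
kldstat | grep -i ipfw   # is the module loaded?
ipfw list                # rules each packet must evaluate
```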
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: franco on September 04, 2020, 10:05:42 pm
Uh no, features decrease throughput. Where have I seen this before? Maybe in every industry firewall spec sheet...  ;)

This thread is slowly degrading and losing focus. I can't say there aren't any penalties in using the software, but if we only focus on how much better others are, we run the risk of not having an objective discussion: is your OPNsense too slow? The easiest fix is to get hardware that performs well enough. There's already money saved from the lack of licensing.

Performance will likely increase over time in the default releases if we can identify the actual differences in configuration.


Cheers,
Franco
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: topuli on September 04, 2020, 10:19:46 pm
First of all, I like OPNsense and I am an absolute supporter; my comment was meant to be absolutely constructive. I personally wasn't aware that a rather simple-looking feature can have a nearly 50% performance impact, and I have a feeling I can't be the only one, so I just wanted to share the information.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: franco on September 04, 2020, 10:49:11 pm
Shaper and captive portal require enabling the second firewall (ipfw) in tandem with the normal one (pf). Both are nice firewalls, but historically most features come from pf, while others are better suited to ipfw or are only available there.

I just think we should talk about raw throughput here with minimum configuration, to keep results comparable between operating systems. The more configuration and features come into play, the less possible it becomes to derive meaningful results.


Cheers,
Franco
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: incog on October 12, 2020, 10:23:27 pm
I stumbled across this thread after having the same issues as the OP with 20.7 and I'd done much of the same types of troubleshooting. Unless I missed it, I didn't see any kind of conclusion. I've read various things about issues with some nic drivers using iflib, but I haven't been able to nail anything down. For example, this post about a new netmap kernel: https://forum.opnsense.org/index.php?topic=19175.0

Though I don't know if that would even apply here, since I'm not using Sensei or Suricata. I am using the vmxnet3 driver on ESXi 7 and can't get more than 1 Gb/sec through a new install of OPNsense. No traffic shaping or anything, and all test VMs (and OPNsense) are on the same vSwitch. Even going between a test VM and an OPNsense LAN interface is stuck at 1 Gb. I can at least get 4 Gb/sec using pfSense 2.4.x. I haven't tried older versions of OPNsense.

The OPNsense roadmap says "Fix stability and reliability issues with regard to vmx(4), vtnet(4), ixl(4), ix(4) and em(4) ethernet drivers."  I guess I'm trying to find out if there are specific bugs or issues called out that this refers to. If the issue I'm seeing is already identified, great.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on October 13, 2020, 07:20:17 am
It's under investigation; 20.7.4 may bring an already fixed kernel
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: schnipp on October 15, 2020, 04:28:09 pm
Very interesting discussion here regarding degraded performance with OPNsense. Roughly one month ago I noticed degraded performance in SMB transfers between my own server and clients. At first I suspected the server itself as the performance bottleneck, due to a kernel upgrade a short time before. I am not sure whether this issue also correlates with an update of the OPNsense firewall I performed in the meantime. I investigated a little and found some discussions regarding issues with the server's network card (Intel I219-LM) and Linux. But after buying a low-priced USB network adapter (Realtek chipset) for testing, I got the same poor performance results.

My next steps are to investigate the whole network and OPNsense this upcoming weekend (if the weather is fine in this context — ⛈ 🌩 …). So, this discussion is a very interesting starting point for me and my investigation.

Here are some details regarding my OPNsense (20.7.3):

 - Mainboard: Supermicro A2SDi-4C-HLN4F (Link to specs (https://www.supermicro.com/en/products/motherboard/A2SDi-4C-HLN4F))
 - RAM: 8GB
 - Network performance (past): around 900MBit/s (SMB transfer across two subnets)
 - Network performance (now):  around 200MBit/s (SMB transfer across two subnets)

Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: opnfwb on October 16, 2020, 06:22:22 pm
I tried re-running these tests with OPNsense 20.7.3 and also tried the netmap kernel. For my particular case, this did not result in a change in throughput.

I'll recap my environment:
HP Server ML10v2/Xeon E3 1220v3/32GB of RAM

VM configurations:
Each pfSense and OPNsense VM has 2vCPU/4GB RAM/VMX3 NICs
Each pfSense and OPNsense VM has default settings and all hardware offloading disabled

The OPNsense netmap kernel was tested by doing the following:
Code: [Select]
opnsense-update -kr 20.7.3-netmap
reboot

When running these iperf3 tests, each test was run for 60 seconds; all tests were run twice and the last result is recorded here, to allow the firewalls time to "warm up" to the throughput load. All tests were performed on the same host, and two VMs were used to simulate a WAN/LAN configuration with separate vSwitches. This allows us to push traffic through the firewall, instead of using the firewall as an iperf3 client.
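Roughly, the procedure described above would look like this (the server address is a placeholder; `-t 60` matches the 60-second runs):

```shell
# On the VM behind the firewall (server side):
iperf3 -s
# On the VM on the other side, run twice, keep the second result:
iperf3 -c <server_vm_ip> -t 60
iperf3 -c <server_vm_ip> -t 60
```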

Below are my results from today:

Code: [Select]
pfSense 2.5.0Build_10-16-20 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  14.8 GBytes  2.12 Gbits/sec  550             sender
[  5]   0.00-60.00  sec  14.8 GBytes  2.12 Gbits/sec                  receiver

Code: [Select]
pfSense 2.4.5p1 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  29.4 GBytes  4.21 Gbits/sec  12054             sender
[  5]   0.00-60.00  sec  29.4 GBytes  4.21 Gbits/sec                  receiver

Code: [Select]
OpenWRT 19.07.3 1500MTU receiving from WAN, vmx3 NICs, default ruleset
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  44.1 GBytes  6.31 Gbits/sec  40490             sender
[  5]   0.00-60.00  sec  44.1 GBytes  6.31 Gbits/sec                  receiver

Code: [Select]
OPNsense 20.7.3 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  5.39 GBytes   771 Mbits/sec  362             sender
[  5]   0.00-60.00  sec  5.39 GBytes   771 Mbits/sec                  receiver

Code: [Select]
OPNsense 20.7.3(netflow disabled) 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  6.66 GBytes   953 Mbits/sec  561             sender
[  5]   0.00-60.00  sec  6.66 GBytes   953 Mbits/sec                  receiver

Code: [Select]
OPNsense 20.7.3(netmap kernel) 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  5.35 GBytes   766 Mbits/sec  434             sender
[  5]   0.00-60.00  sec  5.35 GBytes   766 Mbits/sec                  receiver

Code: [Select]
OPNsense 20.7.3(netmap kernel, netflow disabled) 1500MTU receiving from WAN, vmx3 NICs, all hardware offloading disabled, default ruleset
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  6.55 GBytes   937 Mbits/sec  399             sender
[  5]   0.00-60.00  sec  6.55 GBytes   937 Mbits/sec                  receiver

Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Supermule on October 16, 2020, 08:55:04 pm
It's actually quite interesting to see the performance degradation from pfSense 2.4 to 2.5.

One would think that things are moving forward instead of backwards.

And could it be on purpose, now that TNSR is launched, which can somehow route significantly more?

I know it's kernel dependent, but it's really annoying that the new FreeBSD releases actually perform worse than 10.3 and the OSes based on it.

Given the right MTUs, you can easily push 7+ Gbit/s on a FW.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: opnfwb on October 16, 2020, 11:02:43 pm
I probably should have clarified that. I tested both *sense-based distros just to show that they both take a hit with the FreeBSD 12.x kernel. I don't think this is out of malicious intent from either side, just teething issues due to the new way the 12.x kernel pushes packets. I'm NOT trying to compare OPNsense to pfSense; I merely wanted to show that they both see a hit moving to 12.x.

There is an upside to all of this. I'm running OPNsense 20.7.3 on bare metal at home with the stock kernel. With the FreeBSD 12.x implementations I no longer need to leave FQ_Codel shaping enabled to get A+ scores on my 500/500 Fiber connection. It seems the way that FreeBSD 12.x handles transfer queues is much more efficient. I'm sure as time moves forward this will all get worked out. I'm posting here mainly just to show what I am seeing, and hopefully we can see the numbers get better as newer kernels are integrated.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on October 17, 2020, 08:09:39 am
Yes, it needs more user base to test and diagnose. I'm sure if pfSense switched there would be faster progress. Currently it's up to the Sensei guys and the 12.1 community
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Supermule on October 17, 2020, 08:15:58 am
Do they need a sponsor to make it happen sooner?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on October 17, 2020, 09:14:18 am
No idea, just ask mb via PM
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: AdSchellevis on October 17, 2020, 04:17:10 pm
Although we haven't experienced performance issues on the equipment we sell ourselves, quite a lot of the feedback in this thread seems to be related to virtual setups.
Since we had a setup available from the webinar last Thursday, I thought I'd replicate the simple vmxnet3 test on our end.

Small disclaimer upfront, I'm not a frequent VMWare ESXi user, so I just followed the obvious steps.

Our test machine is really small, not extremely fast, but usable for the purpose (a random desktop which was available).

Machine specs:
Code: [Select]
Lenovo 10T700AHMH desktop
6 CPUs x Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz
8GB Memory
|- OPNsense vm, 2 vcores
|- kali1, 1 vcore
|- kali2, 1 vcore

While going through the VMware setup, for some reason I wasn't allowed to select VMXNET3, so I edited the .vmx file manually to make sure all attached interfaces used the correct driver.

Code: [Select]
ethernetX.virtualDev = "vmxnet3"

The clients attached are simple Kali Linux installs, each using its own vSwitch, so traffic is measured from kali1 to kali2 using iperf3 (this doesn't really tell a lot about real-world performance, but I didn't have the time or energy to set up trex and proper test sets)

Code: [Select]
[kali1, client] --- vswitch1 --- [OPNsense] --- vswitch2 --- [kali2, server]
192.168.1.100/24     -     192.168.1.1/24,192.168.2.1/24   -  192.168.2.100/24

Before testing, let's establish a baseline: move both Kali Linux machines into the same network and iperf between them.

Code: [Select]
# iperf3 -c 192.168.2.100 -t 10000
Connecting to host 192.168.2.100, port 5201
[  5] local 192.168.2.101 port 55240 connected to 192.168.2.100 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  3.34 GBytes  28.7 Gbits/sec    0   1.91 MBytes       
[  5]   1.00-2.00   sec  5.03 GBytes  43.2 Gbits/sec    0   2.93 MBytes       
[  5]   2.00-3.00   sec  5.24 GBytes  45.0 Gbits/sec    0   3.08 MBytes       
[  5]   3.00-4.00   sec  5.18 GBytes  44.5 Gbits/sec    0   3.08 MBytes       
[  5]   4.00-5.00   sec  5.23 GBytes  45.0 Gbits/sec    0   3.08 MBytes       

Which is the absolute maximum my setup could reach, using Linux with all defaults set.... but since we don't use any offloading features (https://wiki.freebsd.org/10gFreeBSD/Router), it would be fairer to check what the performance is when disabling offloading on the same setup.

So, we disable all offloading, assuming our router/firewall won't use it either.

Code: [Select]
# ethtool -K eth0 lro off
# ethtool -K eth0 tso off
# ethtool -K eth0 rx off
# ethtool -K eth0 tx off
# ethtool -K eth0 sg off

And test again:

Code: [Select]
# iperf3 -c 192.168.2.100 -t 10000
Connecting to host 192.168.2.100, port 5201
[  5] local 192.168.2.101 port 55274 connected to 192.168.2.100 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.20 GBytes  10.3 Gbits/sec    0    458 KBytes       
[  5]   1.00-2.00   sec  1.30 GBytes  11.2 Gbits/sec    0   1007 KBytes       
[  5]   2.00-3.00   sec  1.30 GBytes  11.1 Gbits/sec    0   1.18 MBytes       
[  5]   3.00-4.00   sec  1.29 GBytes  11.1 Gbits/sec    0   1.24 MBytes       
[  5]   4.00-5.00   sec  1.30 GBytes  11.2 Gbits/sec    0   1.37 MBytes       
[  5]   5.00-6.00   sec  1.31 GBytes  11.2 Gbits/sec    0   1.43 MBytes       
[  5]   6.00-7.00   sec  1.30 GBytes  11.2 Gbits/sec    0   1.51 MBytes       

Which keeps about 25% of our original throughput; VMware seems to be very efficient when hardware tasks are pushed back to the hypervisor.

Now reconnect the kali machines back into their own networks, with OPNsense (20.7.3+new netmap kernel) in between.
The firewall policy is simple, just accept anything, no other features used.

Code: [Select]
# iperf3 -c 192.168.2.100 -t 10000
Connecting to host 192.168.2.100, port 5201
[  5] local 192.168.1.100 port 54870 connected to 192.168.2.100 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   280 MBytes  2.35 Gbits/sec   59    393 KBytes       
[  5]   1.00-2.00   sec   281 MBytes  2.35 Gbits/sec   33    383 KBytes       
[  5]   2.00-3.00   sec   279 MBytes  2.34 Gbits/sec   60    379 KBytes       
[  5]   3.00-4.00   sec   275 MBytes  2.31 Gbits/sec   46    380 KBytes       
[  5]   4.00-5.00   sec   276 MBytes  2.32 Gbits/sec   31    387 KBytes       

The next step is to check the man page of the vmx driver (man vmx), which shows quite a few sysctl tunables that don't seem to work anymore on 12.x, probably due to the switch to iflib. One comment, however, seems quite relevant:

Quote
The vmx driver supports multiple transmit and receive queues.  Multiple
queues are only supported by certain VMware products, such as ESXi.  The
number of queues allocated depends on the presence of MSI-X, the number
of configured CPUs, and the tunables listed below.  FreeBSD does not
enable MSI-X support on VMware by default.  The
hw.pci.honor_msi_blacklist tunable must be disabled to enable MSI-X
support.


So we go to tunables, disable hw.pci.honor_msi_blacklist (set it to 0) and reboot our machine.
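For reference, the equivalent from the shell looks like the sketch below (on OPNsense the supported route is System > Settings > Tunables in the GUI; a reboot is required either way):

```shell
# Stop honoring the MSI blacklist so the vmx driver can allocate
# MSI-X vectors and thus multiple queues (loader tunable)
echo 'hw.pci.honor_msi_blacklist="0"' >> /boot/loader.conf.local
# after the reboot, verify the tunable took effect:
sysctl hw.pci.honor_msi_blacklist
```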

Time to test again:

Code: [Select]
# iperf3 -c 192.168.2.100
Connecting to host 192.168.2.100, port 5201
[  5] local 192.168.1.100 port 54878 connected to 192.168.2.100 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   350 MBytes  2.93 Gbits/sec  589    304 KBytes       
[  5]   1.00-2.00   sec   342 MBytes  2.87 Gbits/sec  378    337 KBytes       
[  5]   2.00-3.00   sec   342 MBytes  2.87 Gbits/sec  324    298 KBytes       
[  5]   3.00-4.00   sec   343 MBytes  2.88 Gbits/sec  292    301 KBytes       
[  5]   4.00-5.00   sec   345 MBytes  2.89 Gbits/sec  337    307 KBytes       
[  5]   5.00-6.00   sec   341 MBytes  2.86 Gbits/sec  266    301 KBytes       
[  5]   6.00-7.00   sec   341 MBytes  2.86 Gbits/sec  301    311 KBytes       

Single flow performance is often a challenge, so to be sure, let's try to push 2 sessions through iperf3

Code: [Select]
# iperf3 -c 192.168.2.100 -P 2 -t 10000
Connecting to host 192.168.2.100, port 5201
[  5] local 192.168.1.100 port 54952 connected to 192.168.2.100 port 5201
[  7] local 192.168.1.100 port 54954 connected to 192.168.2.100 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   261 MBytes  2.19 Gbits/sec  176    281 KBytes       
[  7]   0.00-1.00   sec   245 MBytes  2.05 Gbits/sec  136    342 KBytes       
[SUM]   0.00-1.00   sec   506 MBytes  4.24 Gbits/sec  312             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec   302 MBytes  2.54 Gbits/sec   57    281 KBytes       
[  7]   1.00-2.00   sec   208 MBytes  1.74 Gbits/sec   25    375 KBytes       
[SUM]   1.00-2.00   sec   510 MBytes  4.28 Gbits/sec   82             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec   304 MBytes  2.55 Gbits/sec   45    284 KBytes       
[  7]   2.00-3.00   sec   210 MBytes  1.76 Gbits/sec    9    392 KBytes       
[SUM]   2.00-3.00   sec   514 MBytes  4.31 Gbits/sec   54             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec   304 MBytes  2.55 Gbits/sec   39    386 KBytes       
[  7]   3.00-4.00   sec   209 MBytes  1.75 Gbits/sec   15    331 KBytes       
[SUM]   3.00-4.00   sec   512 MBytes  4.30 Gbits/sec   54             
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-4.95   sec   288 MBytes  2.54 Gbits/sec   39    287 KBytes       
[  7]   4.00-4.95   sec   198 MBytes  1.74 Gbits/sec   23    325 KBytes       
[SUM]   4.00-4.95   sec   485 MBytes  4.28 Gbits/sec   62             

Which is already way better. More sessions don't seem to impact my setup as far as I could see, but that could also be caused by the number of queues configured (2, see dmesg | grep vmx). In the new iflib world I wasn't able to increase that number, so I'll leave it at that.
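For completeness, the queue count can be read as follows (the sysctl subtree assumes the first vmx device; exact node names may vary by FreeBSD version):

```shell
# How many RX/TX queues did the iflib vmx driver attach with?
dmesg | grep -E 'vmx0: Using .* queues'
sysctl dev.vmx.0 2>/dev/null | grep -i queue
```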

Just for fun, I disabled pf (pfctl -d) to get some insight into how the firewall impacts our performance; the details of that test are shown below (just for reference)

Code: [Select]
# iperf3 -c 192.168.2.100 -P 2 -t 10000
Connecting to host 192.168.2.100, port 5201
[  5] local 192.168.1.100 port 55038 connected to 192.168.2.100 port 5201
[  7] local 192.168.1.100 port 55040 connected to 192.168.2.100 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   300 MBytes  2.51 Gbits/sec    0    888 KBytes       
[  7]   0.00-1.00   sec   302 MBytes  2.53 Gbits/sec   69   2.18 MBytes       
[SUM]   0.00-1.00   sec   601 MBytes  5.04 Gbits/sec   69             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec   335 MBytes  2.81 Gbits/sec  167    904 KBytes       
[  7]   1.00-2.00   sec   342 MBytes  2.87 Gbits/sec  536   1.67 MBytes       
[SUM]   1.00-2.00   sec   678 MBytes  5.68 Gbits/sec  703             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec   335 MBytes  2.81 Gbits/sec    0   1.12 MBytes       
[  7]   2.00-3.00   sec   342 MBytes  2.87 Gbits/sec    0   1.81 MBytes       
[SUM]   2.00-3.00   sec   678 MBytes  5.68 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec   332 MBytes  2.79 Gbits/sec  280   1.04 MBytes       
[  7]   3.00-4.00   sec   344 MBytes  2.88 Gbits/sec  482   1.44 MBytes       
[SUM]   3.00-4.00   sec   676 MBytes  5.67 Gbits/sec  762             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec   332 MBytes  2.79 Gbits/sec  206   1017 KBytes       
[  7]   4.00-5.00   sec   338 MBytes  2.83 Gbits/sec  292   1.22 MBytes       
[SUM]   4.00-5.00   sec   670 MBytes  5.62 Gbits/sec  498             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec   331 MBytes  2.78 Gbits/sec    0   1.21 MBytes       
[  7]   5.00-6.00   sec   339 MBytes  2.84 Gbits/sec    0   1.40 MBytes       
[SUM]   5.00-6.00   sec   670 MBytes  5.62 Gbits/sec    0             
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-6.60   sec   199 MBytes  2.78 Gbits/sec    0   1.32 MBytes       
[  7]   6.00-6.60   sec   202 MBytes  2.83 Gbits/sec    0   1.50 MBytes       
[SUM]   6.00-6.60   sec   401 MBytes  5.61 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -

On physical setups I've seen better numbers, but driver performance and settings may impact the situation (a lot).
While looking into the sysctl settings, I stumbled on https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237166 as well.
It explains how to set the receive and send descriptors; for my test it didn't change a lot. Some other iflib setting might, but I haven't tried.
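From the linked bug report, those descriptor counts are per-device loader tunables; a sketch of what that would look like (the device name and the values are illustrative, tunable names per iflib(4)):

```shell
# Raise the vmx RX/TX descriptor ring sizes (reboot required)
cat >> /boot/loader.conf.local <<'EOF'
dev.vmx.0.iflib.override_nrxds="2048"
dev.vmx.0.iflib.override_ntxds="2048"
EOF
```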

Since we haven't seen huge performance degradations on our physical setups, there's the possibility that default settings have changed in vmx (I haven't looked into that, nor plan to).
Driver quality might have been better pre-iflib, which is always a bit of a risk on FreeBSD after major upgrades, to be honest.

In our experience (on Intel) the situation isn't bad at all after switching to FreeBSD 12.1, but that's just my personal opinion (based on measurements on our equipment some months ago).

Best regards,

Ad
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on October 17, 2020, 09:57:19 pm
Nice write-up :)
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Supermule on October 18, 2020, 04:36:04 pm
But 45 gbit/s???
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Gauss23 on October 18, 2020, 04:37:56 pm
But 45 gbit/s???

The clients atached are simple kali linux installs, both using their own vSwitch, so traffic is measured from kali 1 to kali 2

 :)
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Supermule on October 18, 2020, 07:06:41 pm
They still need drivers and networking, as the vSwitch is attached to a network adapter.



But 45 gbit/s???

The clients atached are simple kali linux installs, both using their own vSwitch, so traffic is measured from kali 1 to kali 2

 :)
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: AdSchellevis on October 18, 2020, 07:25:54 pm
Your point is? Just for clarification: the 45 Gbps is measured between two Linux (Kali) machines on the same network (vSwitch) using all default optimisations, which is the baseline (maximum achievable without anything in between) in my case.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: schnipp on October 18, 2020, 09:58:55 pm
I did some tests and noticed that my network suffers from two different problems which interfere with each other. The I219-LM NIC in my server has autonegotiation problems that caused a performance degradation of around 80 percent. I solved this issue by forcing the NIC to 1 Gbit/s full duplex. Now performance tests with iperf3 reach around 980 Mbit/s in direct transfer between client and server, which looks fine.

After integrating the OPNsense back into my setup, so that the firewall routes the traffic between my server and client subnets, throughput degraded from 980 Mbit/s to ca. 245 Mbit/s. I should mention that my OPNsense (v20.7.3) runs on bare metal, so a virtualization impact is impossible.

Next steps will be some resource monitoring during iperf3 tests.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: FlightService on October 19, 2020, 10:42:26 am
How did you set your NIC to do that?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: schnipp on October 19, 2020, 02:33:08 pm
How did you set your NIC to do that?

I should clarify that my server is a Linux installation (Debian Buster) running on dedicated hardware. There are several discussions regarding issues with Intel NICs and Linux. It doesn't matter whether the server is directly connected to a client or there is a switch in between: the NIC driver often reports a 10 MBit/s link to the system, although the real throughput was higher than the reported speed. On the Linux machine I used "ethtool <dev> speed 1000" to disable autonegotiation.
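Note that current ethtool needs the `-s` flag for that, and with autonegotiation off the duplex should be forced explicitly; a minimal sketch, assuming the interface is `eth0`:

```shell
# Force the Linux NIC to gigabit full duplex, autonegotiation off
ethtool -s eth0 speed 1000 duplex full autoneg off
# verify the resulting link state
ethtool eth0 | grep -E 'Speed|Duplex'
```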
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: nwildner on October 19, 2020, 06:58:56 pm
It's under investigation; 20.7.4 may bring an already fixed kernel

Just to add more info on this topic: vmxnet3 can't handle more than 1 Gbps while traffic-testing OPNsense to Windows (and reverse-mode iperf) on the same VLAN. It's a big hit, since our users frequently access fileserver and PDM (AutoCAD-like) data on different VLANs (and thus all that traffic is forwarded by OPNsense). The whole network is 10 Gbps, including user workstations and ESXi 6.7u3 servers.

We have noticed a big hit on transfer speeds after changing our firewall vendor to OPNsense at that location, and we believe it relates to this vmxnet3 case.

OPNSense VM Specs:

OPNSense and Windows server, same vlan, opnsense as gateway of this server vlan:
OPNSENSE to WINDOWS:
Code: [Select]
iperf3 -c 10.254.win.ip -P 8 -w 128k -p 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.02  sec   118 MBytes  98.8 Mbits/sec    0             sender
[  5]   0.00-10.02  sec   118 MBytes  98.8 Mbits/sec                  receiver
[  7]   0.00-10.02  sec   116 MBytes  96.8 Mbits/sec    0             sender
[  7]   0.00-10.02  sec   116 MBytes  96.8 Mbits/sec                  receiver
[  9]   0.00-10.02  sec   113 MBytes  94.5 Mbits/sec    0             sender
[  9]   0.00-10.02  sec   113 MBytes  94.5 Mbits/sec                  receiver
[ 11]   0.00-10.02  sec   109 MBytes  91.5 Mbits/sec    0             sender
[ 11]   0.00-10.02  sec   109 MBytes  91.5 Mbits/sec                  receiver
[ 13]   0.00-10.02  sec   107 MBytes  89.7 Mbits/sec    0             sender
[ 13]   0.00-10.02  sec   107 MBytes  89.7 Mbits/sec                  receiver
[ 15]   0.00-10.02  sec  99.8 MBytes  83.5 Mbits/sec    0             sender
[ 15]   0.00-10.02  sec  99.8 MBytes  83.5 Mbits/sec                  receiver
[ 17]   0.00-10.02  sec  82.0 MBytes  68.7 Mbits/sec    0             sender
[ 17]   0.00-10.02  sec  82.0 MBytes  68.7 Mbits/sec                  receiver
[ 19]   0.00-10.02  sec  71.2 MBytes  59.6 Mbits/sec    0             sender
[ 19]   0.00-10.02  sec  71.2 MBytes  59.6 Mbits/sec                  receiver
[SUM]   0.00-10.02  sec   816 MBytes   683 Mbits/sec    0             sender
[SUM]   0.00-10.02  sec   816 MBytes   683 Mbits/sec                  receiver

OPNSENSE to WINDOWS(iperf3 reverse mode):
Code: [Select]
iperf3 -c 10.254.win.ip -P 8 -R -w 128k -p 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  88.4 MBytes  74.1 Mbits/sec                  sender
[  5]   0.00-10.00  sec  88.2 MBytes  74.0 Mbits/sec                  receiver
[  7]   0.00-10.00  sec   118 MBytes  98.7 Mbits/sec                  sender
[  7]   0.00-10.00  sec   117 MBytes  98.5 Mbits/sec                  receiver
[  9]   0.00-10.00  sec  91.9 MBytes  77.1 Mbits/sec                  sender
[  9]   0.00-10.00  sec  91.7 MBytes  76.9 Mbits/sec                  receiver
[ 11]   0.00-10.00  sec  91.6 MBytes  76.9 Mbits/sec                  sender
[ 11]   0.00-10.00  sec  91.5 MBytes  76.7 Mbits/sec                  receiver
[ 13]   0.00-10.00  sec  92.6 MBytes  77.7 Mbits/sec                  sender
[ 13]   0.00-10.00  sec  92.4 MBytes  77.5 Mbits/sec                  receiver
[ 15]   0.00-10.00  sec  94.4 MBytes  79.2 Mbits/sec                  sender
[ 15]   0.00-10.00  sec  94.2 MBytes  79.0 Mbits/sec                  receiver
[ 17]   0.00-10.00  sec   100 MBytes  84.3 Mbits/sec                  sender
[ 17]   0.00-10.00  sec   100 MBytes  84.1 Mbits/sec                  receiver
[ 19]   0.00-10.00  sec  99.9 MBytes  83.8 Mbits/sec                  sender
[ 19]   0.00-10.00  sec  99.6 MBytes  83.6 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec   777 MBytes   652 Mbits/sec                  sender
[SUM]   0.00-10.00  sec   775 MBytes   650 Mbits/sec                  receiver

Linux VM Specs:



Linux server and Windows server, same vlan cause they are designated on the "servers vlan":
LINUX TO WINDOWS:
Code: [Select]
iperf3 -c 10.254.win.ip -P 8 -w 128k -p 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.17 GBytes  1.00 Gbits/sec  128             sender
[  4]   0.00-10.00  sec  1.17 GBytes  1.00 Gbits/sec                  receiver
[  6]   0.00-10.00  sec   275 MBytes   231 Mbits/sec   69             sender
[  6]   0.00-10.00  sec   275 MBytes   231 Mbits/sec                  receiver
[  8]   0.00-10.00  sec  1.12 GBytes   961 Mbits/sec  150             sender
[  8]   0.00-10.00  sec  1.12 GBytes   961 Mbits/sec                  receiver
[ 10]   0.00-10.00  sec  1.13 GBytes   972 Mbits/sec   98             sender
[ 10]   0.00-10.00  sec  1.13 GBytes   972 Mbits/sec                  receiver
[ 12]   0.00-10.00  sec   264 MBytes   222 Mbits/sec   37             sender
[ 12]   0.00-10.00  sec   264 MBytes   222 Mbits/sec                  receiver
[ 14]   0.00-10.00  sec  1.13 GBytes   973 Mbits/sec  109             sender
[ 14]   0.00-10.00  sec  1.13 GBytes   973 Mbits/sec                  receiver
[ 16]   0.00-10.00  sec   280 MBytes   235 Mbits/sec   34             sender
[ 16]   0.00-10.00  sec   280 MBytes   235 Mbits/sec                  receiver
[ 18]   0.00-10.00  sec   246 MBytes   206 Mbits/sec   64             sender
[ 18]   0.00-10.00  sec   246 MBytes   206 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  5.59 GBytes  4.81 Gbits/sec  689             sender
[SUM]   0.00-10.00  sec  5.59 GBytes  4.80 Gbits/sec                  receiver

LINUX TO WINDOWS (reverse mode iperf): This is where iperf and vmxnet reach their full potential
Code: [Select]
iperf3 -c 10.254.win.ip -P 8 -R -w 128k -p 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  3.17 GBytes  2.72 Gbits/sec                  sender
[  4]   0.00-10.00  sec  3.17 GBytes  2.72 Gbits/sec                  receiver
[  6]   0.00-10.00  sec  3.10 GBytes  2.66 Gbits/sec                  sender
[  6]   0.00-10.00  sec  3.10 GBytes  2.66 Gbits/sec                  receiver
[  8]   0.00-10.00  sec  2.91 GBytes  2.50 Gbits/sec                  sender
[  8]   0.00-10.00  sec  2.91 GBytes  2.50 Gbits/sec                  receiver
[ 10]   0.00-10.00  sec  3.00 GBytes  2.58 Gbits/sec                  sender
[ 10]   0.00-10.00  sec  3.00 GBytes  2.58 Gbits/sec                  receiver
[ 12]   0.00-10.00  sec  2.78 GBytes  2.39 Gbits/sec                  sender
[ 12]   0.00-10.00  sec  2.78 GBytes  2.39 Gbits/sec                  receiver
[ 14]   0.00-10.00  sec  2.85 GBytes  2.45 Gbits/sec                  sender
[ 14]   0.00-10.00  sec  2.85 GBytes  2.45 Gbits/sec                  receiver
[ 16]   0.00-10.00  sec  2.68 GBytes  2.31 Gbits/sec                  sender
[ 16]   0.00-10.00  sec  2.68 GBytes  2.31 Gbits/sec                  receiver
[ 18]   0.00-10.00  sec  2.63 GBytes  2.26 Gbits/sec                  sender
[ 18]   0.00-10.00  sec  2.63 GBytes  2.26 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  23.1 GBytes  19.9 Gbits/sec                  sender
[SUM]   0.00-10.00  sec  23.1 GBytes  19.9 Gbits/sec                  receiver
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on October 19, 2020, 07:38:33 pm
I have customers pushing 6Gbit over vmxnet driver.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: nwildner on October 19, 2020, 08:20:36 pm
I have customers pushing 6Gbit over vmxnet driver.

OK. And what am I supposed to do with this information? Not trying to be rude, but there are plenty of reports in this topic that go against your scenario.

Do you have any idea what I could tune to achieve better performance, then?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Gauss23 on October 19, 2020, 08:37:01 pm
I have customers pushing 6Gbit over vmxnet driver.

OK. And what am I supposed to do with this information? Not trying to be rude, but there are plenty of reports in this topic that go against your scenario.

Do you have any idea what I could tune to achieve better performance, then?

What about this idea?
https://xenomorph.net/freebsd/performance-esxi/

Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: nwildner on October 19, 2020, 09:36:20 pm
What about this idea?
https://xenomorph.net/freebsd/performance-esxi/

I'll try as soon as our users stop doing transfers at that remote office :)
Nice catch.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Supermule on October 19, 2020, 09:42:48 pm
Where do you manually edit the rc.conf??
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: nwildner on October 19, 2020, 09:47:36 pm
Where do you manually edit the rc.conf??

There is an option inside the web administration:

Interface > Settings > Hardware LRO > Uncheck it to enable LRO
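For reference, the same capability flags can also be toggled from the shell on FreeBSD (the interface name here is an example from this thread; changes made this way do not persist across reboots):

```shell
# Enable LRO on vmx0 (the GUI checkbox maps to this interface capability flag)
ifconfig vmx0 lro
# The combination tried later in the thread:
ifconfig vmx0 lro tso vlanhwfilter
# Verify the active flags in the options=... line:
ifconfig vmx0
```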
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: nwildner on October 19, 2020, 10:14:40 pm
What about this idea?
https://xenomorph.net/freebsd/performance-esxi/

Well, only enabling lro didn't change much. The guy who wrote this tutorial is using the same NIC series I am, so it was worth trying to enable lro, tso and vlan_hwfilter, and after that, things got a lot better.

Still not reaching 10Gbps, but I could get almost 5Gbps, which is pretty good:

Only enabling lro:
Code: [Select]
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.17  sec   118 MBytes  97.5 Mbits/sec    0             sender
[  5]   0.00-10.17  sec   118 MBytes  97.5 Mbits/sec                  receiver
[  7]   0.00-10.17  sec   120 MBytes  98.9 Mbits/sec    0             sender
[  7]   0.00-10.17  sec   120 MBytes  98.9 Mbits/sec                  receiver
[  9]   0.00-10.17  sec   120 MBytes  98.8 Mbits/sec    0             sender
[  9]   0.00-10.17  sec   120 MBytes  98.8 Mbits/sec                  receiver
[ 11]   0.00-10.17  sec   117 MBytes  96.8 Mbits/sec    0             sender
[ 11]   0.00-10.17  sec   117 MBytes  96.8 Mbits/sec                  receiver
[ 13]   0.00-10.17  sec   118 MBytes  97.4 Mbits/sec    0             sender
[ 13]   0.00-10.17  sec   118 MBytes  97.4 Mbits/sec                  receiver
[ 15]   0.00-10.17  sec   119 MBytes  98.0 Mbits/sec    0             sender
[ 15]   0.00-10.17  sec   119 MBytes  98.0 Mbits/sec                  receiver
[ 17]   0.00-10.17  sec  90.8 MBytes  74.9 Mbits/sec    0             sender
[ 17]   0.00-10.17  sec  90.8 MBytes  74.9 Mbits/sec                  receiver
[ 19]   0.00-10.17  sec  72.2 MBytes  59.6 Mbits/sec    0             sender
[ 19]   0.00-10.17  sec  72.2 MBytes  59.6 Mbits/sec                  receiver
[SUM]   0.00-10.17  sec   875 MBytes   722 Mbits/sec    0             sender
[SUM]   0.00-10.17  sec   875 MBytes   722 Mbits/sec                  receiver

iperf Done.

vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
             options=800428<VLAN_MTU,JUMBO_MTU,LRO>
             ether 00:50:56:a5:d3:68
             inet6 fe80::250:56ff:fea5:d368%vmx0 prefixlen 64 scopeid 0x1
             media: Ethernet autoselect
             status: active   
             nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

lro, tso and vlan_hwfilter enabled:
Code: [Select]
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec  1.08 GBytes   929 Mbits/sec    0             sender
[  5]   0.00-10.01  sec  1.08 GBytes   929 Mbits/sec                  receiver
[  7]   0.00-10.01  sec   510 MBytes   427 Mbits/sec    0             sender
[  7]   0.00-10.01  sec   510 MBytes   427 Mbits/sec                  receiver
[  9]   0.00-10.01  sec  1.05 GBytes   903 Mbits/sec    0             sender
[  9]   0.00-10.01  sec  1.05 GBytes   903 Mbits/sec                  receiver
[ 11]   0.00-10.01  sec   953 MBytes   799 Mbits/sec    0             sender
[ 11]   0.00-10.01  sec   953 MBytes   799 Mbits/sec                  receiver
[ 13]   0.00-10.01  sec   447 MBytes   375 Mbits/sec    0             sender
[ 13]   0.00-10.01  sec   447 MBytes   375 Mbits/sec                  receiver
[ 15]   0.00-10.01  sec   409 MBytes   342 Mbits/sec    0             sender
[ 15]   0.00-10.01  sec   409 MBytes   342 Mbits/sec                  receiver
[ 17]   0.00-10.01  sec   379 MBytes   318 Mbits/sec    0             sender
[ 17]   0.00-10.01  sec   379 MBytes   318 Mbits/sec                  receiver
[ 19]   0.00-10.01  sec   825 MBytes   691 Mbits/sec    0             sender
[ 19]   0.00-10.01  sec   825 MBytes   691 Mbits/sec                  receiver
[SUM]   0.00-10.01  sec  5.57 GBytes  4.78 Gbits/sec    0             sender
[SUM]   0.00-10.01  sec  5.57 GBytes  4.78 Gbits/sec                  receiver

iperf Done.
root@fw01adb:~ # ifconfig vmx0
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8507b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
        ether 00:50:56:a5:d3:68
        inet6 fe80::250:56ff:fea5:d368%vmx0 prefixlen 64 scopeid 0x1
        media: Ethernet autoselect
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Supermule on October 19, 2020, 10:37:54 pm
But thats not 5 gbit/s....

I got better results disabling LRO on the ESXi host.

Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on October 20, 2020, 06:03:29 am
I have customers pushing 6Gbit over vmxnet driver.

OK. And what am I supposed to do with this information? Not trying to be rude, but there are plenty of reports in this topic that go against your scenario.

Do you have any idea what I could tune to achieve better performance, then?

You wrote that vmxnet can't handle more than one Gb, which is not true. Now when someone googles a similar problem they might think it's a general limitation. I have no idea about hypervisors, but I don't want wrong facts going around.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: nwildner on October 20, 2020, 12:37:20 pm
You wrote that vmxnet can't handle more than one Gb, which is not true. Now when someone googles a similar problem they might think it's a general limitation. I have no idea about hypervisors, but I don't want wrong facts going around.

Just read my reports again.

vmxnet3 is not handling more than 1Gbps on FreeBSD (or maybe it's OPNsense-specific patches). I never said vmxnet3 is garbage, and as you can see, Linux is handling traffic fine. I have other physical machines in different offices and vmxnet3 is just fine with Linux and Windows.

And if you google for solutions, you will find plenty of information (and that also means misinformation): bugs and other fixes (maybe iflib/vmx related) that COULD work.


UPDATE REPORT: I had to disable lro, tso and vlan_hwfilter since they made traffic entering on that interface horribly slow (7Mbps max), and that is a regression we could not live with.

Better to have an interface doing 1Gbps both ways than one that does 4.5Gbps only one way.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: AdSchellevis on October 20, 2020, 12:53:16 pm
@nwilder: would you be so kind not to keep spreading inaccurate / false information around. We don't use any modifications on the vmx driver, which can do more than 1Gbps at ease on a stock FreeBSD 12.1. LRO shouldn't be used on a router for obvious reasons (also pointed at in my earlier post https://forum.opnsense.org/index.php?topic=18754.msg90576#msg90576).
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: nwildner on October 20, 2020, 03:40:57 pm
@nwilder: would you be so kind not to keep spreading inaccurate / false information around. We don't use any modifications on the vmx driver, which can do more than 1Gbps at ease on a stock FreeBSD 12.1. LRO shouldn't be used on a router for obvious reasons (also pointed at in my earlier post https://forum.opnsense.org/index.php?topic=18754.msg90576#msg90576).

All right. I tried lro/tso/vlan_hwfilter because I'm running out of options here. I tried all those sysctls from that FreeBSD bug report and saw no noticeable performance increase after tuning the tx/rx descriptors. Same 800Mbps limit on transfers whenever OPNsense tries to contact another host.

Other tests I've made:

1 - iperf from one VLAN interface to another, same parent interface (vmx0): I put iperf listening on one VLAN interface (parent vmx0) while binding the client to another VLAN interface (parent also vmx0) on this OPNsense box, and got pretty good forwarding rates:

Code: [Select]
iperf3 -c 10.254.117.ip -B 10.254.110.ip -P 8 -w 128k -p 5201
[SUM]   0.00-10.00  sec  8.86 GBytes  7.61 Gbits/sec    0             sender
[SUM]   0.00-10.16  sec  8.86 GBytes  7.49 Gbits/sec                  receiver

I was just trying to test internal forwarding.

2 - Disable IPsec and its passthrough-related configs: thinking that IPsec could be throttling the connection through its passthrough tunnels on traffic entering/leaving the VLAN interfaces, I disabled all IPsec configs, and iperf still topped out at 800Mbps from the firewall to the Windows/Linux servers.

3 - Disable pf: after disabling the IPsec tunnels I disabled pf entirely, did a fresh boot and put OPNsense in router mode. No luck (still the same iperf performance).

4 - Adding VLAN 117 to a new physical vmx interface, letting the hypervisor tag it: I presented a new interface with VLAN 117 tagged by the hypervisor and changed the assignment inside OPNsense ONLY for this specific servers network. iperf tests kept getting the same speed.

Additional logs: bug id=237166 threw some light on this issue, and I found that MSI-X vectors aren't being handled correctly under VMware (given that the MSI-X related issues were resolved on FreeBSD). I'm looking for any documentation that could help me with this case. I'll try to tinker with hw.pci.honor_msi_blacklist=0 in loader.conf to see if I get better performance.

Code: [Select]
vmx0: <VMware VMXNET3 Ethernet Adapter> port 0x5000-0x500f mem 0xfd4fc000-0xfd4fcfff,0xfd4fd000-0xfd4fdfff,0xfd4fe000-0xfd4fffff irq 19 at device 0.0 on pci4
vmx0: Using 4096 TX descriptors and 2048 RX descriptors
vmx0: Using 4 RX queues 4 TX queues
vmx0: failed to allocate 5 MSI-X vectors, err: 6
vmx0: Using an MSI interrupt
vmx0: Ethernet address: 00:50:56:a5:d3:68
vmx0: netmap queues/slots: TX 1/4096, RX 1/4096

Edit: "hw.pci.honor_msi_blacklist: 0" removed the error from the log, but transfer rates remain the same:

Code: [Select]
vmx0: <VMware VMXNET3 Ethernet Adapter> port 0x5000-0x500f mem 0xfd4fc000-0xfd4fcfff,0xfd4fd000-0xfd4fdfff,0xfd4fe000-0xfd4fffff irq 19 at device 0.0 on pci4
vmx0: Using 4096 TX descriptors and 2048 RX descriptors
vmx0: Using 4 RX queues 4 TX queues
vmx0: Using MSI-X interrupts with 5 vectors
vmx0: Ethernet address: 00:50:56:a5:d3:68
vmx0: netmap queues/slots: TX 4/4096, RX 4/4096

root@fw01adb:~ # sysctl -a | grep blacklis
vm.page_blacklist:
hw.pci.honor_msi_blacklist: 0
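For anyone wanting to try the same tunable, a hypothetical way to set it persistently from the shell (OPNsense also exposes tunables under System > Settings > Tunables, which is the supported route):

```shell
# Loader tunables are read at boot; append to loader.conf.local so upgrades
# don't overwrite it, then reboot for the setting to take effect.
echo 'hw.pci.honor_msi_blacklist="0"' >> /boot/loader.conf.local
```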


Hope some of my tests can bring light to this issue.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Supermule on October 20, 2020, 04:53:10 pm
Removing the MSI blacklist option allocated 4 netmap TX/RX queues :)
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: nwildner on October 21, 2020, 04:06:21 pm
For those interested: I started a FreeBSD 13-CURRENT VM (2020-Oct-08 snapshot) with a vmxnet3 interface, created one 802.1Q VLAN, ran iperf between it and a Linux VM and, BOOM! Full performance with 4 parallel streams configured:

Code: [Select]
[ ID] Interval           Transfer     Bandwidth       Retr
[  5]   0.00-10.23  sec  2.34 GBytes  1.96 Gbits/sec    0             sender
[  5]   0.00-10.23  sec  2.34 GBytes  1.96 Gbits/sec                  receiver
[  7]   0.00-10.23  sec  2.09 GBytes  1.75 Gbits/sec    0             sender
[  7]   0.00-10.23  sec  2.09 GBytes  1.75 Gbits/sec                  receiver
[  9]   0.00-10.23  sec  1.67 GBytes  1.40 Gbits/sec    0             sender
[  9]   0.00-10.23  sec  1.67 GBytes  1.40 Gbits/sec                  receiver
[ 11]   0.00-10.23  sec  1.65 GBytes  1.39 Gbits/sec    0             sender
[ 11]   0.00-10.23  sec  1.65 GBytes  1.39 Gbits/sec                  receiver
[SUM]   0.00-10.23  sec  7.75 GBytes  6.50 Gbits/sec    0             sender
[SUM]   0.00-10.23  sec  7.75 GBytes  6.50 Gbits/sec                  receiver
Maybe this is a regression in 12.1.

Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: AveryFreeman on October 22, 2020, 03:48:32 am
> How did you do that? [force 1Gbps NIC]

Turn off auto negotiation and set the nic's IF to 1gbps (?)
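On FreeBSD that would look something like the following sketch (the interface name and media keyword are assumptions and depend on the NIC; fiber adapters such as the X520-SR2 may not offer a 1000baseT media type at all):

```shell
# Force a copper Intel 10G port down to 1 Gbps full duplex instead of autoselect
ifconfig ix0 media 1000baseT mediaopt full-duplex
# Check the negotiated media afterwards
ifconfig ix0
```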
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: AveryFreeman on October 22, 2020, 04:36:49 am
You guys got me interested in this subject. I have run plenty of iperf3 tests against the VMs in my little 3-host homelab; my 10GbE is just a couple of DACs connected between the 10GbE "backbone" interfaces of my Dell PowerConnect 7048P, which is really more of a gigabit switch.

Usually the VMs will peg right up to ~9.4Gbps with little fluctuation if nothing else is happening, but I'm recording 3 720p video streams and 6 high-MP (4MP & 8MP) IP cameras right now, and have no interest in stopping any of it for testing.

I could have sworn I'd iperf'd my OPNsense VM and gotten somewhere around 2.9Gbps vs the 9.4Gbps I got on my Linux, OmniOS or FreeBSD VMs (I don't think I tested Windows; iperf3 builds oddly on Win32 and doesn't yield predictable results). So I expected it to be a bit slower, but not THIS much slower:

OPNsense 20.7.3 to OmniOS r151034
(on separate hosts)

This is a VM w/ 4 vCPU and 8GB ram, run on an E3-1230 v2 home-built Supermicro X9SPU-F host running ESXi 6.7U3.  The LAN vNIC is vmxnet3, running open-vm-tools.

Code: [Select]
root@gateway:/ # uname -a
FreeBSD gateway.webtool.space 12.1-RELEASE-p10-HBSD FreeBSD 12.1-RELEASE-p10-HBSD #0  517e44a00df(stable/20.7)-dirty: Mon Sep 21 16:21:17 CEST 2020     root@sensey64:/usr/obj/usr/src/amd64.amd64/sys/SMP  amd64

root@gateway:/ # iperf3 -c 192.168.1.56
Connecting to host 192.168.1.56, port 5201
[  5] local 192.168.1.1 port 13640 connected to 192.168.1.56 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   125 MBytes  1.05 Gbits/sec    0   2.00 MBytes       
[  5]   1.00-2.00   sec   126 MBytes  1.06 Gbits/sec    0   2.00 MBytes       
[  5]   2.00-3.00   sec   132 MBytes  1.11 Gbits/sec    0   2.00 MBytes       
[  5]   3.00-4.00   sec   131 MBytes  1.10 Gbits/sec    0   2.00 MBytes       
[  5]   4.00-5.00   sec   132 MBytes  1.11 Gbits/sec    0   2.00 MBytes       
[  5]   5.00-6.00   sec   135 MBytes  1.13 Gbits/sec    0   2.00 MBytes       
[  5]   6.00-7.00   sec   138 MBytes  1.16 Gbits/sec    0   2.00 MBytes       
[  5]   7.00-8.00   sec   137 MBytes  1.15 Gbits/sec    0   2.00 MBytes       
[  5]   8.00-9.00   sec   133 MBytes  1.12 Gbits/sec    0   2.00 MBytes       
[  5]   9.00-10.00  sec   131 MBytes  1.10 Gbits/sec    0   2.00 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.29 GBytes  1.11 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  1.29 GBytes  1.11 Gbits/sec                  receiver

iperf Done.

That is abysmal.  Compare that to this Bullseye VM talking to the same OmniOS VM (also on separate hosts)

Debian Bullseye to OmniOS r151034

Code: [Select]
avery@debbox:~$ uname -a
Linux debbox 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 (2020-02-13) x86_64 GNU/Linux

avery@debbox:~$ iperf3 -c 192.168.1.56
Connecting to host 192.168.1.56, port 5201
[  5] local 192.168.1.39 port 58064 connected to 192.168.1.56 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   688 MBytes  5.77 Gbits/sec    0   2.00 MBytes       
[  5]   1.00-2.00   sec   852 MBytes  7.15 Gbits/sec    0   2.00 MBytes       
[  5]   2.00-3.00   sec   801 MBytes  6.72 Gbits/sec  1825    730 KBytes       
[  5]   3.00-4.00   sec   779 MBytes  6.53 Gbits/sec   33   1.13 MBytes       
[  5]   4.00-5.00   sec   788 MBytes  6.61 Gbits/sec  266   1.33 MBytes       
[  5]   5.00-6.00   sec   828 MBytes  6.94 Gbits/sec  392   1.43 MBytes       
[  5]   6.00-7.00   sec   830 MBytes  6.96 Gbits/sec  477   1.49 MBytes       
[  5]   7.00-8.00   sec   826 MBytes  6.93 Gbits/sec  1286    749 KBytes       
[  5]   8.00-9.00   sec   826 MBytes  6.93 Gbits/sec    0   1.26 MBytes       
[  5]   9.00-10.00  sec   775 MBytes  6.50 Gbits/sec  278   1.38 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  7.81 GBytes  6.71 Gbits/sec  4557             sender
[  5]   0.00-10.00  sec  7.80 GBytes  6.70 Gbits/sec                  receiver

iperf Done.

So much better throughput, even while that OmniOS VM is recording 8-9 streams of video over the network.

I'm going to install a FreeBSD kernel and see what happens.  Will be back with more benchmarks.

 
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: opnfwb on October 22, 2020, 05:03:05 am
It is odd that so many of us seem to hit an artificial ~1gbps limit when testing OPNsense 20.7 on VMware ESXi and vmxnet3 adapters. It looks like there are at least 3 of us who can reproduce these results now?

I've disabled the hardware blacklist and did not see a difference in my test results from what I had posted here prior. The only way I can get slightly better throughput is to add more vCPUs to the OPNsense VM, but this does not scale well. For instance, if I go from 2 vCPU to 4 vCPU, I can start to get between 1.5gbps and 2.2gbps depending on how much parallelism I select on my iperf clients.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: AveryFreeman on October 22, 2020, 05:32:15 am
It is odd that so many of us seem to hit an artificial ~1gbps limit when testing OPNsense 20.7 on VMware ESXi and vmxnet3 adapters. It looks like there are at least 3 of us who can reproduce these results now?

I've disabled the hardware blacklist and did not see a difference in my test results from what I had posted here prior. The only way I can get slightly better throughput is to add more vCPUs to the OPNsense VM, but this does not scale well. For instance, if I go from 2 vCPU to 4 vCPU, I can start to get between 1.5gbps and 2.2gbps depending on how much parallelism I select on my iperf clients.

I don't think it's related to the "hardware" (even though in this case it's virtual). I think it's the upstream regression mentioned on page 1, since I used to get better speeds than this before I upgraded. I think I did my last LAN-side iperf3 tests around v18 or 19, and they were at least twice this. In fact, I'm fairly certain I doubled my vCPUs and RAM since then, because I was testing Sensei and never re-configured the VM back to 2 vCPU/4GB after I uninstalled it.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on October 22, 2020, 07:27:38 am
It is odd that so many of us seem to hit an artificial ~1gbps limit when testing OPNsense 20.7 on VMware ESXi and vmxnet3 adapters. It looks like there are at least 3 of us who can reproduce these results now?

I've disabled the hardware blacklist and did not see a difference in my test results from what I had posted here prior. The only way I can get slightly better throughput is to add more vCPUs to the OPNsense VM, but this does not scale well. For instance, if I go from 2 vCPU to 4 vCPU, I can start to get between 1.5gbps and 2.2gbps depending on how much parallelism I select on my iperf clients.

Be honest with yourself: would you buy a piece of hardware with only 2 cores if you have a requirement for 10G? The smallest hardware with 10G interfaces has 4 cores minimum.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: opnfwb on October 22, 2020, 04:06:09 pm
Be honest with yourself: would you buy a piece of hardware with only 2 cores if you have a requirement for 10G? The smallest hardware with 10G interfaces has 4 cores minimum.
I think we may be talking past each other here. I'm not talking about purchasing hardware. I'm discussing a lack of throughput that now exists after an upgrade on hardware that performs at a much higher rate with just a software change. That's why we're running tests on multiple VMs, all with the same specs. There's obviously some bottleneck occurring here that isn't just explained away by core count (or lack thereof).

I have customers pushing 6Gbit over vmxnet driver.
I'm more interested in understanding what is different in my environment that is causing these issues on VMs. Is this claimed 6Gbit going through a virtualized OPNsense install? Do you have any additional details that we can check? I've even tried changing the CPU core assignment (set the number of sockets to 1 and add cores) to see if some odd NUMA scaling issue was impacting OPNsense. So far everything I have tried has had no impact on throughput; even switching to the beta netmap kernel that is supposed to resolve some of this did not seem to help yet.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: nwildner on October 22, 2020, 07:46:38 pm
You guys got me interested in this subject. I have run plenty of iperf3 tests against the VMs in my little 3-host homelab; my 10GbE is just a couple of DACs connected between the 10GbE "backbone" interfaces of my Dell PowerConnect 7048P, which is really more of a gigabit switch.

The infrastructure I have at the remote office I have been reporting on so far:

- PowerEdge R630 (2 servers)
- 2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz with 12 cores each (24 cores per server)
- 3x NetXtreme II BCM57800 10 Gigabit Ethernet (dual-port NICs), meaning 6 physical adapters distributed across 3 virtual switches (2 NICs vm, 2 NICs vmotion, 2 NICs vmkernel)
- 512GB RAM per server
- Plenty of storage on an external SAS 12Gbps (2x 6Gbps active + 2x 6Gbps passive paths) Dell MD3xxx storage with round-robin paths
- 2x Dell N4032F as core/backbone switches with 10Gbps ports, stacked via 2x 40Gbps ports
- 6-port trunks for each server, 3 ports per trunk per stacking member, so each vSwitch NIC touches one stack member
- Stack members on the Dell N series are treated as a unit, so LACP can be configured across stack members (no MLAG involved).

Even when transferring data between VMs that are not on the same physical hardware I can achieve 8Gbps easily, except with the vmxnet3 driver on FreeBSD 12.1.

Be honest with yourself: would you buy a piece of hardware with only 2 cores if you have a requirement for 10G? The smallest hardware with 10G interfaces has 4 cores minimum.

What is not honest is to pretend that a VM can't push more than 1Gbps or achieve decent throughput with only 1 vCPU configured; that is simply not true. On the contrary, with virtualization you should always configure resources in a way that avoids CPU oversubscription. A 4-vCPU VM that is mostly idle and does not run CPU-intensive operations will create problems for other VMs on the same pool/share/physical hardware. For simple iperf3 and network transfer tests, FreeBSD 13 with 1 vCPU did fine, while OPNsense (FreeBSD 12.1) with 4 vCPUs and high CPU shares, and as the only VM with that share configuration, crawled during transfers.

vmxnet3 on FreeBSD 12.1 is garbage. It seems the port to iflib created some regressions related to MSI-X, tx/rx queues, iflib leaking MSI-X messages, non-power-of-2 tx/rx queue configs and others. I could even find some LRO regressions in the commit log that could explain the retransmissions and the abysmal lack of performance I reported here (https://forum.opnsense.org/index.php?topic=18754.msg90766#msg90766) on a previous page while trying to enable LRO as a workaround for that performance issue. https://svnweb.freebsd.org/base/head/sys/dev/vmware/vmxnet3/if_vmx.c?view=log

In the test I made above with FreeBSD 13-CURRENT, I was using only 1 vCPU, 4GB RAM, pvscsi and vmxnet3, and the system performed great compared with the state of the vmxnet3 driver in FreeBSD 12.1-RELEASE.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Archanfel80 on October 26, 2020, 10:27:47 am
With Proxmox using the vnet adapter the speed is fine, and pfSense, based on FreeBSD 11, works fine with vmxnet3 too.
So the issue is with HBSD and the vmxnet adapter. I don't understand why OPNsense is based on a half-dead OS; HBSD has been abandoned by most of its devs. Just drop it and use standard FreeBSD again.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on October 26, 2020, 12:02:55 pm
With Proxmox using the vnet adapter the speed is fine, and pfSense, based on FreeBSD 11, works fine with vmxnet3 too.
So the issue is with HBSD and the vmxnet adapter. I don't understand why OPNsense is based on a half-dead OS; HBSD has been abandoned by most of its devs. Just drop it and use standard FreeBSD again.

FreeBSD 12.1 has the same issues ..
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Archanfel80 on October 26, 2020, 12:50:54 pm
With Proxmox using the vnet adapter the speed is fine, and pfSense, based on FreeBSD 11, works fine with vmxnet3 too.
So the issue is with HBSD and the vmxnet adapter. I don't understand why OPNsense is based on a half-dead OS; HBSD has been abandoned by most of its devs. Just drop it and use standard FreeBSD again.

FreeBSD 12.1 has the same issues ..

Yes, but the pfSense current stable branch is still using FreeBSD 11.x, not 12. I think they have a point: it's not a good idea to switch to a newer base OS while it still has many issues. Now I have to roll back to OPNsense 20.1 everywhere I upgraded to 20.7. And the issue is not just with vmxnet: after I upgraded one of my hardware firewalls with EFI boot to 20.7, the OS no longer boots but freezes during EFI boot. That is also a FreeBSD 12 related issue, as I already figured out.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on October 26, 2020, 01:01:13 pm
And Sophos is using a 3.12 kernel, so why upgrade to a newer one ..

If no one takes the first step there wouldn't be any progress. Usually mission-critical systems shouldn't be updated to a major release until it reaches .3 or .4; I'd even wait for a .6.

The whole discussion is way too offtopic and only fed with frustrated content.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Archanfel80 on October 26, 2020, 01:32:54 pm
We should talk about this, so maybe it's offtopic, but still.
With the half-year release model, 20.7 is almost half a year old now and we are close to 21.1, when 20.7 will be obsolete too. You're right that critical system software should wait before adopting new releases; by that logic even the 21.x series should stay on FreeBSD 11 and wait to upgrade to 12 until it is stable. A firewall is not a good place for experiments and first steps.

But i can say something what is not offtopic.
Disabling net.inet.ip.redirect and net.inet.ip6.redirect, and increasing net.inet.tcp.recvspace and net.inet.tcp.sendspace as well as kern.ipc.maxsockbuf and kern.ipc.somaxconn, helps a little. There is still a performance loss, but not as bad.
I attached my tunables related config.
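The attached tunables file isn't reproduced in this thread dump; a sketch of the settings described, with purely illustrative values (on OPNsense these would normally be set under System > Settings > Tunables so they persist):

```shell
# Illustrative values only -- the poster's actual numbers are in the attachment.
sysctl net.inet.ip.redirect=0         # stop sending IPv4 ICMP redirects
sysctl net.inet.ip6.redirect=0        # stop sending IPv6 redirects
sysctl net.inet.tcp.recvspace=262144  # larger default TCP receive buffer
sysctl net.inet.tcp.sendspace=262144  # larger default TCP send buffer
sysctl kern.ipc.maxsockbuf=16777216   # raise the socket buffer ceiling
sysctl kern.ipc.somaxconn=2048        # deeper listen queue
```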
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: franco on October 26, 2020, 02:08:45 pm
Just keep using 20.1 with all the security related caveats and missing features. I really don't see the point in complaining about user choices.


Cheers,
Franco
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Archanfel80 on October 26, 2020, 03:32:57 pm
Just keep using 20.1 with all the security related caveats and missing features. I really don't see the point in complaining about user choices.


Cheers,
Franco

I did roll back, and everything is fine. The network speed is around 800 Mbit again (gigabit internet); with 20.7 it was just 500-600 Mbit. Speed is important here; I don't care about missing features, I don't use any. I'm not sure about the security caveats; FreeBSD 11 is no less secure. Until this issue is fixed I'll stay with 20.1.x. These servers are used in a production environment; I don't have the time or opportunity to use them as a playground. This was exactly the reason I abandoned pfSense: they imported untested kernels and features, the core system became unstable, and after every upgrade I feared what would go wrong. OPNsense did right until now; I hope the devs fix this or at least we get a workaround. Speed is not the only issue: I had to disable IPS/IDS and Sensei too because they cause system freezes. I have basically neutered my firewalls. I know these features are still in a testing phase, but 20.7 is four, almost five months old now and they still can't be used properly. And we paid for Sensei, which is unusable now. This is not acceptable. So yes, I take the "risk" and roll back wherever I can...
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: AveryFreeman on October 26, 2020, 08:52:55 pm
Would it be possible to install a stock FreeBSD 13 kernel?  Maybe they fixed the regressions.  I'm wondering if it has something to do with HBSD compile flags for security.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Archanfel80 on October 27, 2020, 08:53:09 am
Would it be possible to install a stock FreeBSD 13 kernel?  Maybe they fixed the regressions.  I'm wondering if it has something to do with HBSD compile flags for security.

Unfortunately this is not so easy. You can't use a precompiled kernel from another system; it wouldn't boot.
You have to compile from source, but a newer kernel means newer headers and library dependencies, so the compilation could fail at some point. The only approach that could work is to cherry-pick just the fix into the original kernel source tree and compile that, but this needs work too.
I was an Android kernel developer many years back, so I know experimenting with the kernel is always risky.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Supermule on October 27, 2020, 10:01:12 am
Would it be possible to install a stock FreeBSD 13 kernel?  Maybe they fixed the regressions.  I'm wondering if it has something to do with HBSD compile flags for security.

Unfortunately this is not so easy. You can't use a precompiled kernel from another system; it wouldn't boot.
You have to compile from source, but a newer kernel means newer headers and library dependencies, so the compilation could fail at some point. The only approach that could work is to cherry-pick just the fix into the original kernel source tree and compile that, but this needs work too.
I was an Android kernel developer many years back, so I know experimenting with the kernel is always risky.

Wouldn't it be easier to do it the other way round?

Make OPNsense work with FreeBSD 13, to eliminate any remnants of bad plugin code?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Archanfel80 on October 27, 2020, 12:13:06 pm
Would it be possible to install a stock FreeBSD 13 kernel?  Maybe they fixed the regressions.  I'm wondering if it has something to do with HBSD compile flags for security.

Unfortunately this is not so easy. You can't use a precompiled kernel from another system; it wouldn't boot.
You have to compile from source, but a newer kernel means newer headers and library dependencies, so the compilation could fail at some point. The only approach that could work is to cherry-pick just the fix into the original kernel source tree and compile that, but this needs work too.
I was an Android kernel developer many years back, so I know experimenting with the kernel is always risky.

Wouldn't it be easier to do it the other way round?

Make OPNsense work with FreeBSD 13, to eliminate any remnants of bad plugin code?

They just switched to FreeBSD 12; I don't think FreeBSD 13 will be adopted soon. But you have a point.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Archanfel80 on October 27, 2020, 12:16:51 pm
What I found out: when OPNsense is used in a virtualized environment, it uses only one core. The hardware socket detection is faulty in that case.

net.isr.maxthreads and net.isr.numthreads always return 1.
But they can be changed in the tunables too.
This also requires changing net.isr.dispatch from "direct" to "deferred".
This gives me a massive performance boost on a gigabit connection, but it's still not perfect; the boost comes with overhead too. But only on FreeBSD 12. With 20.1, which is still based on FreeBSD 11, it's lightning fast :)
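As tunables, that combination looks something like this (the values assume a 4 vCPU guest and are illustrative; the net.isr.* thread counts only take effect after a reboot):

```shell
# System > Settings > Tunables; illustrative values for a 4 vCPU guest.
net.isr.maxthreads=4        # detected as 1 in my VMs; reboot required
net.isr.numthreads=4
net.isr.dispatch=deferred   # queue packets to netisr threads instead of "direct"
```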
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Archanfel80 on October 27, 2020, 12:23:10 pm
Tested 20.7.4 with and without sysctl tuning, and 20.1 with tuning.

With 20.7.x nothing helps: the speed stays capped and you lose around 20-30 percent to the overhead. With 20.1, you see the difference :)
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: glasi on October 29, 2020, 12:57:02 pm
I'm also experiencing poor throughput with OPNsense 20.7. Maybe some of you have seen my thread in the general forum (https://forum.opnsense.org/index.php?topic=19426.0).

I did some testing and want to share the results with you.

Measure: In a first step, I disabled all packet filtering on the OPNsense device.
Result: No improvement.

Measure: In a second step, in order to rule out sources of error, I removed the LAGG/LACP configuration from my setup.
Result: No improvement.

In the next step, I made some performance comparisons. I did tests with the following two setups:

a) Client (Ubuntu 20.04.1 LTS)   <-->   OPNsense (20.7.4)       <-->   File Server (Debian 10.6)
b) Client (Ubuntu 20.04.1 LTS)   <-->   Ubuntu (20.04.1 LTS)   <-->   File Server (Debian 10.6)

In both setups the client is a member of VLAN 70 and the file server is a member of VLAN 10. In setup b) I have enabled packet forwarding for IPv4.

The test results were as follows:

Samba transfer speeds (MB/sec)

Routing device        Client --> Server        Server --> Client
a) OPNsense           67.3                     71.2
b) Ubuntu             108.7                    113.8


iPerf3 UDP transfer speeds (MBit/sec)

Routing device        Client --> Server          Server --> Client
a) OPNsense           948 (23% packet loss)      945 (25% packet loss)
b) Ubuntu             948 (1% packet loss)       938 (0% packet loss)

Packet loss leads to approx. 25% reduced throughput on the receiving device.
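The arithmetic behind that estimate, as a quick sanity check (numbers taken from the table above; this assumes lost datagrams reduce usable goodput one-to-one):

```shell
sent_mbit=948   # iperf3 UDP send rate through OPNsense, from the table
loss_pct=25     # packet loss observed on the receiving side
goodput=$(( sent_mbit * (100 - loss_pct) / 100 ))
echo "${goodput} Mbit/s usable"   # prints: 711 Mbit/s usable
```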
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: glasi on October 29, 2020, 04:47:05 pm
Back with some more test results.

I did a rollback to OPNsense 20.1 for testing purposes.


Samba transfer speeds (MB/sec)

Routing device        Client --> Server        Server --> Client
OPNsense 20.1         109.3                    102.6


iPerf3 UDP transfer speeds (MBit/sec)

Routing device        Client --> Server          Server --> Client
OPNsense 20.1         948 (0% packet loss)       949 (0% packet loss)


As you can see, OPNsense 20.1 gives me full wire speed.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: AveryFreeman on November 01, 2020, 07:14:40 pm
Would it be possible to install a stock FreeBSD 13 kernel?  Maybe they fixed the regressions.  I'm wondering if it has something to do with HBSD compile flags for security.

Unfortunately this is not so easy. You can't use a precompiled kernel from another system; it wouldn't boot.
You have to compile from source, but a newer kernel means newer headers and library dependencies, so the compilation could fail at some point. The only approach that could work is to cherry-pick just the fix into the original kernel source tree and compile that, but this needs work too.
I was an Android kernel developer many years back, so I know experimenting with the kernel is always risky.

Wouldn't it be easier to do it the other way round?

Make OPNsense work with FreeBSD 13, to eliminate any remnants of bad plugin code?

It does work, and it's fairly easy: just install OPNsense using opnsense-bootstrap over a FreeBSD installation. You have to change the script if you want to install over a different version of FreeBSD (e.g. 13), but if you install 12.x you can just run the script. Then boot from kernel.old, or copy the kernel back to /boot/kernel, run kldxref, etc.
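A rough sketch of that route (verify the script location and flags against the opnsense/update repository on GitHub before running; this replaces the base system, so use a machine you can afford to lose):

```shell
# Sketch only: first fetch opnsense-bootstrap.sh.in from the opnsense/update
# GitHub repo (the exact raw URL depends on the branch, so it is not shown).
# 1. On a FreeBSD 12.x install, run the bootstrap script:
sh ./opnsense-bootstrap.sh.in -r 20.7   # -r selects the OPNsense release
# 2. After the reboot, restore the stock FreeBSD kernel if wanted:
mv /boot/kernel /boot/kernel.opnsense   # keep the OPNsense kernel aside
cp -R /boot/kernel.old /boot/kernel     # put the stock kernel back
kldxref /boot/kernel                    # rebuild linker hints for its modules
```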

I can't vouch for how helpful this is, as my FreeBSD understanding is limited and I don't know much about kernel tuning. Your observation that net.isr.maxthreads and net.isr.numthreads always return 1 seems more useful than arbitrarily changing the kernel.

How would you recommend tuning the kernel for multi-threading? Is turning off hyperthreading a good idea?

Btw, I didn't see much speed increase installing OPNsense 20.7 over 13-CURRENT and I'm suspicious of its reliability, but there is a slight speed increase installing OPNsense over 12.1-RELEASE and keeping the FreeBSD kernel: https://forum.opnsense.org/index.php?topic=19789.msg91356#msg91356

It would probably be more noticeable on 10G, but I haven't done any benchmarking with it yet.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: devilkin on November 10, 2020, 12:25:20 pm
Looking into what "if_io_tqg" is, and why it eats up quite a bit of a core when doing (not even line-rate) transfers on my APU2 board, I found this thread.

Has any conclusion been reached yet? Is there anything we can test/do?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on November 10, 2020, 03:06:01 pm
Not yet, no
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Klug on November 13, 2020, 06:13:43 pm
I had throughput issues with 20.7.4 on Proxmox (latest 6.2).
They were related to the "offload feature" being enabled (I know, I'm stupid).
Once disabled, everything is OK, maxing out the link.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: AveryFreeman on December 09, 2020, 04:24:42 am
This problem seems to be getting worse: I upgraded to 20.7.5 and my iperf3 speeds have dropped from ~2 Gb/s to hovering around 1 Gb/s, with VM->VM speeds at ~650 Mbps  :o  :-\

A CentOS 8 VM on the same machine gets around 9.4 Gbps.

I'll upload some numbers when I get a chance.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: spi39492 on February 08, 2021, 06:34:10 pm
Has anyone rerun the tests with OPNsense 21.1?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: opnfwb on February 10, 2021, 06:13:23 pm
Here are my latest results.

Recap of my environment:
Server is HP ML10v2 ESXi 6.7 running build 17167734
Xeon E3-1220 v3 CPU
32GB of RAM
SSD/HDD backed datastore (vSAN enabled)

All firewalls were tested with their "out of the box" ruleset; no customizations were made besides configuring the WAN/LAN adapters for these tests. All firewalls have their version of VM Tools installed from the package manager.

The iperf3 client/server are both Fedora Desktop v33. The server sits behind the WAN interface, the client sits behind the LAN interface to simulate traffic through the firewall. No transfer tests are performed hosting iperf3 on the firewall itself.

OPNSense 21.1.1 VM Specs:
VM hardware version 14
2 vCPU
4GB RAM
2x vmx3 NICs

pfSense 2.5.0-RC VM Specs:
VM hardware version 14
2 vCPU
4GB RAM
2x vmx3 NICs

OpenWRT VM Specs:
VM hardware version 14
2 vCPU
1GB RAM
2x vmx3 NICs

Code: [Select]
OPNsense 21.1.1 (netflow disabled) 1500MTU receiving from WAN, vmx3 NICs, all hardware offload disabled, single thread (p1)
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  8.10 GBytes  1.16 Gbits/sec  219             sender
[  5]   0.00-60.00  sec  8.10 GBytes  1.16 Gbits/sec                  receiver

Code: [Select]
OPNsense 21.1.1 (netflow disabled) 1500MTU receiving from WAN, vmx3 NICs, all hardware offload disabled, four thread (p4)
[ ID] Interval           Transfer     Bitrate         Retr
[SUM]   0.00-60.00  sec  13.4 GBytes  1.91 Gbits/sec  2752             sender
[SUM]   0.00-60.00  sec  13.3 GBytes  1.91 Gbits/sec                  receiver

Code: [Select]
OPNsense 21.1.1 (netflow disabled) 1500MTU receiving from WAN, vmx3 NICs, all hardware offload enabled, single thread (p1)
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec   251 MBytes  35.0 Mbits/sec  56410             sender
[  5]   0.00-60.00  sec   250 MBytes  35.0 Mbits/sec                  receiver

Code: [Select]
pfSense 2.5.0-RC 1500MTU receiving from WAN, vmx3 NICs, all hardware offload disabled, single thread (p1)
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  15.1 GBytes  2.15 Gbits/sec  1029             sender
[  5]   0.00-60.00  sec  15.0 GBytes  2.15 Gbits/sec                  receiver

Code: [Select]
pfSense 2.5.0-RC 1500MTU receiving from WAN, vmx3 NICs, all hardware offload disabled, four thread (p4)
[ ID] Interval           Transfer     Bitrate         Retr
[SUM]   0.00-60.00  sec  15.3 GBytes  2.19 Gbits/sec  12807             sender
[SUM]   0.00-60.00  sec  15.3 GBytes  2.18 Gbits/sec                  receiver

Code: [Select]
pfSense 2.5.0-RC 1500MTU receiving from WAN, vmx3 NICs, all hardware offload enabled, single thread (p1)
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec   316 MBytes  44.2 Mbits/sec  48082             sender
[  5]   0.00-60.00  sec   316 MBytes  44.2 Mbits/sec                  receiver

Code: [Select]
OpenWRT v19.07.6 1500MTU receiving from WAN, vmx3 NICs, no UI offload settings (using defaults), single thread (p1)
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  34.1 GBytes  4.88 Gbits/sec  21455             sender
[  5]   0.00-60.00  sec  34.1 GBytes  4.88 Gbits/sec                  receiver

Code: [Select]
OpenWRT v19.07.6 1500MTU receiving from WAN, vmx3 NICs, no UI offload settings (using defaults), four thread (p4)
[ ID] Interval           Transfer     Bitrate         Retr
[SUM]   0.00-60.00  sec  43.2 GBytes  6.18 Gbits/sec  79765             sender
[SUM]   0.00-60.00  sec  43.2 GBytes  6.18 Gbits/sec                  receiver


host CPU usage during the transfer was as follows:
OPNsense 97% host CPU used
pfSense 84% host CPU used
OpenWRT 63% host CPU used for p1, 76% host CPU used for p4

In this case, my environment is CPU constrained. However, the purpose of these transfers is to take a best-case scenario (all 1500 MTU packets) and see how much we can push through the firewall with the given CPU power. I think we're still dealing with inherent bottlenecks within FreeBSD 12: both BSDs here hit high host CPU usage regardless of thread count during the transfer. Only the Linux system scaled with more threads, and it still did not max out the host CPU.

I personally use OPNsense and it's a great firewall. Running on bare metal with igb NICs and a modern processor made within the last 5 years or so, it will be plenty to cover gigabit speeds for most people. However, in a virtualized environment all of the BSDs seem to want a lot of CPU power to scale beyond a steady 1 Gbit/s. Perhaps FreeBSD 13 will give us more efficient virtualization throughput?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: DiHydro on February 11, 2021, 09:40:20 pm
I am curious if I am seeing this kernel problem on my bare-metal install. I have a passively cooled mini PC with 4 Intel NICs and a J1900 CPU at 2.00GHz and 4 GB of RAM. I know this CPU is fairly old, but the hardware sizing guide says I should be able to do 350-750 Mbit/s throughput. When I have no firewall rules enabled and the default IPS settings I get about 370-380 Mbit/s of my 400 Mbit/s inbound speed. If I enable firewall rules to set up fq_codel, then it drops my throughput to 320-340 Mbit/s. In both of these scenarios I see my CPU going up to 90+% on one thread. I do understand that my throughput will go down with different options like IPS and firewall rules, but I would think that with no other options running this hardware should be able to do better than 380 Mbit/s tops.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: opnfwb on February 11, 2021, 10:27:08 pm
I am curious if I am seeing this kernel problem on my bare-metal install. I have a passively cooled mini PC with 4 Intel NICs and a J1900 CPU at 2.00GHz and 4 GB of RAM. I know this CPU is fairly old, but the hardware sizing guide says I should be able to do 350-750 Mbit/s throughput. When I have no firewall rules enabled and the default IPS settings I get about 370-380 Mbit/s of my 400 Mbit/s inbound speed. If I enable firewall rules to set up fq_codel, then it drops my throughput to 320-340 Mbit/s. In both of these scenarios I see my CPU going up to 90+% on one thread. I do understand that my throughput will go down with different options like IPS and firewall rules, but I would think that with no other options running this hardware should be able to do better than 380 Mbit/s tops.
FQ_CoDel and IPS are somewhat secondary to the overall discussion here. Both consume a large amount of CPU cycles and won't illustrate the true throughput capability of the firewall, due to their own inherent overhead.

I run a J3455 with a quad port Intel I340 NIC, and can easily push 1gigabit with the stock ruleset and have plenty of CPU overhead remaining. This unit can also enable FQ_Codel on WAN and still push 1gigabit, although CPU usage does increase around 20% at 1gigabit speeds.

I don't personally run any of the IPS components so I don't have any direct feedback on that. It's worth noting that both of these tests are done on a traditional DHCP WAN connection. If you're using PPPoE, that will be single thread bound and will limit your throughput to the maximum speed of a single core.

What most of the transfer speed tests are illustrating here are that FreeBSD seems to have very poor scaling when using 10gbit virtualized NICs and forwarding packets. This isn't an OPNsense induced issue, more of an issue that OPNsense gets stuck with due to the poor upstream support from FreeBSD. For the vast majority of users on 1gigabit or lower connections, this won't be a cause for concern in the near future.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: DiHydro on February 11, 2021, 11:20:46 pm
I am curious if I am seeing this kernel problem on my bare-metal install. I have a passively cooled mini PC with 4 Intel NICs and a J1900 CPU at 2.00GHz and 4 GB of RAM. I know this CPU is fairly old, but the hardware sizing guide says I should be able to do 350-750 Mbit/s throughput. When I have no firewall rules enabled and the default IPS settings I get about 370-380 Mbit/s of my 400 Mbit/s inbound speed. If I enable firewall rules to set up fq_codel, then it drops my throughput to 320-340 Mbit/s. In both of these scenarios I see my CPU going up to 90+% on one thread. I do understand that my throughput will go down with different options like IPS and firewall rules, but I would think that with no other options running this hardware should be able to do better than 380 Mbit/s tops.
FQ_CoDel and IPS are somewhat secondary to the overall discussion here. Both consume a large amount of CPU cycles and won't illustrate the true throughput capability of the firewall, due to their own inherent overhead.

I run a J3455 with a quad port Intel I340 NIC, and can easily push 1gigabit with the stock ruleset and have plenty of CPU overhead remaining. This unit can also enable FQ_Codel on WAN and still push 1gigabit, although CPU usage does increase around 20% at 1gigabit speeds.

I don't personally run any of the IPS components so I don't have any direct feedback on that. It's worth noting that both of these tests are done on a traditional DHCP WAN connection. If you're using PPPoE, that will be single thread bound and will limit your throughput to the maximum speed of a single core.

What most of the transfer speed tests are illustrating here are that FreeBSD seems to have very poor scaling when using 10gbit virtualized NICs and forwarding packets. This isn't an OPNsense induced issue, more of an issue that OPNsense gets stuck with due to the poor upstream support from FreeBSD. For the vast majority of users on 1gigabit or lower connections, this won't be a cause for concern in the near future.

It sounds like I may need to reset to stock configuration and try this again. I thought that in some of my testing I had disabled all options and was running the device as a pure router and still saw the single-core limitation; maybe I was mistaken and still had some option enabled with significant CPU usage. My cable modem gives a DHCP lease to my OPNsense box, so I am not running PPPoE. When directly connected to the modem I get 390-430 Mbit/s. That is what led me to look at the firewall itself as the throttle point.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: spi39492 on February 12, 2021, 04:19:49 pm
I am curious if I am seeing this kernel problem on my bare-metal install. I have a passively cooled mini PC with 4 Intel NICs and a J1900 CPU at 2.00GHz and 4 GB of RAM. I know this CPU is fairly old, but the hardware sizing guide says I should be able to do 350-750 Mbit/s throughput. When I have no firewall rules enabled and the default IPS settings I get about 370-380 Mbit/s of my 400 Mbit/s inbound speed. If I enable firewall rules to set up fq_codel, then it drops my throughput to 320-340 Mbit/s. In both of these scenarios I see my CPU going up to 90+% on one thread. I do understand that my throughput will go down with different options like IPS and firewall rules, but I would think that with no other options running this hardware should be able to do better than 380 Mbit/s tops.

I wonder what throughput you would get with a Linux-based firewall, just to see what the hardware is capable of. My experience with the current OPNsense 21.1 release is that it gives me only ~50% throughput even after performance tuning in a virtualized environment. A quick test with virtualized OpenWrt gave me full gigabit wire speed without any optimization needed. I know that's comparing apples and oranges, but it's difficult to say what a hardware platform is capable of if you don't try different things.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: DiHydro on February 12, 2021, 10:49:11 pm
I am curious if I am seeing this kernel problem on my bare-metal install. I have a passively cooled mini PC with 4 Intel NICs and a J1900 CPU at 2.00GHz and 4 GB of RAM. I know this CPU is fairly old, but the hardware sizing guide says I should be able to do 350-750 Mbit/s throughput. When I have no firewall rules enabled and the default IPS settings I get about 370-380 Mbit/s of my 400 Mbit/s inbound speed. If I enable firewall rules to set up fq_codel, then it drops my throughput to 320-340 Mbit/s. In both of these scenarios I see my CPU going up to 90+% on one thread. I do understand that my throughput will go down with different options like IPS and firewall rules, but I would think that with no other options running this hardware should be able to do better than 380 Mbit/s tops.

I wonder what throughput you would get with a Linux-based firewall, just to see what the hardware is capable of. My experience with the current OPNsense 21.1 release is that it gives me only ~50% throughput even after performance tuning in a virtualized environment. A quick test with virtualized OpenWrt gave me full gigabit wire speed without any optimization needed. I know that's comparing apples and oranges, but it's difficult to say what a hardware platform is capable of if you don't try different things.

I am going to try this in a day or two. IPFire is my choice right now, unless someone has a different suggestion. I will probably come back to OPNsense either way, as I like this community and the project.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: spi39492 on February 13, 2021, 12:04:03 pm

I am going to try this in a day or two. IPFire is my choice right now, unless someone has a different suggestion. I will probably come back to OPNsense either way, as I like this community and the project.

Yeah, I like OPNsense as well. That's why it is so painful that the throughput is so limited in my setup. I did the tests with Debian and iptables on one hand and with OpenWrt on the other, as it is available for many platforms and pretty simple to install on bare metal and in virtual environments.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: DiHydro on February 16, 2021, 01:05:39 am
So I put OPNsense on a PC that has an Intel PRO/1000 4-port NIC and an i7-2600, and with a default install I get my 450 Mbit/s. Once I put in a firewall rule to enable fq_codel, it drops to 360-380 Mbit/s. I don't believe that an i7 at 3.4 GHz with an Intel NIC can't handle these rules at full speed. What is wrong, what can I look at, and how can I help make this better?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: spi39492 on February 17, 2021, 07:16:18 pm
So I put OPNsense on a PC that has an Intel PRO/1000 4-port NIC and an i7-2600, and with a default install I get my 450 Mbit/s. Once I put in a firewall rule to enable fq_codel, it drops to 360-380 Mbit/s. I don't believe that an i7 at 3.4 GHz with an Intel NIC can't handle these rules at full speed. What is wrong, what can I look at, and how can I help make this better?

You can check some of the performance-tuning tips laid out here: https://forum.opnsense.org/index.php?topic=9264.msg93315#msg93315
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mm-5221 on February 21, 2021, 06:58:42 pm
I have exactly the same problem. Apparently there are problems with the vmxnet3 vNIC here. It's sad, but I can't get higher than 1.4 Gbps. Please don't come at me with hardware suggestions. Sorry folks, it's 2021; 10 Gbps is what every firewall should be able to do by default. OPNsense is a wonderful product, but I think you are betting on a dead horse. Why not use Linux as the OS? FreeBSD slept through the virtualization era (see the s... vmxnet3 support and bugs). Now I'll let go of my frustration and get back to work :).
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on February 22, 2021, 06:51:16 am
I have exactly the same problem. Apparently there are problems with the vmxnet3 vNIC here. It's sad, but I can't get higher than 1.4 Gbps. Please don't come at me with hardware suggestions. Sorry folks, it's 2021; 10 Gbps is what every firewall should be able to do by default. OPNsense is a wonderful product, but I think you are betting on a dead horse. Why not use Linux as the OS? FreeBSD slept through the virtualization era (see the s... vmxnet3 support and bugs). Now I'll let go of my frustration and get back to work :).

So there's always the option to use IPFire for this use case? :)
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mm-5221 on February 22, 2021, 08:55:18 am
No, I switched from Sophos UTM to OPNsense some time ago and don't want another migration now. With the exception of a WAF and the fact that firewall aliases are not connected to DHCP, I find OPNsense a great product.
I have now solved my performance problem with the tunable hw.pci.honor_msi_blacklist=0. With iperf3 and -P 10 (parallel streams) I get between 8-9 Gbps without IPS. With IPS, unfortunately, only 1.7 Gbps (CPU only 30% utilized). I am still missing performance tuning of the IPS parameters in the UI. I think I could get 5-6 Gbps with about 8 cores, and with 12 cores it should be 8-9 Gbps. Currently IPS/Suricata is artificially throttled somewhere in the configuration.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mavor on March 01, 2021, 09:55:45 pm
Do we have any solution here?

I have an R620 (Intel Xeon E5-2670 @ 2.60 GHz, 8 cores) under ESXi 7, and I get 700 Mbps between OPNsense <> Ubuntu VM on the same host, while two Ubuntu VMs can do 7 Gbps, ten times faster.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: schnipp on May 03, 2021, 12:51:10 pm
Is there any news regarding this topic? Throughput is still slow on OPNsense 21.1.5  :'(
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: glasi on May 07, 2021, 09:47:37 pm
I've found a similar issue regarding slow transfers with iflib in TrueNAS, which has been solved there.

Maybe we're facing the same issue here in OPNsense.

Please have a look at the following links/commits:

Maybe there is an expert here who can review the code snippets.

I really hope this issue can be solved soon.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: schnipp on May 11, 2021, 05:26:16 pm
Related Github ticket: #119 (https://github.com/opnsense/src/issues/119)
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: balrog on August 19, 2021, 08:39:12 am
Hello everyone

Unfortunately I have the same performance problem on ESXi 6.7 with vmxnet3 network adapters. The physical adapters behind them are as follows:

WAN: AQtion AQN-107 (10 Gbps)
LAN: Intel 10 Gigabit Ethernet Controller 82599 (10 Gbps)
DMZ: Intel 10 Gigabit Ethernet Controller 82599 (10 Gbps)

ISP: 10/10 Gbps (XGS-PON)


The speed on OPNsense (also on pfSense) is approximately as follows:
down: 7-10 Mbps
up: 2.5-3 Gbps

On any Linux firewall (e.g. IPFire and Untangle) I get the following values:
down & up: 5-6 Gbps

I have tried all possible tunables on the OPNsense, which unfortunately didn't help.

But now I just noticed something strange:
When I have performance monitoring active during a speed test (the Performance graph in the WebUI, or top via SSH), the speed is suddenly not even that bad:
down & up: 3-4 Gbps

If I deactivate the performance monitoring again, the values are as low as at the beginning.

Unfortunately I don't know exactly what triggers this phenomenon; maybe someone of you has noticed it too?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on August 19, 2021, 10:02:39 am
Try putting both WAN and LAN on the Intel card and don't use the other one.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: balrog on August 19, 2021, 12:43:27 pm
Thank you for the answer.

I previously had an Intel X550-T2 purely for the WAN connection, but after testing I found that the onboard AQtion AQN-107 with the current driver from Marvell* is just as fast (so I could save a PCIe slot).
On both Linux firewalls I was able to max out the ISP's bandwidth with either configuration (Intel or AQtion).

P.S. the problem was the same with the Intel NIC configuration.

(*sorry, the driver is not from Broadcom, it's from Marvell)
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on August 19, 2021, 02:45:56 pm
Try setting the MSS on the LAN interface (Interfaces: LAN) to 1300.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: balrog on August 19, 2021, 03:14:31 pm
Thanks for the hint, but I had already adjusted this value before - unfortunately without success...

What is really strange is that the speed is normal (like on the Linux firewalls) as soon as I have "top" open in the background
(no matter if OPNsense is tuned or on factory settings).

As if (figuratively speaking) "top" keeps the floodgates open so the network packets flow faster.


Can anyone with the same problem (vmxnet3) perhaps verify this?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: balrog on August 19, 2021, 06:08:34 pm
I have recorded the phenomenon below:
https://ibb.co/rv8r4fn (https://ibb.co/rv8r4fn)
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Kallex on August 24, 2021, 12:54:40 pm
Just to chip in and offer a possibly "standard" hardware data point.

I'm using Deciso's own hardware, which should help with replicating/reproducing the issue.

Deciso DEC 840 with OPNsense 21.4.2-amd64, FreeBSD 12.1-RELEASE-p19-HBSD

I have one main VLAN routing to untagged (main LAN). I upgraded my main switch to 10 Gbps and moved my LAN+VLAN interface from a GbE port to an SFP+ port at 10 GbE.

Everything else works well, but VLAN <=> LAN routing causes massive lag on completely separate routing (spikes of 400-1000 ms); in the extreme case a CPU spike of 80%+ caused several seconds of 1000-1300 ms spikes on separate routing (light traffic).

I will reconfigure (likely today) the VLAN parts onto a separate GbE interface and see if that solves the issue; the next step would be restoring the whole network to GbE ports (as it was before).

I did install a new switch in the network, so it might play a part in this, but based on the behaviour it seems unlikely.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: mimugmail on August 24, 2021, 01:17:01 pm
Do you use Sensei or IPS?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Kallex on August 24, 2021, 02:01:58 pm
If you meant me: no, I don't have Sensei, and I believe I don't have IPS enabled (at least not on purpose - I can't even find the setting right now).

We do use traffic-shaping policies for the 2x WANs, but that's about it. Everything else is just basic (rule-limited) routing between LAN/VLANs.

I didn't touch anything in the recent change except moving the LAN (plus the VLANs associated with it) from the igb0 interface to ax0.

I'll configure it back soonish (hopefully today), since the 10 GbE wasn't really utilized yet and the issue is very easy to spot right now, so I'll have more info about my scenario soon.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Kallex on August 24, 2021, 04:30:50 pm
OK, that was nice and clean to confirm.

To clarify the terms below: the Deciso DEC 840 has 4x GbE ports (igb0,1,2,3) and 2x 10 GbE SFP+ ports (ax0,ax1).

The issue with the DEC 840 is the 10 GbE SFP+ ports routing VLAN traffic. In my case they were supposed to route that traffic alongside untagged LAN traffic, so this is the scenario I can confirm.


1. Before changes - VLAN routing worked

Before using the SFP+ ports I had LAN + VLAN routed on the igb0 interface. Everything worked well, no issues.

2. After changes - VLAN routing broken (affecting other routing too)

After moving LAN + VLAN over to the SFP+ port (ax0), the issues started. When VLAN traffic was routed, there were heavy lag spikes on non-VLAN traffic as well. I don't have performance numbers, but the traffic wasn't heavy - yet it heavily affected the whole physical interface.

3. Fixed by moving VLAN to igb0 while keeping LAN on ax0

As I knew "everything on igb0" worked, I wanted to see whether it was enough to move just the VLAN to igb0 and keep LAN on ax0. It required some careful "tag denial" on switch routes so as not to loop either the untagged traffic or the VLANs, but the solution worked.

EDIT: Of course this workaround/fix was only feasible because my VLAN networks didn't need the 10 GbE in the first place.


As I would need to change 2x managed switches and be very careful not to make my OPNsense inaccessible, I'm hesitant to try it "the other way around" (moving VLANs to SFP+ and LAN to igb0) just to test whether all VLAN routing is broken, or whether the issue only appears when LAN/VLAN traffic "routes back" through the same physical interface.

I also didn't test the 10 GbE speeds (no sensible way to test through OPNsense right now), but the lag/latency issue was so clear that something was obviously not working.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: AdSchellevis on August 24, 2021, 09:07:40 pm
@Kallex Can you try updating to 21.4.3? The axgbe driver from AMD had an issue with larger packets in VLANs, which led to a lot of spam in dmesg (and reduced performance). If you do suffer from the same issue, I'd expect quite a few kernel messages (..Big packet...) when larger packets are being processed.
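A quick way to check whether a box is hitting this would be to grep the kernel message buffer for those messages (the exact message text here is an assumption based on the description above):

```shell
# On the affected box you would run:  dmesg | grep -ci 'big packet'
# The same grep shown against a sample line of that kind:
echo "axgbe0: Big packet size 1522" | grep -ci 'big packet'
```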

The release notes for 21.4.3 are available here https://docs.opnsense.org/releases/BE_21.4.html#august-11-2021

o src: axgbe: remove unneccesary packet length check (https://github.com/opnsense/src/commit/bee1ba0981190dabcd045b6c8debfc8b8820016c)

Best regards,

Ad
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Kallex on August 24, 2021, 11:15:50 pm
I can try; we're in a production environment, so the earliest I can try it is the weekend.

I guess that's not the "Stable Business Branch" release - can I easily roll back to the last stable one after checking that version out?

I'll report back regardless of whether I could test it or not.

EDIT: Realized it's indeed a business release. I'll test it on the weekend at the latest and report back.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: Kallex on August 28, 2021, 10:41:15 am
I got to test it now. My issue no longer reproduces with this newest version, thank you :-).

So initially I had performance issues routing VLAN <=> LAN through ax0 (10 GbE) on the Deciso DEC 840. After this patch the issue is clearly gone.

I don't have any real performance numbers between VLANs, but the obvious "laggy" behaviour is entirely gone now.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: alh on September 21, 2021, 11:24:41 am
I also did some testing after noticing on a customer site that even on a 10G uplink I would max out at 600 Mbps. Since then I have roughly tested this on all other sites where we run OPNsense, and the result is the same everywhere. OPNsense runs everywhere on either ESXi or Proxmox, on Thomas Krenn servers with the following specs:

Supermicro mainboard X10SDV-TP8F
Intel Xeon D-1518
16 GB ECC DDR4 2666 RAM

I now tested with 3 VMs: 2 running Debian Bullseye and 1 OPNsense (latest 20.1 and latest 21.7). The results are quite poor.

Debian -> Debian
> 14Gbps

Debian -> OPNsense 20.1 -> Debian
< 700Mbps

Debian -> OPNsense 21.7 -> Debian
< 900Mbps

Both OPNsense installs use default settings, hardware offloading disabled, and are updated to the latest version.

I tried setting the following tunables:

net.isr.maxthreads=-1
I noticed that net.isr.maxthreads always returns 1, but when set to -1 it reports the correct number of threads. However, the network throughput does not change.

hw.ibrs_disable=1
This made a significant impact: throughput increased to 2.6 Gbps, which is still too low but a lot better than before.
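For reference, on OPNsense these can be entered under System > Settings > Tunables; the loader.conf equivalents would look roughly like this (a sketch using the values above - note that hw.ibrs_disable trades away a Spectre mitigation for speed):

```
# /boot/loader.conf.local (sketch)
net.isr.maxthreads="-1"   # one netisr thread per CPU core instead of 1
hw.ibrs_disable="1"       # disable IBRS mitigation (security trade-off)
```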

Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: AdSchellevis on September 21, 2021, 11:31:42 am
@alh, in case of ESXi the most relevant details are likely already documented in https://forum.opnsense.org/index.php?topic=18754.msg90576#msg90576. The 14 Gbps was probably measured with default settings; the D-1518 isn't a very fast machine, so that would be reasonable with all hardware-accelerated offloading settings enabled.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: athurdent on September 27, 2021, 08:20:46 pm
For virtualised environments it helps to look into SR-IOV.

Supermicro M11SDV-8C-LN4F with Intel X710-DA2 running Proxmox 7 with SR-IOV VFs configured for OPNsense LAN and WAN on separate SFP+ slots.

Running
Code: [Select]
iperf3 -c 192.168.178.8 -R -P3 -t30
through the firewalls.

OPNsense 21.7.3_1 with Sensei
Code: [Select]
[SUM]   0.00-30.00  sec  10.5 GBytes  3.00 Gbits/sec  3117             sender
[SUM]   0.00-30.00  sec  10.5 GBytes  3.00 Gbits/sec                  receiver

OPNsense 21.7.3_1 without Sensei
Code: [Select]
[SUM]   0.00-30.00  sec  23.8 GBytes  6.82 Gbits/sec  514             sender
[SUM]   0.00-30.00  sec  23.8 GBytes  6.82 Gbits/sec                  receiver

Blindtest, Linux based firewall hardware:
Code: [Select]
[SUM]   0.00-30.00  sec  29.3 GBytes  8.40 Gbits/sec    0             sender
[SUM]   0.00-30.00  sec  29.3 GBytes  8.40 Gbits/sec                  receiver
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: testo_cz on September 28, 2021, 09:24:52 pm
@athurdent

Do you think SR-IOV also helps if the host (virtualization platform) uses vSwitches?
I work with ESXi hosts where the NIC goes directly to a vSwitch, so the NIC doesn't seem to be "sliced" for the VM guests.

Thanks for the benchmarks btw.

T.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: athurdent on September 29, 2021, 05:28:19 am
Quote from: testo_cz on September 28, 2021
Do you think SR-IOV also helps if host (virtualized env. platform) uses vSwitches?

Hi, not sure about the ESXi implementation, they seem to have documentation on it though. https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.networking.doc/GUID-CC021803-30EA-444D-BCBE-618E0D836B9F.html
The card itself definitely has integrated switching capabilities. If I use a VLAN only on the card for 2 VMs to communicate (VLAN is not configured or allowed on the hardware switch the card is connected to), then I get around 18G throughput, which is done on the card internally.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: testo_cz on October 03, 2021, 11:51:48 am
Quote from: athurdent on September 29, 2021
The card itself definitely has integrated switching capabilities. If I use a VLAN only on the card for 2 VMs to communicate (VLAN is not configured or allowed on the hardware switch the card is connected to), then I get around 18G throughput, which is done on the card internally.

That's interesting information -- SR-IOV VFs on the card just switch between each other. It also makes sense. I can imagine how this would improve smaller setups, no matter whether it's ESXi or something else.

The ESXi docs say that Direct I/O enables HW acceleration too, regardless of the vSwitch, but only in some scenarios. I assume it's a combination of their VMXNET3 paravirtualized driver magic and the physical function of the NIC. From what I've seen it's the default for large ESXi setups.

18G means the traffic went through PCIe only - cool.

Thanks. T.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: blblblb on November 02, 2021, 09:55:15 pm
I made a new thread about this very same issue but with Proxmox guests in the mix:
https://forum.opnsense.org/index.php?topic=25410.msg122060#msg122060

I don't want to blame OPNsense 100% before I rule out OVS problems, but OVS has not had issues for me in the past :(
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: iamperson347 on December 05, 2021, 07:48:25 pm
I'm chiming in to say I have seen similar issues. Running on Proxmox, I can only route about 600 Mbps through OPNsense using virtio/vtnet. A related kernel process in OPNsense shows 100% CPU usage, and the underlying vhost process on the Proxmox host is pegged as well.

Trying a Linux VM on the same segment (i.e. not routing through OPNsense) saturates the 1 Gbit NIC on my desktop with only 25% CPU usage on the associated vhost process for the VM's NIC.

I know some blame has been put on CPU speed etc., but I think there is some sort of performance issue with the vtnet drivers. Even pfSense users have had similar complaints. I also tried the new OPNsense development build (FreeBSD 13) with no improvement.

I passed my NIC through to the OPNsense VM, reconfigured the interfaces, and can route 1 Gbps no sweat. This is with the em driver (which supports my NIC).

Note: I can get 1 Gbps with multiple queues set on the vtnet adapters for the OPNsense VM. However, this still doesn't fix the performance issue with a single "stream."
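For anyone wanting to try the multiqueue route: on Proxmox this is set per virtio NIC in the VM's config file (/etc/pve/qemu-server/<vmid>.conf); a sketch, with the MAC address and bridge name as placeholders:

```
# NIC line with 4 virtio queues (one queue per vCPU is the usual rule of thumb)
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,queues=4
```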
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: linuxmail on February 02, 2022, 12:54:49 pm
Hello,

I'm joining this thread too. We have:

* 4 x DEC-3850
* OPNsense 21.10.2-amd64 (Business edition)

We have had throughput problems ever since we started using OPNsense. We began with a SuperMicro X11-SSH at ~5 Gb/s and then switched to the appliance. We never reach more than 2-3 Gb/s (iperf3, without any special options), and it seems the problem is the VPN stack: if you have an IPsec tunnel, all traffic slows down, even traffic that does not go through the tunnel.

We tested:

* VM -> VM, same hypervisor (Proxmox), same VLAN = ~16 Gb/s
* VM -> VM, different hypervisor (Proxmox), same VLAN = ~10 Gb/s
* VM -> VM, different hypervisor (Proxmox), different VLAN = 1.5 Gb/s - ~3 Gb/s

So if traffic goes via OPNsense, the network slows down.

https://www.mayrhofer.eu.org/post/firewall-throughput-opnsense-openwrt/

Quote
When IPsec is active - even if the relevant traffic is not part of the IPsec policy - throughput is decreased by nearly 1/3. This seems like a real performance issue / bug in the FreeBSD/HardenedBSD kernel. I will need to try with VTI based IPsec routing to see if the in-kernel policy matching is a problem.

What makes us very sad: if this is the real issue, it is not easy to test by disabling VPN, but we will try to build a test scenario.

Pretty sad ...
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: maclinuxfree on February 02, 2022, 01:20:04 pm
Did you also test 22.1?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: franco on February 02, 2022, 01:20:31 pm
@linuxmail would you mind stopping random cross-posting, thanks
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: zwck on April 14, 2022, 07:13:46 pm
Is there a way I could test this with a bare-metal OPNsense installation? How would I proceed?
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: marcosscriven on April 27, 2022, 04:20:42 pm
EDIT - Resolved - see next post

Original post:

Quote from: iamperson347 on December 05, 2021
I'm chiming in to say I have seen similar issues. Running on proxmox, I can only route about 600 mbps in opnsense using virtio/vtnet. A related kernel process in opnsense shows 100% cpu usage and the underlying vhost process on the proxmox host is pegged as well.

I'm seeing throughput all over the place on a similar setup (i.e. in a Proxmox VM):

Code: [Select]
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  97.0 MBytes   814 Mbits/sec
[  5]   1.00-2.00   sec   109 MBytes   911 Mbits/sec
[  5]   2.00-3.00   sec   111 MBytes   934 Mbits/sec
[  5]   3.00-4.00   sec   103 MBytes   867 Mbits/sec
[  5]   4.00-5.00   sec   100 MBytes   843 Mbits/sec
[  5]   5.00-6.00   sec   112 MBytes   937 Mbits/sec
[  5]   6.00-7.00   sec   109 MBytes   911 Mbits/sec
[  5]   7.00-8.00   sec  75.7 MBytes   635 Mbits/sec
[  5]   8.00-9.00   sec  68.9 MBytes   578 Mbits/sec
[  5]   9.00-10.00  sec  96.6 MBytes   810 Mbits/sec
[  5]  10.00-11.00  sec   112 MBytes   936 Mbits/sec

And while that's happening, I see the virtio_pci interrupt thread maxing out:

Code: [Select]
  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   12 root        -92    -     0B   400K CPU0     0  21:42  94.37% [intr{irq29: virtio_pci1}]
51666 root          4    0    17M  6600K RUN      1   0:18  68.65% iperf3 -s
   11 root        155 ki31     0B    32K RUN      1  20.4H  13.40% [idle{idle: cpu1}]
   11 root        155 ki31     0B    32K RUN      0  20.5H   3.61% [idle{idle: cpu0}]

Are there any settings that could help with this please?

I'm on 22.1.6
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: marcosscriven on April 27, 2022, 04:50:08 pm
Further to my previous post, I actually fixed this just by turning on all the hardware acceleration options under "Interfaces > Settings".

That includes CRC (checksum offload), TSO, and LRO. I removed the 'disabled' checkmarks and rebooted.
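For reference, those GUI checkboxes map to interface capability flags; on plain FreeBSD the equivalent rc.conf line would look roughly like this (interface name and address are placeholders - on OPNsense itself the Interfaces > Settings checkboxes are the right place to change this):

```
# /etc/rc.conf sketch: enable checksum offload, TSO and LRO on the vtnet NIC
ifconfig_vtnet0="inet 192.168.1.1/24 rxcsum txcsum tso lro"
```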

Now I get a rock-solid iperf3 result:

Code: [Select]
[  5] 166.00-167.00 sec   112 MBytes   941 Mbits/sec
[  5] 167.00-168.00 sec   112 MBytes   941 Mbits/sec
[  5] 168.00-169.00 sec   112 MBytes   941 Mbits/sec
[  5] 169.00-170.00 sec   112 MBytes   941 Mbits/sec

And NIC processing load dropped to just 25% or so:

Code: [Select]
  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        155 ki31     0B    32K RUN      1   3:14  77.39% [idle{idle: cpu1}]
   11 root        155 ki31     0B    32K RUN      0   3:06  71.26% [idle{idle: cpu0}]
   12 root        -92    -     0B   400K WAIT     0   0:55  28.35% [intr{irq29: virtio_pci1}]
91430 root          4    0    17M  6008K RUN      0   0:43  21.94% iperf3 -s

What confused me was:

1) The acceleration is disabled by default (not sure why?)
2) I thought it wouldn't apply to virtio devices, but clearly they implement the right things to support it.

EDIT

Argh - perhaps not. While this fixed the LAN side, the WAN-side throughput suddenly plummets.

This is strange because it's using the same virtio driver to a separate NIC of exactly the same type.
Title: Re: Poor Throughput (Even On Same Network Segment)
Post by: supern00b on May 06, 2022, 10:46:06 am
We also have a performance issue: we have a Scop7 5510 with 10G SFP+ and only get 1.2 GBit/s, but it should be >9 GBit/s.

Any ideas why this happens and how to fix it?