Poor Throughput (Even On Same Network Segment)

hax0rwax0r:
I originally posted on Reddit but figured I might get more traction here with this.

I have an OPNsense 20.7.1 server running on a Dell R430 with 16 GB DDR4 RAM, an Intel Xeon E5-2620 v3 (6 cores/12 threads @ 2.40GHz) CPU and an Intel X520-SR2 10GbE NIC.

My network has several VLANs and subnets, with OPNsense acting as a router on a stick that handles all of the firewalling and routing between the segments.

I recently upgraded OPNsense to 20.7.1 and, on a whim, ran an iperf3 test between two VMs on different network segments to see what kind of throughput I was getting. I am certain that, at some point, this very same hardware pushed over 6 Gbps on the same iperf3 test. Today it gets around 850 Mbps every single time.

I started iperf3 as a server on my QNAP NAS, which is attached to the same 10 Gbps switch, ran iperf3 as a client from OPNsense on the same network segment, and got the same 850 Mbps throughput.

To make sure I wasn't limited by the QNAP NAS, I ran the same iperf3 test with my other QNAP NAS as a client against the first one, and it pushed 8.6 Gbps across the same network segment (no OPNsense involved), so both the QNAP and the switch can clearly handle it.
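
For reference, none of these tests are anything fancy, just the stock iperf3 client/server pair (192.168.1.31 is the first QNAP acting as the server; substitute whatever addresses apply):

# iperf3 -s
(run on whichever box is the server, e.g. the QNAP or the receiving VM)

# iperf3 -c 192.168.1.31
(run on the sending VM for the routed test, or from the OPNsense shell for the same-segment test)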

My question is: what do I have going wrong here? Even on the same network segment, OPNsense can't push more than 850 Mbps. I have no idea whether this was happening before the upgrade to 20.7.1, but I know for sure it is happening now. I would assume an iperf3 test run from the OPNsense box itself on the same network segment rules out firewalling and routing as the bottleneck.

The interface shows a 10 Gbps link speed, too, both in ifconfig and on the switch itself.
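
(On the OPNsense shell that's just the media line from ifconfig; ix0 is what the X520 port shows up as on my box:)

# ifconfig ix0 | grep media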

My current MBUF usage is 1% (17726/1010734).
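
(That figure is from the dashboard; the same counters can be pulled on the shell with netstat if anyone wants to compare:)

# netstat -m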

IDS/IPS package is installed but disabled.

I had "Hardware CRC" and "Hardware TSO" and "Hardware LRO" and "VLAN Hardware Filtering" all enabled. I have since set those all to disabled and rebooted. I can confirm that it disabled by looking at the interface flags in ifconfig:

Pre-reboot:
options=e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>

Post-reboot:
options=803828<VLAN_MTU,JUMBO_MTU,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
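
For anyone else testing this: the same offloads can also be flipped at runtime without a reboot, something along these lines (ix0 as the example; the change doesn't persist, so the GUI settings still need to match):

# ifconfig ix0 -rxcsum -txcsum -tso -lro -vlanhwfilter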

I ran top and saw a kernel thread (kernel{if_io_tqg_2}) using close to 100% of a CPU core during the iperf3 test:

# top -aSH

last pid: 22772;  load averages:  1.23,  0.94,  0.79    up 5+23:48:52  14:24:22
233 threads:   15 running, 193 sleeping, 25 waiting
CPU:  1.0% user,  0.0% nice, 16.1% system,  0.5% interrupt, 82.4% idle
Mem: 1485M Active, 297M Inact, 1657M Wired, 935M Buf, 12G Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    0 root        -76    -      0   848K CPU2     2 279:51  99.77% [kernel{if_io_tqg_2}]
   11 root        155 ki31      0   192K CPU3     3 130.8H  98.78% [idle{idle: cpu3}]
   11 root        155 ki31      0   192K CPU9     9 131.3H  98.75% [idle{idle: cpu9}]
   11 root        155 ki31      0   192K CPU1     1 129.7H  98.68% [idle{idle: cpu1}]
   11 root        155 ki31      0   192K CPU10   10 138.1H  98.33% [idle{idle: cpu10}]
   11 root        155 ki31      0   192K CPU5     5 130.5H  97.51% [idle{idle: cpu5}]
   11 root        155 ki31      0   192K CPU0     0 138.3H  95.78% [idle{idle: cpu0}]
   11 root        155 ki31      0   192K CPU8     8 137.7H  95.25% [idle{idle: cpu8}]
   11 root        155 ki31      0   192K CPU6     6 138.7H  95.20% [idle{idle: cpu6}]
   11 root        155 ki31      0   192K CPU4     4 138.4H  94.26% [idle{idle: cpu4}]
22772 root         82    0    15M  6772K CPU7     7   0:04  93.83% iperf3 -c 192.168.1.31
   11 root        155 ki31      0   192K RUN      7 129.4H  68.75% [idle{idle: cpu7}]
   11 root        155 ki31      0   192K RUN     11 126.8H  46.12% [idle{idle: cpu11}]
    0 root        -76    -      0   848K -        4 277:00   5.12% [kernel{if_io_tqg_4}]
   12 root        -60    -      0   400K WAIT    11 449:21   5.02% [intr{swi4: clock (0)}]
    0 root        -76    -      0   848K -        8 317:40   3.81% [kernel{if_io_tqg_8}]
    0 root        -76    -      0   848K -        0 272:13   2.71% [kernel{if_io_tqg_0}]
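
Since if_io_tqg_2 is one of the per-CPU iflib I/O task groups, it looks like all of the iperf3 traffic is being serviced by a single CPU. A quick way to compare how many queues/MSI-X vectors the card actually got against the core count (the interrupt names may differ a bit depending on the driver version):

# vmstat -i | grep ix
# sysctl hw.ncpu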

I occasionally see flowd_aggregate.py spike to 100%, but it doesn't seem consistent or correlated with when iperf3 is running:

# top -aSH

last pid: 99781;  load averages:  1.15,  0.90,  0.77    up 5+23:47:27  14:22:57
232 threads:   14 running, 193 sleeping, 25 waiting
CPU:  8.5% user,  0.0% nice,  1.6% system,  0.4% interrupt, 89.5% idle
Mem: 1481M Active, 299M Inact, 1656M Wired, 935M Buf, 12G Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
43465 root         90    0    33M    25M CPU7     7   7:11  99.82% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.7)
   11 root        155 ki31      0   192K CPU9     9 131.3H  99.80% [idle{idle: cpu9}]
   11 root        155 ki31      0   192K CPU3     3 130.8H  99.68% [idle{idle: cpu3}]
   11 root        155 ki31      0   192K CPU10   10 138.1H  99.50% [idle{idle: cpu10}]
   11 root        155 ki31      0   192K CPU6     6 138.7H  98.53% [idle{idle: cpu6}]
   11 root        155 ki31      0   192K RUN      5 130.5H  98.20% [idle{idle: cpu5}]
   11 root        155 ki31      0   192K CPU1     1 129.7H  97.97% [idle{idle: cpu1}]
   11 root        155 ki31      0   192K CPU11   11 126.8H  96.52% [idle{idle: cpu11}]
   11 root        155 ki31      0   192K CPU0     0 138.3H  96.43% [idle{idle: cpu0}]
   11 root        155 ki31      0   192K CPU8     8 137.7H  95.95% [idle{idle: cpu8}]
   11 root        155 ki31      0   192K CPU2     2 138.3H  95.81% [idle{idle: cpu2}]
   11 root        155 ki31      0   192K CPU4     4 138.4H  93.94% [idle{idle: cpu4}]
   12 root        -60    -      0   400K WAIT     4 449:17   5.10% [intr{swi4: clock (0)}]
    0 root        -76    -      0   848K -        4 276:55   4.95% [kernel{if_io_tqg_4}]
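
(That aggregation job is the local NetFlow reporting. A quick one-shot to check whether it actually lines up with a test run is something like:)

# top -b -d 1 -aSH | grep -E 'iperf3|flowd'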

What is going on here?

hax0rwax0r:
To add to this, I reconfigured all my VLANs on bge0 (the onboard NIC), moved all my interfaces over to their respective bge0_vlanX interfaces, and re-ran my iperf3 tests.

On my first test, I got the same throughput as with my Intel X520-SR2 NIC:

# iperf3 -c 192.168.1.31
Connecting to host 192.168.1.31, port 5201
[  5] local 192.168.1.1 port 42455 connected to 192.168.1.31 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  92.0 MBytes   772 Mbits/sec   91   5.70 KBytes
[  5]   1.00-2.00   sec  91.1 MBytes   764 Mbits/sec   88    145 KBytes
[  5]   2.00-3.00   sec  86.1 MBytes   722 Mbits/sec   86    836 KBytes
[  5]   3.00-4.00   sec  92.5 MBytes   776 Mbits/sec   76    589 KBytes
[  5]   4.00-5.00   sec   107 MBytes   894 Mbits/sec    0    803 KBytes
[  5]   5.00-6.00   sec   107 MBytes   898 Mbits/sec    2    731 KBytes
[  5]   6.00-7.00   sec   109 MBytes   914 Mbits/sec    1    658 KBytes
[  5]   7.00-8.00   sec   110 MBytes   926 Mbits/sec    0    863 KBytes
[  5]   8.00-9.00   sec   107 MBytes   898 Mbits/sec    2    748 KBytes
[  5]   9.00-10.00  sec   109 MBytes   918 Mbits/sec    1    663 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1011 MBytes   848 Mbits/sec  347             sender
[  5]   0.00-10.32  sec  1010 MBytes   821 Mbits/sec                  receiver

For reference, I just tested with my MacBook Pro against the same iperf3 server and was able to push 926 Mbps, and I re-tested the QNAP-to-QNAP transfer, which did 9.39 Gbps, to completely rule out a bottleneck on the iperf3 server side.
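
If it helps anyone else narrow this down, direction and stream count are easy to vary from the OPNsense shell against the same server, e.g.:

# iperf3 -c 192.168.1.31 -R
(reverse mode: OPNsense receives instead of sends)

# iperf3 -c 192.168.1.31 -P 4
(four parallel streams, to see whether it's a per-flow limit)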

For the sake of testing (because why not), I re-ran iperf3 from my OPNsense server once more and got near-gigabit throughput:

# iperf3 -c 192.168.1.31
Connecting to host 192.168.1.31, port 5201
[  5] local 192.168.1.1 port 8283 connected to 192.168.1.31 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   108 MBytes   906 Mbits/sec    0    792 KBytes
[  5]   1.00-2.00   sec   111 MBytes   932 Mbits/sec    2    698 KBytes
[  5]   2.00-3.00   sec   111 MBytes   930 Mbits/sec    1    638 KBytes
[  5]   3.00-4.00   sec   108 MBytes   905 Mbits/sec    1    585 KBytes
[  5]   4.00-5.00   sec   111 MBytes   929 Mbits/sec    0    816 KBytes
[  5]   5.00-6.00   sec   111 MBytes   929 Mbits/sec    1    776 KBytes
[  5]   6.00-7.00   sec   111 MBytes   928 Mbits/sec    1    725 KBytes
[  5]   7.00-8.00   sec   108 MBytes   906 Mbits/sec    2    663 KBytes
[  5]   8.00-9.00   sec   111 MBytes   928 Mbits/sec    2    616 KBytes
[  5]   9.00-10.00  sec   111 MBytes   928 Mbits/sec    0    837 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.07 GBytes   922 Mbits/sec   10             sender
[  5]   0.00-10.32  sec  1.07 GBytes   892 Mbits/sec                  receiver

One thing I noticed between the first and second iperf3 tests was the "Retr" column: 347 vs. 10. I looked up what that means for iperf3 and found this: "It's the number of TCP segments retransmitted. This can happen if TCP segments are lost in the network due to congestion or corruption."
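
(The stack's own retransmit counters tell the same story and can be watched from the shell, e.g.:)

# netstat -s -p tcp | grep -i retrans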

I also noticed during my second iperf3 test that there was now a kernel thread using 99.81% CPU:

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        155 ki31      0   192K CPU3     3   9:02 100.00% [idle{idle: cpu3}]
    0 root        -92    -      0   848K CPU2     2   0:30  99.81% [kernel{bge0 taskq}]

Additionally, I am not sure "Retr" by itself is a smoking gun, as the QNAP-to-QNAP test that yielded 9.39 Gbps had 2218 retransmits.

The search continues.

mimugmail:
I know the bge driver has problems under OPNsense, but the X520 should deliver good performance.
I tested these cards with 20.7rc1 and got full wire speed.

I can run these tests again with the latest 20.7.1, but I need to finish some other things first.

hax0rwax0r:
I know the Broadcom drivers aren't the best, but I figured it was worth a test. That being said, I just swapped the Intel X520-SR2 for a Chelsio T540-CR, which seems to have excellent FreeBSD support; that family of NICs is frequently recommended.

Here's the results from the Chelsio T540-CR:

# iperf3 -c 192.168.1.31
Connecting to host 192.168.1.31, port 5201
[  5] local 192.168.1.1 port 19465 connected to 192.168.1.31 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   112 MBytes   943 Mbits/sec    0   8.00 MBytes
[  5]   1.00-2.00   sec   110 MBytes   924 Mbits/sec    0   8.00 MBytes
[  5]   2.00-3.00   sec   112 MBytes   939 Mbits/sec    0   8.00 MBytes
[  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec    0   8.00 MBytes
[  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec    0   8.00 MBytes
[  5]   5.00-6.00   sec   112 MBytes   939 Mbits/sec    0   8.00 MBytes
[  5]   6.00-7.00   sec   112 MBytes   940 Mbits/sec    0   8.00 MBytes
[  5]   7.00-8.00   sec   112 MBytes   938 Mbits/sec    0   8.00 MBytes
[  5]   8.00-9.00   sec   112 MBytes   940 Mbits/sec    0   8.00 MBytes
[  5]   9.00-10.00  sec   112 MBytes   940 Mbits/sec    0   8.00 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.09 GBytes   939 Mbits/sec    0             sender
[  5]   0.00-10.32  sec  1.09 GBytes   909 Mbits/sec                  receiver

I also thought it was interesting that there were zero retransmits on this test.

I swapped out the optic on the NIC when I swapped the NIC itself. I will swap the optic on the switch side and maybe try a different switch port and fiber patch cable tomorrow, though I doubt those are the issue.
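
Before swapping anything else, I'll probably also keep an eye on the interface error counters during a test to see whether the physical layer is complaining at all (cxl0 is what the Chelsio port shows up as here; adjust to whatever the interface is actually named):

# netstat -I cxl0
# netstat -I cxl0 -w 1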

Unfortunately, it appears the issue is not my Intel X520-SR2 NIC, as the Chelsio T540-CR exhibits the same behavior.

hax0rwax0r:
Just a status update:

Swapped the optics on the switch side (so both ends have now been swapped) and tried a new fiber patch cable. Same results. I also re-enabled "Hardware CRC" and "VLAN Hardware Filtering" but left "Hardware TSO" and "Hardware LRO" disabled, as I've read most drivers have broken implementations of those.

I also added this to /boot/loader.conf.local and rebooted:

hw.cxgbe.toecaps_allowed=0
hw.cxgbe.rdmacaps_allowed=0
hw.cxgbe.iscsicaps_allowed=0
hw.cxgbe.fcoecaps_allowed=0
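
(After the reboot, the kernel environment can be checked to make sure loader.conf.local was actually picked up, e.g.:)

# kenv | grep cxgbe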

Absolutely zero impact on performance. Tomorrow I think I'll unbox my other PowerEdge R430, put the original Intel X520-SR2 NIC in it, and see if I can duplicate the problem.

I am at a total loss of what is going on here.
