Hardware and Performance / Poor Throughput (Even On Same Network Segment)
« on: August 25, 2020, 08:31:25 pm »
I originally posted on Reddit but figured I might get more traction here with this.
I have an OPNsense 20.7.1 server running on a Dell R430 with 16 GB DDR4 RAM, an Intel Xeon E5-2620 v3 (6 cores/12 threads @ 2.40GHz) CPU and an Intel X520-SR2 10GbE NIC.
My network has several VLANs and network subnets with my OPNsense router functioning as a router on a stick doing all the traffic firewalling and routing between each network segment.
I recently upgraded my OPNsense to 20.7.1 and on a whim decided to run an iperf3 test between two VMs on different network segments to see what kind of throughput I was getting. I am certain, at least at some point, this very same hardware pushed over 6 Gbps on the same iperf3 test. Today it was getting around 850 Mbps every single time.
I started iperf3 as a server on my QNAP NAS device which is also attached to the same 10 Gbps switch and ran iperf3 as a client from OPNsense on the same network segment and got the same 850 Mbps throughput.
To make sure I wasn't limited by the QNAP NAS device, I ran the same iperf3 test with my other QNAP NAS device as a client to the first QNAP NAS device. It pushed 8.6 Gbps across the same network segment (no OPNsense involved), so both the QNAP and the switch can handle that rate.
My question is: what is going wrong here? Even on the same network segment, OPNsense can't do more than 850 Mbps throughput. I have no idea if this was happening before the upgrade to 20.7.1, but I know for sure it is happening now. I would assume that running iperf3 directly from the OPNsense server on the same network segment rules out firewalling and inter-VLAN routing as the cause.
The interface shows 10 Gbps link speed, too, both from ifconfig and the switch itself.
My current MBUF Usage is 1 % (17726/1010734).
IDS/IPS package is installed but disabled.
I had "Hardware CRC", "Hardware TSO", "Hardware LRO" and "VLAN Hardware Filtering" all enabled. I have since set all of them to disabled and rebooted. I can confirm that they are disabled by looking at the interface flags in ifconfig:
Pre-reboot:
options=e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
Post-reboot:
options=803828<VLAN_MTU,JUMBO_MTU,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
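For anyone comparing the two lines by eye, here is a minimal Python sketch that diffs the flag names between the pre-reboot and post-reboot `options=` lines quoted above. The helper name `parse_options` is made up for this illustration, and the script only compares the names printed inside the angle brackets; it does not interpret the hex mask.

```python
import re

def parse_options(line):
    """Return (hex mask as int, set of flag names) from an ifconfig 'options=' line.

    Hypothetical helper for illustration; it only reads the text ifconfig printed.
    """
    m = re.search(r"options=([0-9a-fA-F]+)<([^>]*)>", line)
    if m is None:
        raise ValueError("no options= field found")
    mask = int(m.group(1), 16)
    flags = {name for name in m.group(2).split(",") if name}
    return mask, flags

# The two lines exactly as posted above.
pre = ("options=e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,"
       "VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,"
       "VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>")
post = "options=803828<VLAN_MTU,JUMBO_MTU,WOL_UCAST,WOL_MCAST,WOL_MAGIC>"

_, pre_flags = parse_options(pre)
_, post_flags = parse_options(post)

removed = sorted(pre_flags - post_flags)    # offloads that went away after the reboot
remaining = sorted(pre_flags & post_flags)  # capabilities still listed
added = sorted(post_flags - pre_flags)      # anything newly enabled (expected: none)

print("removed:  ", ", ".join(removed))
print("remaining:", ", ".join(remaining))
print("added:    ", ", ".join(added) or "(none)")
```

The diff shows that every checksum, TSO, LRO and VLAN hardware offload flag is gone after the reboot, and only the MTU and wake-on-LAN capabilities remain.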
I ran top and was able to see a process (kernel{if_io_tqg_2}) utilize near 100% of a CPU core during this iperf3 test:
# top -aSH
last pid: 22772; load averages: 1.23, 0.94, 0.79 up 5+23:48:52 14:24:22
233 threads: 15 running, 193 sleeping, 25 waiting
CPU: 1.0% user, 0.0% nice, 16.1% system, 0.5% interrupt, 82.4% idle
Mem: 1485M Active, 297M Inact, 1657M Wired, 935M Buf, 12G Free
Swap: 8192M Total, 8192M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
0 root -76 - 0 848K CPU2 2 279:51 99.77% [kernel{if_io_tqg_2}]
11 root 155 ki31 0 192K CPU3 3 130.8H 98.78% [idle{idle: cpu3}]
11 root 155 ki31 0 192K CPU9 9 131.3H 98.75% [idle{idle: cpu9}]
11 root 155 ki31 0 192K CPU1 1 129.7H 98.68% [idle{idle: cpu1}]
11 root 155 ki31 0 192K CPU10 10 138.1H 98.33% [idle{idle: cpu10}]
11 root 155 ki31 0 192K CPU5 5 130.5H 97.51% [idle{idle: cpu5}]
11 root 155 ki31 0 192K CPU0 0 138.3H 95.78% [idle{idle: cpu0}]
11 root 155 ki31 0 192K CPU8 8 137.7H 95.25% [idle{idle: cpu8}]
11 root 155 ki31 0 192K CPU6 6 138.7H 95.20% [idle{idle: cpu6}]
11 root 155 ki31 0 192K CPU4 4 138.4H 94.26% [idle{idle: cpu4}]
22772 root 82 0 15M 6772K CPU7 7 0:04 93.83% iperf3 -c 192.168.1.31
11 root 155 ki31 0 192K RUN 7 129.4H 68.75% [idle{idle: cpu7}]
11 root 155 ki31 0 192K RUN 11 126.8H 46.12% [idle{idle: cpu11}]
0 root -76 - 0 848K - 4 277:00 5.12% [kernel{if_io_tqg_4}]
12 root -60 - 0 400K WAIT 11 449:21 5.02% [intr{swi4: clock (0)}]
0 root -76 - 0 848K - 8 317:40 3.81% [kernel{if_io_tqg_8}]
0 root -76 - 0 848K - 0 272:13 2.71% [kernel{if_io_tqg_0}]
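To make the busy threads easier to spot among the idle entries, the output above can be filtered with a short script. This is only a sketch: it assumes the 11-column `top -aSH` layout shown in this post (PID through COMMAND), the function name `busy_threads` and the 1% cutoff are arbitrary choices for illustration, and the sample contains a few lines copied from the output above.

```python
# A few lines copied from the top output above.
SAMPLE = """\
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
0 root -76 - 0 848K CPU2 2 279:51 99.77% [kernel{if_io_tqg_2}]
11 root 155 ki31 0 192K CPU3 3 130.8H 98.78% [idle{idle: cpu3}]
22772 root 82 0 15M 6772K CPU7 7 0:04 93.83% iperf3 -c 192.168.1.31
0 root -76 - 0 848K - 4 277:00 5.12% [kernel{if_io_tqg_4}]
12 root -60 - 0 400K WAIT 11 449:21 5.02% [intr{swi4: clock (0)}]
"""

def busy_threads(top_text, min_wcpu=1.0):
    """Return (wcpu, cpu, command) tuples for non-idle threads, busiest first."""
    rows = []
    for line in top_text.splitlines():
        # Split into at most 11 fields so the COMMAND column keeps its spaces.
        parts = line.split(None, 10)
        if len(parts) != 11 or not parts[9].endswith("%"):
            continue  # header or any line that does not match the layout
        command = parts[10]
        if command.startswith("[idle"):
            continue  # per-CPU idle threads are not interesting here
        wcpu = float(parts[9].rstrip("%"))
        if wcpu >= min_wcpu:
            rows.append((wcpu, parts[7], command))
    return sorted(rows, reverse=True)

result = busy_threads(SAMPLE)
for wcpu, cpu, command in result:
    print(f"{wcpu:6.2f}%  cpu{cpu:<3} {command}")
```

With the idle threads filtered out, what stands out is one `if_io_tqg` kernel thread and the iperf3 client each using nearly a full core, while the other `if_io_tqg` threads sit at a few percent.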
I occasionally see flowd_aggregate.py pop up to 100% but it doesn't seem consistent or relevant to when iperf3 is running:
# top -aSH
last pid: 99781; load averages: 1.15, 0.90, 0.77 up 5+23:47:27 14:22:57
232 threads: 14 running, 193 sleeping, 25 waiting
CPU: 8.5% user, 0.0% nice, 1.6% system, 0.4% interrupt, 89.5% idle
Mem: 1481M Active, 299M Inact, 1656M Wired, 935M Buf, 12G Free
Swap: 8192M Total, 8192M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
43465 root 90 0 33M 25M CPU7 7 7:11 99.82% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.7)
11 root 155 ki31 0 192K CPU9 9 131.3H 99.80% [idle{idle: cpu9}]
11 root 155 ki31 0 192K CPU3 3 130.8H 99.68% [idle{idle: cpu3}]
11 root 155 ki31 0 192K CPU10 10 138.1H 99.50% [idle{idle: cpu10}]
11 root 155 ki31 0 192K CPU6 6 138.7H 98.53% [idle{idle: cpu6}]
11 root 155 ki31 0 192K RUN 5 130.5H 98.20% [idle{idle: cpu5}]
11 root 155 ki31 0 192K CPU1 1 129.7H 97.97% [idle{idle: cpu1}]
11 root 155 ki31 0 192K CPU11 11 126.8H 96.52% [idle{idle: cpu11}]
11 root 155 ki31 0 192K CPU0 0 138.3H 96.43% [idle{idle: cpu0}]
11 root 155 ki31 0 192K CPU8 8 137.7H 95.95% [idle{idle: cpu8}]
11 root 155 ki31 0 192K CPU2 2 138.3H 95.81% [idle{idle: cpu2}]
11 root 155 ki31 0 192K CPU4 4 138.4H 93.94% [idle{idle: cpu4}]
12 root -60 - 0 400K WAIT 4 449:17 5.10% [intr{swi4: clock (0)}]
0 root -76 - 0 848K - 4 276:55 4.95% [kernel{if_io_tqg_4}]
What is going on here?