Poor Throughput (Even On Same Network Segment)

Started by hax0rwax0r, August 25, 2020, 08:31:25 PM

Quote from: hax0rwax0r on September 02, 2020, 10:40:50 PM
OK here are the test results as you requested:

FreeBSD 12.1 (pf enabled):

[root@fbsd1 ~]# uname -rv
12.1-RELEASE FreeBSD 12.1-RELEASE r354233 GENERIC

[root@fbsd1 ~]# top -aSH
last pid:  2954;  load averages:  0.44,  0.42,  0.41                                                                      up 0+01:38:55  20:13:46
132 threads:   10 running, 104 sleeping, 18 waiting
CPU:  0.0% user,  0.0% nice, 19.7% system,  5.2% interrupt, 75.1% idle
Mem: 10M Active, 6100K Inact, 271M Wired, 21M Buf, 39G Free
Swap: 3968M Total, 3968M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        155 ki31      0    96K RUN      5  94:58  95.25% [idle{idle: cpu5}]
   11 root        155 ki31      0    96K CPU1     1  93:26  83.69% [idle{idle: cpu1}]
   11 root        155 ki31      0    96K RUN      0  94:44  73.68% [idle{idle: cpu0}]
   11 root        155 ki31      0    96K CPU4     4  93:15  72.51% [idle{idle: cpu4}]
   11 root        155 ki31      0    96K CPU3     3  93:36  64.80% [idle{idle: cpu3}]
   11 root        155 ki31      0    96K RUN      2  92:55  62.29% [idle{idle: cpu2}]
    0 root        -76    -      0   480K CPU2     2   0:05  34.76% [kernel{if_io_tqg_2}]
    0 root        -76    -      0   480K CPU3     3   0:14  33.49% [kernel{if_io_tqg_3}]
   12 root        -52    -      0   304K CPU0     0  26:23  29.62% [intr{swi6: task queue}]
    0 root        -76    -      0   480K -        4   0:05  23.31% [kernel{if_io_tqg_4}]
    0 root        -76    -      0   480K -        0   0:05  12.31% [kernel{if_io_tqg_0}]
    0 root        -76    -      0   480K -        1   0:04  10.01% [kernel{if_io_tqg_1}]
   12 root        -88    -      0   304K WAIT     5   3:55   2.28% [intr{irq264: mfi0}]
    0 root        -76    -      0   480K -        5   0:06   1.88% [kernel{if_io_tqg_5}]
2954 root         20    0    13M  3676K CPU5     5   0:00   0.02% top -aSH
   12 root        -60    -      0   304K WAIT     0   0:01   0.01% [intr{swi4: clock (0)}]
    0 root        -76    -      0   480K -        4   0:02   0.01% [kernel{if_config_tqg_0}]


Single Thread:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  8.45 GBytes  7.26 Gbits/sec  802             sender
[  4]   0.00-10.00  sec  8.45 GBytes  7.26 Gbits/sec                  receiver


10 Threads:
[ ID] Interval           Transfer     Bandwidth       Retr
[SUM]   0.00-10.00  sec  9.85 GBytes  8.46 Gbits/sec  2991             sender
[SUM]   0.00-10.00  sec  9.83 GBytes  8.45 Gbits/sec                  receiver



FreeBSD 12.1 with OPNsense Kernel (pf enabled):

[root@fbsd1 ~]# uname -rv
12.1-RELEASE FreeBSD 12.1-RELEASE r354233 GENERIC

[root@fbsd1 ~]# fetch https://pkg.opnsense.org/FreeBSD:12:amd64/20.7/sets/kernel-20.7.2-amd64.txz
[root@fbsd1 ~]# mv /boot/kernel /boot/kernel.old
[root@fbsd1 ~]# tar -C / -xf kernel-20.7.2-amd64.txz
[root@fbsd1 ~]# kldxref /boot/kernel
[root@fbsd1 ~]# reboot

[root@fbsd1 ~]# uname -rv
12.1-RELEASE-p8-HBSD FreeBSD 12.1-RELEASE-p8-HBSD #0  b3665671c4d(stable/20.7)-dirty: Thu Aug 27 05:58:53 CEST 2020     root@sensey64:/usr/obj/usr/src/amd64.amd64/sys/SMP

[root@fbsd1 ~]# top -aSH
last pid: 43891;  load averages:  0.99,  0.49,  0.20                                                                      up 0+00:04:28  20:29:24
131 threads:   13 running, 100 sleeping, 18 waiting
CPU:  0.0% user,  0.0% nice, 62.5% system,  3.5% interrupt, 33.9% idle
Mem: 14M Active, 1184K Inact, 270M Wired, 21M Buf, 39G Free
Swap: 3968M Total, 3968M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    0 root        -76    -      0   480K CPU3     3   0:08  81.27% [kernel{if_io_tqg_3}]
    0 root        -76    -      0   480K CPU1     1   0:09  74.39% [kernel{if_io_tqg_1}]
    0 root        -76    -      0   480K CPU5     5   0:08  73.20% [kernel{if_io_tqg_5}]
    0 root        -76    -      0   480K CPU0     0   0:21  71.79% [kernel{if_io_tqg_0}]
   11 root        155 ki31      0    96K RUN      4   4:09  54.15% [idle{idle: cpu4}]
   11 root        155 ki31      0    96K RUN      2   4:09  51.30% [idle{idle: cpu2}]
    0 root        -76    -      0   480K CPU2     2   0:05  40.10% [kernel{if_io_tqg_2}]
    0 root        -76    -      0   480K -        4   0:09  37.60% [kernel{if_io_tqg_4}]
   11 root        155 ki31      0    96K RUN      0   4:03  26.48% [idle{idle: cpu0}]
   11 root        155 ki31      0    96K RUN      5   4:14  25.87% [idle{idle: cpu5}]
   11 root        155 ki31      0    96K RUN      1   4:09  24.32% [idle{idle: cpu1}]
   12 root        -52    -      0   304K RUN      2   1:12  20.63% [intr{swi6: task queue}]
   11 root        155 ki31      0    96K CPU3     3   4:00  17.30% [idle{idle: cpu3}]
   12 root        -88    -      0   304K WAIT     5   0:10   1.47% [intr{irq264: mfi0}]
43891 root         20    0    13M  3660K CPU4     4   0:00   0.03% top -aSH
   21 root        -16    -      0    16K -        4   0:00   0.02% [rand_harvestq]
   12 root        -60    -      0   304K WAIT     1   0:00   0.02% [intr{swi4: clock (0)}]


Single Thread:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  2.89 GBytes  2.48 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  2.89 GBytes  2.48 Gbits/sec                  receiver


10 Threads:
[ ID] Interval           Transfer     Bandwidth       Retr
[SUM]   0.00-10.00  sec  8.16 GBytes  7.01 Gbits/sec  4260             sender
[SUM]   0.00-10.00  sec  8.13 GBytes  6.98 Gbits/sec                  receiver


I included the "top -aSH" output again because the main difference I observe between the OPNsense kernel and the stock FreeBSD 12.1 kernel is the CPU usage of the "[kernel{if_io_tqg_X}]" threads.  Even on an actual OPNsense 20.7.2 installation I see exactly the same behavior: "[kernel{if_io_tqg_X}]" is consistently higher and throughput is significantly lower, especially in single-threaded tests.  Note that both top outputs are from the 10-thread tests only, as I did not think to capture them during the single-threaded runs.

I can't help but think that whatever the high "[kernel{if_io_tqg_X}]" usage on the OPNsense kernel indicates is starving the system of throughput.

Thoughts?  Next steps I can run and provide results from?
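
For what it's worth, one way to compare how the two kernels lay out the iflib queues would be something like the following (ix0 is an assumption for the X520 interface; adjust to your NIC):

```shell
# interrupt distribution per NIC queue -- compare across kernels
vmstat -i | grep ix0

# per-device iflib sysctls, including queue counts and per-queue stats
sysctl dev.ix.0.iflib

# netisr / task queue configuration
netstat -Q
```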

My first thought was maybe shared forwarding, but you have this with pfsense 2.5 too, correct?


Do you still test with this hardware?
Dell T20 (Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz (4 cores))

Quote from: mimugmail on September 03, 2020, 06:15:29 AM
My first thought was maybe shared forwarding, but you have this with pfsense 2.5 too, correct?

I have never tested pfSense 2.5.  As you previously pointed out, my test was of pfSense 2.4, which is FreeBSD 11.3 based.  I had misread the version history page and said it was FreeBSD 12.1, but we established that statement was incorrect.

Quote from: mimugmail on September 03, 2020, 12:26:17 PM
Ok, iflib, so it's related to 12.X-only, but strange it doesn't happen to vanilla 12.1

https://forums.freebsd.org/threads/what-is-kernel-if_io_tqg-100-load-of-core.70642/

Yeah, I saw that forum post when I was Googling around, too.  I don't know what is different between vanilla FreeBSD 12.1 and the OPNsense 20.7.2 kernel that causes the higher CPU usage, but it is consistent in my testing every single time.

Quote from: mimugmail on September 03, 2020, 12:36:34 PM
Do you still test with this hardware?
Dell T20 (Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz (4 cores))

No, every single test, with the exception of that one test I did on the Dell T20 to see if more MHz helped, has been on a Dell R430.  I have several like-for-like R430s, and I even ran the different software loads across them to rule out a bad X520 NIC or similar.  The results followed the OS/kernel regardless of which R430 I used, so I am fairly confident in the hardware.

Quote from: mimugmail on September 03, 2020, 06:15:29 AM
My first thought was maybe shared forwarding, but you have this with pfsense 2.5 too, correct?
I tried this with a recent build of pfSense 2.5 Development (built 9/2/2020) and got around 2.0 Gbits/sec in the same test scenario I posted about yesterday.  So it is still lower throughput than pfSense 2.4.x on FreeBSD 11.2 in the same scenario, but higher than what we're seeing with the OPNsense 20.7 series running the 12.x kernel.

Just for the record, I am also experiencing degraded throughput.  LAN routing between different VLANs, with only the firewall enabled (no IPS etc.), is around 550 Mbit/s.  The setup is switch -> 1 Gbit trunk -> switch -> 1 Gbit trunk -> OPNsense FW, with low overall traffic.

Overall core usage when loading the FW:

16 cores, but only a few are used.  It's as if multicore usage in either IDS or pf is limited.

Quote from: Supermule on September 04, 2020, 11:55:34 AM
Overall core usage when loading the FW:

16 cores, but only a few are used.  It's as if multicore usage in either IDS or pf is limited.

One stream can only be handled by one core; this was true in 20.1 and still is in 20.7 :)
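
That single-stream limit can be seen directly with iperf3's parallel flag; a minimal sketch (10.0.0.1 is a placeholder for a host behind the firewall):

```shell
# one TCP stream: throughput is capped by what a single core can push
iperf3 -c 10.0.0.1 -t 10

# eight parallel streams: work can spread across cores/queues
iperf3 -c 10.0.0.1 -t 10 -P 8
```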

A quick follow-up: I am routing about 20 VLANs.  I read a lot about performance tuning, and one post mentioned the captive portal's performance impact.  When I recently changed my WiFi setup I had tried the captive portal function for a guest VLAN, so I gave it a shot and disabled the captive portal (it was active for one VLAN).  I could not believe my eyes when I tested the throughput again.

captive portal enabled for one vlan:
530 Mbit/s

captive portal disabled:
910 Mbit/s


CP uses shared forwarding, which sends every packet to ipfw; I'd guess 20.1 has the same problem.
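
If you want to verify whether ipfw is actually in the packet path on a given box, something like this should show it (a sketch; the rule layout varies by configuration):

```shell
# check whether the ipfw kernel module is loaded
kldstat -q -m ipfw && echo "ipfw is loaded"

# list rules with packet/byte counters to see traffic hitting ipfw
ipfw -a list
```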

Uh no, features decrease throughput. Where have I seen this before? Maybe in every industry firewall spec sheet...  ;)

This thread is slowly degrading and losing focus.  I can't say there aren't any penalties in using the software, but if we focus only on how much better others are, we risk losing the objective discussion: is your OPNsense too slow?  The easiest fix is to get hardware that performs well enough.  There's already money saved from the lack of licensing.

Performance will likely increase over time in the default releases if we can identify the actual differences in configuration.


Cheers,
Franco

First of all, I like OPNsense and I am an absolute supporter; my comment was meant to be entirely constructive.  I personally wasn't aware that a rather simple-looking feature could have a nearly 50% performance impact, and I have a feeling I can't be the only one, so I just wanted to share the information.

Shaper and captive portal require enabling the second firewall (ipfw) in tandem with the normal one (pf). Both are nice firewalls, but most features come from pf historically, while others are better suited for ipfw or are only available there.

I just think we should talk about raw throughput here with a minimum configuration, to keep results comparable between operating systems.  The more configuration and features come into play, the less possible it becomes to derive meaningful results.
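
As a minimal, comparable baseline, that would be the same iperf3 runs used earlier in the thread (192.0.2.1 stands in for the server on the far side of the firewall):

```shell
# server on the far side of the firewall
iperf3 -s

# client: single stream, then 10 parallel streams, 10 s each,
# matching the result tables posted above
iperf3 -c 192.0.2.1 -t 10
iperf3 -c 192.0.2.1 -t 10 -P 10
```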


Cheers,
Franco

I stumbled across this thread after having the same issues as the OP with 20.7, and I'd done much of the same troubleshooting.  Unless I missed it, I didn't see any conclusion.  I've read various things about issues with some NIC drivers using iflib, but I haven't been able to nail anything down.  For example, this post about a new netmap kernel: https://forum.opnsense.org/index.php?topic=19175.0

Though I don't know if that would even apply here, since I'm not using Sensei or Suricata.  I am using the vmxnet3 driver on ESXi 7 and can't get more than 1 Gb/sec through a new install of OPNsense.  No traffic shaping or anything, and all test VMs (and OPNsense) are on the same vSwitch.  Transfers between a test VM and an OPNsense LAN interface are stuck at 1 Gb/sec, while I can get at least 4 Gb/sec using pfSense 2.4.x.  I haven't tried older versions of OPNsense.
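
One quick thing worth ruling out with vmx(4) is hardware offload interaction; assuming the interface is vmx0 (adjust to your setup):

```shell
# show current capabilities and offload flags
ifconfig vmx0

# temporarily disable offloads that have been implicated in some
# iflib/vmxnet3 throughput reports (re-enable afterwards to revert)
ifconfig vmx0 -txcsum -rxcsum -tso4 -tso6 -lro
```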

The OPNsense roadmap says "Fix stability and reliability issues with regard to vmx(4), vtnet(4), ixl(4), ix(4) and em(4) ethernet drivers."  I'm trying to find out if there are specific bugs or issues this refers to.  If the issue I'm seeing has already been identified, great.

It's under investigation; 20.7.4 may bring an already-fixed kernel.