Poor Throughput (Even On Same Network Segment)

Started by hax0rwax0r, August 25, 2020, 08:31:25 PM

Would it be possible to install a stock FreeBSD 13 kernel?  Maybe they fixed the regressions.  I'm wondering if it has something to do with HBSD compile flags for security.

Quote from: AveryFreeman on October 26, 2020, 08:52:55 PM
Would it be possible to install a stock FreeBSD 13 kernel?  Maybe they fixed the regressions.  I'm wondering if it has something to do with HBSD compile flags for security.

Unfortunately this is not so easy. You can't use a precompiled kernel from another system; it wouldn't boot.
You have to compile from source, but a newer kernel means newer headers and library dependencies, and the compilation could fail at some point. The only approach that could work is to cherry-pick just the fix, apply it to the original kernel source tree, and compile. But that needs work too.
I was an Android kernel developer many years back, so I know that experimenting with the kernel is always risky.
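To illustrate what that cherry-pick route would look like (a rough sketch against a plain FreeBSD tree; for OPNsense you would point at its own src tree and kernel config, and the branch and commit names here are placeholders):

# fetch the kernel source and apply only the commit containing the fix
git clone -b stable/12 https://git.FreeBSD.org/src.git /usr/src
cd /usr/src
git cherry-pick <commit-id-of-the-fix>

# rebuild and install the kernel
make -j"$(sysctl -n hw.ncpu)" buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
reboot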

Quote from: Archanfel80 on October 27, 2020, 08:53:09 AM
Quote from: AveryFreeman on October 26, 2020, 08:52:55 PM
Would it be possible to install a stock FreeBSD 13 kernel?  Maybe they fixed the regressions.  I'm wondering if it has something to do with HBSD compile flags for security.

Unfortunately this is not so easy. You can't use a precompiled kernel from another system; it wouldn't boot.
You have to compile from source, but a newer kernel means newer headers and library dependencies, and the compilation could fail at some point. The only approach that could work is to cherry-pick just the fix, apply it to the original kernel source tree, and compile. But that needs work too.
I was an Android kernel developer many years back, so I know that experimenting with the kernel is always risky.

Wouldn't it be easier to do it the other way round?

Make the OS work with FreeBSD 13? To eliminate any remnants of bad plugin code?

Quote from: Supermule on October 27, 2020, 10:01:12 AM
Quote from: Archanfel80 on October 27, 2020, 08:53:09 AM
Quote from: AveryFreeman on October 26, 2020, 08:52:55 PM
Would it be possible to install a stock FreeBSD 13 kernel?  Maybe they fixed the regressions.  I'm wondering if it has something to do with HBSD compile flags for security.

Unfortunately this is not so easy. You can't use a precompiled kernel from another system; it wouldn't boot.
You have to compile from source, but a newer kernel means newer headers and library dependencies, and the compilation could fail at some point. The only approach that could work is to cherry-pick just the fix, apply it to the original kernel source tree, and compile. But that needs work too.
I was an Android kernel developer many years back, so I know that experimenting with the kernel is always risky.

Wouldn't it be easier to do it the other way round?

Make the OS work with FreeBSD 13? To eliminate any remnants of bad plugin code?

They just switched to FreeBSD 12; I don't think FreeBSD 13 will be adopted soon. But you have a point.

What I found is that when OPNsense runs in a virtualized environment, it uses only one core; the hardware socket detection is faulty in that case.

net.isr.maxthreads and net.isr.numthreads always return 1.
They can be changed in the tunables, though.
This also requires changing net.isr.dispatch from "direct" to "deferred".
This gives me a massive performance boost on a gigabit connection, but it's still not perfect; the boost comes with overhead too. And only on FreeBSD 12. With 20.1, which is still based on FreeBSD 11, it's lightning fast :)
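If anyone wants to try the same tunables, this is roughly what I mean (illustrative values, not an exact copy of my config); in OPNsense they go under System > Settings > Tunables:

# boot-time tunables, applied after a reboot
net.isr.maxthreads="-1"     # -1 should mean one netisr thread per core; net.isr.numthreads follows from it
net.isr.bindthreads="1"     # optionally pin the netisr threads to cores

# runtime sysctl: switch the dispatch policy
sysctl net.isr.dispatch=deferred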

I tested 20.7.4 with and without the sysctl tuning, and 20.1 with the tuning.

With 20.7.x nothing helps: the speed stays capped and I lose around 20-30 percent to the overhead. With 20.1 you can see the difference :)

I'm also experiencing poor throughput with OPNsense 20.7. Maybe some of you have seen my thread in the general forum (https://forum.opnsense.org/index.php?topic=19426.0).

I did some testing and want to share the results with you.

Measure: In a first step, I disabled all packet filtering on the OPNsense device.
Result: No improvement.

Measure: In a second step and in order to rule out sources of error, I have removed the LAGG/LACP configuration in my setup.
Result: No improvement.

In the next step, I made some performance comparisons. I did tests with the following two setups:

a) Client (Ubuntu 20.04.1 LTS)   <-->   OPNsense (20.7.4)       <-->   File Server (Debian 10.6)
b) Client (Ubuntu 20.04.1 LTS)   <-->   Ubuntu (20.04.1 LTS)   <-->   File Server (Debian 10.6)

In both setups the client is a member of VLAN 70 and the file server is a member of VLAN 10. In setup b) I have enabled packet forwarding for IPv4.
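For completeness, packet forwarding on the Ubuntu router in setup b) was enabled with the usual sysctl (shown as an example, nothing exotic):

# on the Ubuntu 20.04 box acting as router in setup b)
sudo sysctl -w net.ipv4.ip_forward=1
# add "net.ipv4.ip_forward=1" to /etc/sysctl.conf to make it persistent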

The test results were as follows:

Samba transfer speeds (MB/sec)

Routing device        Client --> Server        Server --> Client
a) OPNsense           67.3                     71.2
b) Ubuntu             108.7                    113.8


iPerf3 UDP transfer speeds (MBit/sec)

Routing device        Client --> Server          Server --> Client
a) OPNsense           948 (23% packet loss)      945 (25% packet loss)
b) Ubuntu             948 (1% packet loss)       938 (0% packet loss)


Packet loss leads to approx. 25% reduced throughput on the receiving device.
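For reference, the UDP numbers come from runs along these lines (flags are illustrative; <server-ip> stands for the file server's address):

# on the file server (Debian 10.6, VLAN 10)
iperf3 -s

# on the client (Ubuntu 20.04, VLAN 70)
iperf3 -c <server-ip> -u -b 1G        # client --> server
iperf3 -c <server-ip> -u -b 1G -R     # server --> client (reverse mode)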

Back with some more test results.

I did a rollback to OPNsense 20.1 for testing purposes.


Samba transfer speeds (MB/sec)

Routing device        Client --> Server        Server --> Client
OPNsense 20.1         109.3                    102.6


iPerf3 UDP transfer speeds (MBit/sec)

Routing device        Client --> Server         Server --> Client
OPNsense 20.1         948 (0% packet loss)      949 (0% packet loss)


As you can see OPNsense 20.1 gives me full wire speed.

Quote from: Supermule on October 27, 2020, 10:01:12 AM
Quote from: Archanfel80 on October 27, 2020, 08:53:09 AM
Quote from: AveryFreeman on October 26, 2020, 08:52:55 PM
Would it be possible to install a stock FreeBSD 13 kernel?  Maybe they fixed the regressions.  I'm wondering if it has something to do with HBSD compile flags for security.

Unfortunately this is not so easy. You can't use a precompiled kernel from another system; it wouldn't boot.
You have to compile from source, but a newer kernel means newer headers and library dependencies, and the compilation could fail at some point. The only approach that could work is to cherry-pick just the fix, apply it to the original kernel source tree, and compile. But that needs work too.
I was an Android kernel developer many years back, so I know that experimenting with the kernel is always risky.

Wouldn't it be easier to do it the other way round?

Make the OS work with FreeBSD 13? To eliminate any remnants of bad plugin code?

It does work, and it's fairly easy. Just install OPNsense using opnsense-bootstrap over a FreeBSD installation.  You have to change the script if you want to install over a different version of FreeBSD (e.g. 13), but if you install 12.x you can just run the script.  Then boot from kernel.old or copy the kernel back to /boot/kernel, kldxref, etc.
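A rough sketch of that sequence (commands from memory, so treat them as a starting point and check the opnsense-bootstrap documentation first):

# on a plain FreeBSD 12.x installation
fetch https://raw.githubusercontent.com/opnsense/update/master/src/bootstrap/opnsense-bootstrap.sh.in
sh opnsense-bootstrap.sh.in -r 20.7      # converts the system to OPNsense 20.7 and reboots

# afterwards, to go back to the stock FreeBSD kernel it left behind:
mv /boot/kernel /boot/kernel.opnsense    # keep the OPNsense kernel around, just in case
cp -a /boot/kernel.old /boot/kernel      # restore the previous FreeBSD kernel
kldxref /boot/kernel                     # rebuild the module hints
reboot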

I can't vouch for the helpfulness, as my FreeBSD understanding is limited and I don't know much about kernel tuning. Your finding that net.isr.maxthreads and net.isr.numthreads always return 1 seems more helpful than arbitrarily swapping kernels.

How would you recommend tuning the kernel for multi-threading?  Is turning off hyperthreading a good idea?

By the way, I didn't see much speed increase installing OPNsense 20.7 over 13-CURRENT and I'm suspicious of its reliability, but there is a slight increase in speed installing OPNsense over 12.1-RELEASE and keeping the FreeBSD kernel:  https://forum.opnsense.org/index.php?topic=19789.msg91356#msg91356

It would probably be more noticeable on 10G but I haven't done any benchmarking w/ it yet.

Looking into what "if_io_tqg" is, and why it's eating up quite a bit of a core when doing (not even line rate) transfers on my apu2 board, I found this thread.

Has any conclusion been reached yet? Is there anything we can test/do?


November 13, 2020, 06:13:43 PM #101 Last Edit: November 14, 2020, 07:40:36 AM by Klug
I had throughput issues with 20.7.4 on Proxmox (latest 6.2).
They were related to the hardware offload features being enabled (I know, I'm stupid).
Once disabled, everything is OK, maxing out the link.
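For anyone hitting the same thing: the switches are under Interfaces > Settings (hardware CRC, TSO and LRO), which at the ifconfig level corresponds to something like the following (vtnet0 is just an example virtio interface name):

# disable the hardware offloads on a virtio NIC
ifconfig vtnet0 -rxcsum -txcsum -tso -lro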

This problem seems to be getting worse: I upgraded to 20.7.5 and my iperf3 speeds have dropped from ~2 Gbit/s to hovering around 1 Gbit/s, with VM->VM speeds at ~650 Mbit/s  :o  :-\

A CentOS 8 VM on the same machine gets around 9.4 Gbit/s.

I will upload some numbers when I get a chance.

Has anyone rerun the tests with OPNsense 21.1?

Here are my latest results.

Recap of my environment:
Server is an HP ML10v2 running ESXi 6.7 build 17167734
Xeon E3-1220 v3 CPU
32GB of RAM
SSD/HDD backed datastore (vSAN enabled)

All firewalls are tested with their out-of-the-box ruleset; no customizations were made besides configuring the WAN/LAN adapters to work for these tests. All firewalls have their version of VM Tools installed from the package manager.

The iperf3 client/server are both Fedora Desktop v33. The server sits behind the WAN interface, the client sits behind the LAN interface to simulate traffic through the firewall. No transfer tests are performed hosting iperf3 on the firewall itself.
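To be explicit about what "single thread (p1)" and "four threads (p4)" mean in the results below, the runs were roughly of this shape (shown with iperf3 reverse mode, since the traffic is received from the WAN side; <server-ip> is the WAN-side Fedora box):

# on the WAN-side Fedora 33 server
iperf3 -s

# on the LAN-side Fedora 33 client
iperf3 -c <server-ip> -t 60 -R          # single stream (p1), traffic flows WAN -> LAN
iperf3 -c <server-ip> -t 60 -R -P 4     # four parallel streams (p4)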

OPNsense 21.1.1 VM Specs:
VM hardware version 14
2 vCPU
4GB RAM
2x vmx3 NICs

pfSense 2.5.0-RC VM Specs:
VM hardware version 14
2 vCPU
4GB RAM
2x vmx3 NICs

OpenWRT VM Specs:
VM hardware version 14
2 vCPU
1GB RAM
2x vmx3 NICs

OPNsense 21.1.1 (netflow disabled) 1500MTU receiving from WAN, vmx3 NICs, all hardware offload disabled, single thread (p1)
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  8.10 GBytes  1.16 Gbits/sec  219             sender
[  5]   0.00-60.00  sec  8.10 GBytes  1.16 Gbits/sec                  receiver


OPNsense 21.1.1 (netflow disabled) 1500MTU receiving from WAN, vmx3 NICs, all hardware offload disabled, four threads (p4)
[ ID] Interval           Transfer     Bitrate         Retr
[SUM]   0.00-60.00  sec  13.4 GBytes  1.91 Gbits/sec  2752             sender
[SUM]   0.00-60.00  sec  13.3 GBytes  1.91 Gbits/sec                  receiver


OPNsense 21.1.1 (netflow disabled) 1500MTU receiving from WAN, vmx3 NICs, all hardware offload enabled, single thread (p1)
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec   251 MBytes  35.0 Mbits/sec  56410             sender
[  5]   0.00-60.00  sec   250 MBytes  35.0 Mbits/sec                  receiver


pfSense 2.5.0-RC 1500MTU receiving from WAN, vmx3 NICs, all hardware offload disabled, single thread (p1)
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  15.1 GBytes  2.15 Gbits/sec  1029             sender
[  5]   0.00-60.00  sec  15.0 GBytes  2.15 Gbits/sec                  receiver


pfSense 2.5.0-RC 1500MTU receiving from WAN, vmx3 NICs, all hardware offload disabled, four threads (p4)
[ ID] Interval           Transfer     Bitrate         Retr
[SUM]   0.00-60.00  sec  15.3 GBytes  2.19 Gbits/sec  12807             sender
[SUM]   0.00-60.00  sec  15.3 GBytes  2.18 Gbits/sec                  receiver


pfSense 2.5.0-RC 1500MTU receiving from WAN, vmx3 NICs, all hardware offload enabled, single thread (p1)
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec   316 MBytes  44.2 Mbits/sec  48082             sender
[  5]   0.00-60.00  sec   316 MBytes  44.2 Mbits/sec                  receiver


OpenWRT v19.07.6 1500MTU receiving from WAN, vmx3 NICs, no UI offload settings (using defaults), single thread (p1)
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  34.1 GBytes  4.88 Gbits/sec  21455             sender
[  5]   0.00-60.00  sec  34.1 GBytes  4.88 Gbits/sec                  receiver


OpenWRT v19.07.6 1500MTU receiving from WAN, vmx3 NICs, no UI offload settings (using defaults), four threads (p4)
[ ID] Interval           Transfer     Bitrate         Retr
[SUM]   0.00-60.00  sec  43.2 GBytes  6.18 Gbits/sec  79765             sender
[SUM]   0.00-60.00  sec  43.2 GBytes  6.18 Gbits/sec                  receiver



Host CPU usage during the transfers was as follows:
OPNsense: 97% host CPU used
pfSense: 84% host CPU used
OpenWRT: 63% host CPU used for p1, 76% host CPU used for p4

In this case, my environment is CPU constrained. However, the purpose of these transfers is to use a best case scenario (all 1500MTU packets) and see how much we can push through the firewall with the given CPU power available. I think we're still dealing with inherent bottlenecks within FreeBSD 12. Both of the BSDs here hit high host CPU usage regardless of the thread count during the transfer. Only the Linux system scaled with more threads and still did not max the host CPU during transfers.

I personally use OPNsense and it's a great firewall. Running on bare metal with igb NICs and a modern processor made within the last 5 years or so, it will be plenty to cover gigabit speeds for most people. However, if we are virtualizing, all of the BSDs seem to want a lot of CPU power to scale beyond a steady 1 Gbit/s. Perhaps FreeBSD 13 will give us more efficient virtualization throughput?