Poor Throughput (Even On Same Network Segment)

Started by hax0rwax0r, August 25, 2020, 08:31:25 PM

August 15, 2022, 10:22:46 PM #150 Last Edit: August 16, 2022, 09:30:35 PM by masterderp
Update: I don't know if others have made the same mistake, but do a traceroute from your iperf client to your iperf server and make sure it looks right, and run netstat -rn on your OPNsense box to make sure the routing table looks sane.  In my testing I put the WAN side into my normal network and the LAN side into an isolated Proxmox bridge with no physical port attached.  For some reason, OPNsense was routing the traffic all the way out to the WAN's upstream gateway, a physical 1 Gb router outside of my Proxmox environment.  I'm not sure why yet, but here's what's happening:

vtnet0 (WAN): 10.0.0.1 -> WAN gateway (physical router): 10.0.0.254
vtnet1 (LAN): 10.0.1.1
iperf client on the LAN side: 10.0.1.1
iperf server on the WAN network: 10.0.0.100

traceroute from iperf client to iperf server (through opnsense):
1 10.0.1.1
2 10.0.0.254
3 10.0.0.100

traceroute from iperf client to iperf server (through pfsense):
1 10.0.1.1
2 10.0.0.100

I deleted the route entry from System -> Routes -> Status and it works as expected now, but how did that entry get there in the first place? I have a second OPNsense test instance that did the same thing.
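
If you'd rather check this from a shell instead of the GUI, the stock FreeBSD tools are enough (the addresses below are just the ones from my lab layout above, and deleting the entry in the GUI is what removes it permanently):

# Dump the routing table and look for entries that shouldn't be there
netstat -rn

# Show which route/gateway a specific destination would actually use
route -n get 10.0.0.100

# Temporarily remove a stray route from the shell (example network/gateway from above)
route delete -net 10.0.0.0/24 10.0.0.254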

Original post:

Anyone have any updates on this?  Is this now considered a known bug?  I saw a link earlier in the thread to a GitHub PR that was merged, and it looks like it's been included in 22.7, yet I'm still seeing the problem.  I set up two identical VMs in Proxmox, one pfSense 2.6.0 and one OPNsense 22.7.  Each VM has 12 E5-2620 cores (VM CPU type set to "host"), 4 GB of RAM, and two VirtIO NICs.  Nothing was changed other than setting a static LAN IP for each instance.  Traffic was tested as follows, with all VMs (including the iperf client and server) on the same Proxmox host:

iperf client -> iperf server: 10 Gb/s
iperf client -> pfSense LAN -> pfSense WAN -> iperf server: 2.5 Gb/s
iperf client -> OPNsense LAN -> OPNsense WAN -> iperf server: 0.743 Gb/s
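
For reference, each of those numbers comes from a plain iperf3 run between the two endpoints, along these lines (10.0.0.100 is the server address from the layout in my update above):

# On the iperf server VM
iperf3 -s

# On the iperf client VM, pointed at the server (30-second run)
iperf3 -c 10.0.0.100 -t 30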

I then set hw.ibrs_disable=1 (note: if the CPU type is left at the default kvm64, this isn't needed and performance is the same):
iperf client -> OPNsense LAN -> OPNsense WAN -> iperf server: 0.933 Gb/s

Also tested with multiple iperf streams (-P 20) and got the same speeds.

CPU usage was high when testing, but after I enabled multiqueue on the Proxmox NICs (six queues on each NIC), CPU usage dropped to basically nothing and I topped out right at 940 Mb/s, exactly the maximum TCP throughput on a gigabit link.  I find that pretty suspicious; it makes me think something in the chain is being limited to gigabit Ethernet.  The UI does show the NICs as "10Gbase-T <full-duplex>", and again, my iperf client and server VMs both have 10G interfaces and pull a full 10 Gb/s when connected directly to each other.
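
In case it helps anyone, enabling multiqueue on a Proxmox VirtIO NIC from the host shell looks roughly like this (the VM ID, MAC, and bridge below are placeholders; copy the existing net0 line from "qm config" and just append the queues option):

# Show the VM's current net0 definition
qm config 101 | grep net0

# Re-apply it with queues=6 added, keeping the original MAC and bridge
qm set 101 --net0 virtio=DE:AD:BE:EF:00:01,bridge=vmbr0,queues=6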



I have multi-gigabit Internet and recently decided to transition to an OPNsense server running inside of a Proxmox VM with Virtio network adapters as my main router at home, not realizing at the time that so many performance issues existed....

I read through this entire thread and combed through numerous other resources online.  It seems like a lot of people are hung up on this issue and definitive answers are in short supply.

I went through the whole journey and experienced pretty much everything mentioned in this thread, including marcosscriven's uncanny post about how hardware acceleration caused LAN-side performance to improve while WAN throughput plummeted.

I'm posting here now because I solved this issue for my setup.  My OPNsense running in a Proxmox KVM virtual machine is now able to keep up with my 6 gig Internet.



I made a lot of changes, and I'm not sure they all helped (I'm quite sure a large number of them had no immediately noticeable effect).  I decided to leave most of them in place anyway, because the reasoning I'd read behind them made sense to me and increasing the values seemed logical in many cases, even where there was no measurable performance improvement.

You can read my entire writeup on my blog where I go through the whole journey in detail if you want:  https://binaryimpulse.com/2022/11/opnsense-performance-tuning-for-multi-gigabit-internet/

In a nutshell, my solution came down to leaving all of the hardware offloading disabled and configuring a set of sysctl values compiled from about five different sources, which eventually got me to the performance I wanted.  I also made some minor changes to the Proxmox VM, like enabling multiqueue on the network adapter, but I'm skeptical whether any of those changes really mattered.

The sysctl values that worked for me (and I think sysctl tuning overall did the most to solve the problem - along with disabling hardware offloading) were the following:

hw.ibrs_disable=1
net.isr.maxthreads=-1
net.isr.bindthreads=1
net.isr.dispatch=deferred
net.inet.rss.enabled=1
net.inet.rss.bits=6
kern.ipc.maxsockbuf=614400000
net.inet.tcp.recvbuf_max=4194304
net.inet.tcp.recvspace=65536
net.inet.tcp.sendbuf_inc=65536
net.inet.tcp.sendbuf_max=4194304
net.inet.tcp.sendspace=65536
net.inet.tcp.soreceive_stream=1
net.pf.source_nodes_hashsize=1048576
net.inet.tcp.mssdflt=1240
net.inet.tcp.abc_l_var=52
net.inet.tcp.minmss=536
kern.random.fortuna.minpoolsize=128
net.isr.defaultqlimit=2048
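
After a reboot you can read the values back with sysctl to confirm they actually took effect, e.g.:

sysctl net.isr.dispatch net.inet.rss.enabled net.inet.rss.bits
sysctl kern.ipc.maxsockbuf net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max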

If you want my sources and reasoning for the changes and how I arrived at them, I went into a lot of detail in my blog article.

Just wanted to add my 2 cents to this very useful thread, which did start me off in the right direction toward solving the issue for my setup.  Hopefully these details are helpful to someone else.

Cheers,
Kirk

@Kirk: How do I set these tweaks? I can't find these options in the web GUI, except for the first one mentioned.

Quote from: Porfavor on November 22, 2022, 12:26:38 AM
@Kirk: How do I set these tweaks? I can't find these options in the web GUI, except for the first one mentioned.

@Porfavor these settings are under System > Settings > Tunables.  Some of the tunables will not be listed on that page; you can click the + icon to add the one you want to tweak.

For example, once you hit +, you put a tunable like "net.inet.rss.enabled" in the tunable box, leave the description blank (it will autofill with a description it already knows), and then copy the value, like 1, into the value box.

Keep in mind some of these tunables will not be applied until the system is rebooted.
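
If you have shell access, the runtime-settable ones can also be read and changed with sysctl without waiting for a reboot (loader-only tunables such as net.isr.maxthreads still need the reboot), something like:

# Read the current value
sysctl kern.ipc.maxsockbuf

# Set it on the running system (still add it under Tunables so it persists)
sysctl kern.ipc.maxsockbuf=614400000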


I'm experiencing the exact same issue. I read that the blame was also being put on the CPU, so to test this I took my same appliance:

J4125
8GB RAM
Intel i225 (4 ports)

and installed Untangle on it. Once I loaded everything to match my OPNsense config, my speeds were normal; in fact, the impact of IPS was minimal. I went from 1.1 Gbps (no IPS; 800-900 Mbps with IPS) to 1.4 Gbps, which is the speed I pay for from my provider.

This can't be a CPU issue. I'm going to try the tweaks above in OPNsense now and see if they make any difference.

March 13, 2023, 02:49:08 AM #156 Last Edit: March 13, 2023, 02:56:33 AM by feld
OPNsense DEC840, which is supposed to be able to handle passing ~15 Gbit of traffic.

Speedtest from the firewall:


# speedtest --server-id=47746

   Speedtest by Ookla

      Server: AT&T - Miami, FL (id: 47746)
         ISP: AT&T Internet
Idle Latency:     3.53 ms   (jitter: 0.50ms, low: 3.06ms, high: 4.12ms)
    Download:  2327.36 Mbps (data used: 2.6 GB)
                  5.18 ms   (jitter: 1.65ms, low: 2.79ms, high: 26.40ms)
      Upload:   378.54 Mbps (data used: 685.6 MB)
                  3.01 ms   (jitter: 1.79ms, low: 2.03ms, high: 55.43ms)
Packet Loss:     0.0%
  Result URL: https://www.speedtest.net/result/c/bbd0ee99-ad99-4e32-b3c9-ad05daf8bd84


Speedtest through the firewall (notice the slow upload):


# speedtest --server-id=47746

   Speedtest by Ookla

      Server: AT&T - Miami, FL (id: 47746)
         ISP: AT&T Internet
Idle Latency:     4.17 ms   (jitter: 0.94ms, low: 3.06ms, high: 6.49ms)
    Download:  2295.81 Mbps (data used: 1.5 GB)
                  5.08 ms   (jitter: 2.15ms, low: 2.79ms, high: 53.90ms)
      Upload:   329.78 Mbps (data used: 362.9 MB)
                  4.05 ms   (jitter: 1.37ms, low: 3.12ms, high: 16.97ms)
Packet Loss:     0.0%
  Result URL: https://www.speedtest.net/result/c/2f29bb86-def6-4379-ad30-7292ad3e1926


iperf3 from the same machine *to* the OPNsense firewall, normal and reverse:


root@dev:/ # iperf3 -c gw
Connecting to host gw, port 5201
[  5] local 10.27.3.230 port 31205 connected to 10.27.3.254 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   272 MBytes  2.28 Gbits/sec  413    472 KBytes
[  5]   1.00-2.00   sec   287 MBytes  2.41 Gbits/sec    2    614 KBytes
[  5]   2.00-3.00   sec   255 MBytes  2.14 Gbits/sec   61    593 KBytes
[  5]   3.00-4.00   sec   280 MBytes  2.35 Gbits/sec   23   17.0 KBytes
[  5]   4.00-5.00   sec   261 MBytes  2.19 Gbits/sec   82    257 KBytes
[  5]   5.00-6.00   sec   257 MBytes  2.15 Gbits/sec   14    133 KBytes
[  5]   6.00-7.00   sec   254 MBytes  2.13 Gbits/sec   20    737 KBytes
[  5]   7.00-8.00   sec   260 MBytes  2.18 Gbits/sec   70    512 KBytes
[  5]   8.00-9.00   sec   268 MBytes  2.25 Gbits/sec  140    737 KBytes
[  5]   9.00-10.00  sec   266 MBytes  2.23 Gbits/sec  116    714 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.60 GBytes  2.23 Gbits/sec  941             sender
[  5]   0.00-10.00  sec  2.60 GBytes  2.23 Gbits/sec                  receiver

iperf Done.
root@dev:/ # iperf3 -R -c gw
Connecting to host gw, port 5201
Reverse mode, remote host gw is sending
[  5] local 10.27.3.230 port 12997 connected to 10.27.3.254 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   254 MBytes  2.13 Gbits/sec
[  5]   1.00-2.02   sec   262 MBytes  2.16 Gbits/sec
[  5]   2.02-3.00   sec   257 MBytes  2.19 Gbits/sec
[  5]   3.00-4.00   sec   250 MBytes  2.10 Gbits/sec
[  5]   4.00-5.00   sec   234 MBytes  1.97 Gbits/sec
[  5]   5.00-6.00   sec   244 MBytes  2.05 Gbits/sec
[  5]   6.00-7.00   sec   251 MBytes  2.11 Gbits/sec
[  5]   7.00-8.00   sec   229 MBytes  1.92 Gbits/sec
[  5]   8.00-9.00   sec   248 MBytes  2.08 Gbits/sec
[  5]   9.00-10.00  sec   238 MBytes  1.99 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.41 GBytes  2.07 Gbits/sec   14             sender
[  5]   0.00-10.00  sec  2.41 GBytes  2.07 Gbits/sec                  receiver

iperf Done.


I actually expect more than this. With a loopback to my own server through my switch I can do 9 Gbit with a single stream. If I do multiple streams to the OPNsense firewall, I can hit 4.2 Gbit max.
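
For anyone reproducing this, the multi-stream runs are just iperf3 with a parallel count, something like this (8 streams is arbitrary; -R reverses direction so the firewall is the sender):

iperf3 -c gw -P 8
iperf3 -c gw -P 8 -R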


So where is this mysterious bottleneck coming from? I did have ipsec.ko loaded from an old setup, but I had no policies, and the module is now completely gone. No amount of tuning or interface settings changes seems to matter.

How do I get this thing to actually push line rate? I've even swapped from 10GBASE-T to fiber in case it was something odd with the media, but I get the same results.

edit: I set up another test scenario where I run a speed test over Wi-Fi from my laptop to my server using LibreSpeed. When I hit the server directly through my AP on the same switch, I can do 300/300, but when I force the traffic to go through the firewall (same segment, same VLAN), the download speed (my server's upload) can't break 100.

There is something very peculiar going on

What was the CPU doing during the test? Fragmentation? iperf from A to B through the firewall? Any drops at the switch? A screenshot of the Services widget on the Dashboard, please.

The majority of the issue was net.isr.dispatch=direct, which should be net.isr.dispatch=deferred so that multiple CPU cores are used. I can hit ~7 Gbit on an iperf to the firewall and I've been able to get my full 2 Gbit through it.
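
For anyone wanting to check or flip this from a shell: net.isr.dispatch can be read (and, as far as I can tell, changed) at runtime, but set it under System > Settings > Tunables as well so it survives a reboot.

# Current dispatch policy: direct, hybrid, or deferred
sysctl net.isr.dispatch

# Switch to deferred on the running system
sysctl net.isr.dispatch=deferred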

I don't know why this isn't the default value in OPNsense. I understand why it isn't in FreeBSD, but a networking appliance should be tuned out of the box for maximum networking performance. I hope to see this and more auto-tuning improvements in the future.

I also would have expected OPNsense to automatically recognize this hardware and apply specific tuning for it. It is one of their flagship products, after all.

The inability to get a full 10 Gbit iperf to the firewall, when the DEC840 spec sheet specifically states "14.4Gbps firewall throughput" and "Firewall Port to Port Throughput: 9Gbps", makes me wonder whether the OPNsense team has ever actually hit those numbers with this hardware or whether they're just advertising a theoretical maximum.

First of all thanks for this tip.

I tested it. I am running OPNsense on an APU2D2.

Test setup:
- RPI4B+
- Win10 PC
- Inter-VLAN setup
- Both hosts in separate VLANs


net.isr.dispatch set to direct

-P 10

[SUM]   0.00-10.00  sec   788 MBytes   661 Mbits/sec                  sender
[SUM]   0.00-10.00  sec   785 MBytes   659 Mbits/sec                  receiver



net.isr.dispatch set to deferred

-P 10

[SUM]   0.00-10.00  sec  1.02 GBytes   878 Mbits/sec                  sender
[SUM]   0.00-10.00  sec  1.02 GBytes   877 Mbits/sec                  receiver



net.isr.dispatch set to deferred - running for 300s with 10 streams

-P 10

[SUM]   0.00-300.00 sec  31.4 GBytes   899 Mbits/sec                  sender
[SUM]   0.00-300.00 sec  31.4 GBytes   899 Mbits/sec                  receiver


So there is definitely something to it. I was not able to get up to 1G before; now I can after changing the value to "deferred".
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
APU2D2 - deceased
N5105 - i226-V | Patriot 2x8G 3200 DDR4 | L 790 512G - VM HA(SOON)
N100   - i226-V | Crucial 16G  4800 DDR5 | S 980 500G - PROD