Poor Throughput (Even On Same Network Segment)

Started by hax0rwax0r, August 25, 2020, 08:31:25 PM

For virtualised environments it helps to look into SR-IOV.

Supermicro M11SDV-8C-LN4F with Intel X710-DA2 running Proxmox 7 with SR-IOV VFs configured for OPNsense LAN and WAN on separate SFP+ slots.
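
For anyone wanting to reproduce this on a Proxmox host, here is a minimal sketch of how the VFs might be created; the interface name enp1s0f0, the VF count and VM ID 100 are illustrative assumptions, not the exact setup used above:

# create two virtual functions on the first X710 port (interface name is an assumption)
echo 2 > /sys/class/net/enp1s0f0/device/sriov_numvfs
# optionally pin a fixed MAC to the first VF so the guest sees a stable NIC
ip link set enp1s0f0 vf 0 mac 02:00:00:00:00:01
# pass the VF through to the OPNsense VM by its PCI address (example values)
qm set 100 -hostpci0 0000:01:02.0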

Running
iperf3 -c192.168.178.8 -R -P3 -t30
through the firewalls.

OPNsense 21.7.3_1 with Sensei
[SUM]   0.00-30.00  sec  10.5 GBytes  3.00 Gbits/sec  3117             sender
[SUM]   0.00-30.00  sec  10.5 GBytes  3.00 Gbits/sec                  receiver


OPNsense 21.7.3_1 without Sensei
[SUM]   0.00-30.00  sec  23.8 GBytes  6.82 Gbits/sec  514             sender
[SUM]   0.00-30.00  sec  23.8 GBytes  6.82 Gbits/sec                  receiver


Blind test, Linux-based firewall hardware:
[SUM]   0.00-30.00  sec  29.3 GBytes  8.40 Gbits/sec    0             sender
[SUM]   0.00-30.00  sec  29.3 GBytes  8.40 Gbits/sec                  receiver


@athurdent

Do you think SR-IOV also helps if the host (the virtualization platform) uses vSwitches?
I work with ESXi hosts where a NIC is attached directly to a vSwitch, so the NIC does not seem to be "sliced" for VM guests.

Thanks for the benchmarks btw.

T.

Quote from: testo_cz on September 28, 2021, 09:24:52 PM
@athurdent

Do you think SR-IOV also helps if the host (the virtualization platform) uses vSwitches?
I work with ESXi hosts where a NIC is attached directly to a vSwitch, so the NIC does not seem to be "sliced" for VM guests.

Thanks for the benchmarks btw.

T.

Hi, not sure about the ESXi implementation, they seem to have documentation on it though. https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.networking.doc/GUID-CC021803-30EA-444D-BCBE-618E0D836B9F.html
The card itself definitely has integrated switching capabilities. If I use a VLAN only on the card for 2 VMs to communicate (VLAN is not configured or allowed on the hardware switch the card is connected to), then I get around 18G throughput, which is done on the card internally.
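
As an illustration of that test, such a VF-local VLAN can be pinned from the host with iproute2; VLAN ID 100 and the interface name are example values:

# tag two VFs with the same VLAN so their VMs talk via the NIC's internal switch
ip link set enp1s0f0 vf 0 vlan 100
ip link set enp1s0f0 vf 1 vlan 100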

Quote from: athurdent on September 29, 2021, 05:28:19 AM
Quote from: testo_cz on September 28, 2021, 09:24:52 PM
@athurdent

Do you think SR-IOV also helps if the host (the virtualization platform) uses vSwitches?
I work with ESXi hosts where a NIC is attached directly to a vSwitch, so the NIC does not seem to be "sliced" for VM guests.

Thanks for the benchmarks btw.

T.

Hi, not sure about the ESXi implementation, they seem to have documentation on it though. https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.networking.doc/GUID-CC021803-30EA-444D-BCBE-618E0D836B9F.html
The card itself definitely has integrated switching capabilities. If I use a VLAN only on the card for 2 VMs to communicate (VLAN is not configured or allowed on the hardware switch the card is connected to), then I get around 18G throughput, which is done on the card internally.

That's interesting information -- so the VFs of an SR-IOV card just switch traffic between each other. It also makes sense. I can imagine how this would improve smaller setups, whether on ESXi or another hypervisor.

ESXi docs say that Direct I/O enables HW acceleration too, regardless of the vSwitch, but only in some scenarios. I assume it's a combination of their VMXNET3 paravirtualized driver magic and the Physical Function of the NIC. From what I've seen it's the default for large ESXi setups.

18G means the traffic went over PCIe only, cool.

Thanks. T.

I made a new thread about this very same issue but with Proxmox guests in the mix:
https://forum.opnsense.org/index.php?topic=25410.msg122060#msg122060

I don't want to blame OPNsense 100% before I rule out OVS problems, but OVS has not had issues for me in the past :(

I'm chiming in to say I have seen similar issues. Running on proxmox, I can only route about 600 mbps in opnsense using virtio/vtnet. A related kernel process in opnsense shows 100% cpu usage and the underlying vhost process on the proxmox host is pegged as well.

Trying a Linux VM on the same segment (i.e. not routing through the OPNsense) saturates my 1 Gbit NIC on my desktop with only 25% CPU usage on the associated vhost process for the VM's NIC.

I know some blame has been put on CPU speed/etc., but I think there is some sort of performance issue with the vtnet drivers. Even users of pfsense have had similar complaints. I also tried the new opnsense development build (freebsd 13) with no improvement.

I passed my nic through to the opnsense VM and reconfigured the interfaces and can route 1gbps no sweat. This is with the em driver (which supports my nic).

Note: I can get 1gbps with multiple queues set on the vtnet adapters for the opnsense VM. However, this still doesn't fix the performance issue with a single "stream."
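
For reference, a rough sketch of how multiqueue can be set up in such an environment (VM ID, bridge name, queue count and tunable values below are examples, not a tested recipe): the queue count is set on the virtio NIC on the Proxmox side, and the FreeBSD vtnet driver has matching loader tunables on the OPNsense side.

# Proxmox host: give the OPNsense VM's virtio NIC four queue pairs
qm set 100 -net0 virtio,bridge=vmbr0,queues=4

# OPNsense guest: loader tunables (System -> Settings -> Tunables or /boot/loader.conf.local)
hw.vtnet.mq_disable="0"
hw.vtnet.mq_max_pairs="4"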

Hello,

I'm joining this thread too .. we have:

* 4 x DEC-3850
* OPNsense 21.10.2-amd64 (Business edition)

Ever since we switched to OPNsense we have had throughput problems. In the beginning we had a SuperMicro X11-SSH (~5 Gb/s) and then switched to the appliances. We never reach more than 2-3 Gb/s (iperf3, without any special options), and the problem seems to be the VPN stack: if an IPsec tunnel is configured, all traffic slows down, even traffic that does not go through the tunnel.

we tested:

* VM -> VM same hypervisor (Proxmox), same VLAN = ~16 Gb/s
* VM -> VM different hypervisor (Proxmox), same VLAN = ~10 Gb/s
* VM -> VM different hypervisor (Proxmox), different VLAN = ~1.5-3 Gb/s

So whenever traffic goes through OPNsense, the network slows down.

https://www.mayrhofer.eu.org/post/firewall-throughput-opnsense-openwrt/

Quote: "When IPsec is active - even if the relevant traffic is not part of the IPsec policy - throughput is decreased by nearly 1/3. This seems like a real performance issue / bug in the FreeBSD/HardenedBSD kernel. I will need to try with VTI based IPsec routing to see if the in-kernel policy matching is a problem."

What makes us very sad: if this is the real issue, it is not easy to test, since we cannot simply disable the VPN, but we will try to build a test scenario ...
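
If someone wants to verify the policy-matching theory without tearing down the tunnels, a quick check (assuming the setkey utility from the FreeBSD base system) is to dump the kernel's security policy database while IPsec is up and see how broad the installed policies are:

# dump all installed IPsec security policies (SPD) the kernel evaluates
setkey -DP
# dump the active security associations as well
setkey -D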

Pretty sad things ...


@linuxmail would you mind stopping random cross-posting, thanks

Is there a way to test this with a bare-metal OPNsense installation? How would I proceed here?

April 27, 2022, 04:20:42 PM #145 Last Edit: April 27, 2022, 04:46:00 PM by marcosscriven
EDIT - Resolved - see next post

Original post:

Quote from: iamperson347 on December 05, 2021, 07:48:25 PM
I'm chiming in to say I have seen similar issues. Running on proxmox, I can only route about 600 mbps in opnsense using virtio/vtnet. A related kernel process in opnsense shows 100% cpu usage and the underlying vhost process on the proxmox host is pegged as well.

I'm seeing throughput all over the place on a similar setup (i.e. in a Proxmox VM).


[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  97.0 MBytes   814 Mbits/sec
[  5]   1.00-2.00   sec   109 MBytes   911 Mbits/sec
[  5]   2.00-3.00   sec   111 MBytes   934 Mbits/sec
[  5]   3.00-4.00   sec   103 MBytes   867 Mbits/sec
[  5]   4.00-5.00   sec   100 MBytes   843 Mbits/sec
[  5]   5.00-6.00   sec   112 MBytes   937 Mbits/sec
[  5]   6.00-7.00   sec   109 MBytes   911 Mbits/sec
[  5]   7.00-8.00   sec  75.7 MBytes   635 Mbits/sec
[  5]   8.00-9.00   sec  68.9 MBytes   578 Mbits/sec
[  5]   9.00-10.00  sec  96.6 MBytes   810 Mbits/sec
[  5]  10.00-11.00  sec   112 MBytes   936 Mbits/sec


And while that's happening, I see the virtio_pci process maxing out:


  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   12 root        -92    -     0B   400K CPU0     0  21:42  94.37% [intr{irq29: virtio_pci1}]
51666 root          4    0    17M  6600K RUN      1   0:18  68.65% iperf3 -s
   11 root        155 ki31     0B    32K RUN      1  20.4H  13.40% [idle{idle: cpu1}]
   11 root        155 ki31     0B    32K RUN      0  20.5H   3.61% [idle{idle: cpu0}]


Are there any settings that could help with this please?

I'm on 22.1.6

April 27, 2022, 04:50:08 PM #146 Last Edit: April 27, 2022, 05:18:23 PM by marcosscriven
Further to my previous post, I actually fixed this just by turning on all the hardware acceleration options under "Interfaces -> Settings".

That includes CRC, TSO, and LRO. I removed the 'disabled' check and rebooted.
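
For reference, the same flags can be inspected and toggled from the OPNsense shell; vtnet0 as the LAN NIC is an assumption, and the GUI setting plus a reboot remains the persistent way to change them:

# show which offload capabilities are currently enabled on the NIC
ifconfig vtnet0 | grep options
# enable checksum, TSO and LRO offload by hand for a quick comparison
ifconfig vtnet0 rxcsum txcsum tso lro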

Now I get a rock-solid iperf3 result:


[  5] 166.00-167.00 sec   112 MBytes   941 Mbits/sec
[  5] 167.00-168.00 sec   112 MBytes   941 Mbits/sec
[  5] 168.00-169.00 sec   112 MBytes   941 Mbits/sec
[  5] 169.00-170.00 sec   112 MBytes   941 Mbits/sec

And NIC processing load dropped to just 25% or so:

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        155 ki31     0B    32K RUN      1   3:14  77.39% [idle{idle: cpu1}]
   11 root        155 ki31     0B    32K RUN      0   3:06  71.26% [idle{idle: cpu0}]
   12 root        -92    -     0B   400K WAIT     0   0:55  28.35% [intr{irq29: virtio_pci1}]
91430 root          4    0    17M  6008K RUN      0   0:43  21.94% iperf3 -s


What confused me was:

1) The acceleration is disabled by default (not sure why?)
2) I didn't think it would apply to virtio devices, but clearly they implement the right features to support it.

EDIT

Arghh - perhaps not. While this fixed the LAN side, suddenly the WAN side throughput plummets.

This is strange because it's using the same virtio driver, just on a separate NIC of exactly the same type.
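
A possible way to narrow this down: LRO (and to a lesser extent TSO) is known to hurt traffic that is forwarded rather than terminated on the box, so toggling the flags per interface from the shell while iperf3 runs may show which one hurts the WAN side. vtnet1 as the WAN NIC is only an example name:

# temporarily drop LRO and TSO on the WAN NIC only
ifconfig vtnet1 -lro -tso
# put them back if it makes no difference
ifconfig vtnet1 lro tso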

We also have a performance issue: we have a Scop7 5510 with 10G SFP+ and only get 1.2 Gbit/s, but it should be >9 Gbit/s.

Any ideas why this happens and how to fix that?

Quote from: linuxmail on February 02, 2022, 12:54:49 PM
https://www.mayrhofer.eu.org/post/firewall-throughput-opnsense-openwrt/

Quote: "When IPsec is active - even if the relevant traffic is not part of the IPsec policy - throughput is decreased by nearly 1/3. This seems like a real performance issue / bug in the FreeBSD/HardenedBSD kernel. I will need to try with VTI based IPsec routing to see if the in-kernel policy matching is a problem."

Well spotted! Exactly the same negative observation here on my end with IPsec policy based VPN. 

Here is a first estimate of how IPsec affects my routing speed in the LAN:

Direction                        IPsec enabled    IPsec disabled
Server -> OPNsense -> Client     48.1 MB/s        74.2 MB/s
Server <- OPNsense <- Client     49.9 MB/s        61.1 MB/s

Overall, the routing speed remains very disappointing, especially considering that I had full routing performance up until OPNsense 20.1.

During my testing, I noticed that OPNsense doesn't seem to be utilizing all NIC queues. Two out of four NIC queues process almost no traffic and are bored.

dev.ix.2.queue3.rx_packets: 2959840
dev.ix.2.queue2.rx_packets: 2158082
dev.ix.2.queue1.rx_packets: 9861
dev.ix.2.queue0.rx_packets: 4387

dev.ix.2.queue3.tx_packets: 2967255
dev.ix.2.queue2.tx_packets: 2160888
dev.ix.2.queue1.tx_packets: 15955
dev.ix.2.queue0.tx_packets: 8725
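
For anyone wanting to reproduce these numbers, the counters above can be collected during a test run like this (ix2 being the port under test):

# per-queue packet counters for the ix2 port
sysctl dev.ix.2 | egrep 'queue[0-3]\.(rx|tx)_packets'
# interrupt distribution across the same queues
vmstat -i | grep ix2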


Any take on this?

interrupt                 total        rate
irq51: ix2:rxq0            5136          11
irq52: ix2:rxq1         2176474        4708
irq53: ix2:rxq2            7203          16
irq54: ix2:rxq3         3299471        7138
irq55: ix2:aq                 1           0

This is really crap!