[Tutorial/Call for Testing] Enabling Receive Side Scaling on OPNsense

Started by tuto2, August 16, 2021, 02:13:24 PM

May I ask what command is being executed to show the unsupported tunables? When I use sysctl -a, I don't see anything reported as unsupported.

Well, I don't know why I had to restart from the web interface. I updated and saved the tunables and applied them, but nothing happened, so I rebooted from the shell. That didn't work either, so I rebooted once more via the web management, and now it works.
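For anyone following along, the tunables involved (set under System > Settings > Tunables per the tutorial; the values match my verification output below) were:

net.inet.rss.enabled = 1
net.inet.rss.bits = 4      # 2^4 = 16 buckets, spread across my 8 cores
net.isr.bindthreads = 1    # pin netisr threads to CPUs
net.isr.maxthreads = -1    # expands to one thread per core (shows as 8 below)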


└─[$]> sudo sysctl -a | grep -i "net.isr\|net.rss"
net.inet.rss.bucket_mapping: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7 8:0 9:1 10:2 11:3 12:4 13:5 14:6 15:7
net.inet.rss.enabled: 1
net.inet.rss.debug: 0
net.inet.rss.basecpu: 0
net.inet.rss.buckets: 16
net.inet.rss.maxcpus: 64
net.inet.rss.ncpus: 8
net.inet.rss.maxbits: 7
net.inet.rss.mask: 15
net.inet.rss.bits: 4
net.inet.rss.hashalgo: 2
net.isr.numthreads: 8
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 1
net.isr.maxthreads: 8
net.isr.dispatch: direct

└─[$]> netstat -Q
Configuration:
Setting                        Current        Limit
Thread count                         8            8
Default queue limit                256        10240
Dispatch policy                 direct          n/a
Threads bound to CPUs          enabled          n/a

Protocols:
Name   Proto QLimit Policy Dispatch Flags
ip         1   1000    cpu   hybrid   C--
igmp       2    256 source  default   ---
rtsock     3    256 source  default   ---
arp        4    256 source  default   ---
ether      5    256    cpu   direct   C--
ip6        6   1000    cpu   hybrid   C--
ip_direct     9    256    cpu   hybrid   C--
ip6_direct    10    256    cpu   hybrid   C--


AND WOW!!! I am running 7°F cooler when using RSS!  8)

Quote from: zz00mm on March 04, 2022, 12:16:23 AM
May I ask what command is being executed to show the unsupported tunables? When I use sysctl -a, I don't see anything reported as unsupported.

I'm also interested in knowing this.  :)
Intel i7-8550U - Intel I211 - RAM 16GB - NVMe 120Gb
Intel i7-5550U - Intel I211 - RAM 8GB - NVMe 50Gb

Quote from: fadern on March 13, 2022, 09:04:20 PM
Quote from: zz00mm on March 04, 2022, 12:16:23 AM
May I ask what command is being executed to show the unsupported tunables? When I use sysctl -a, I don't see anything reported as unsupported.

I'm also interested in knowing this.  :)

Whatever is not returned by "sysctl -a" is marked as unsupported. That's a relatively safe bet, except for some edge-case hidden stuff in boot loaders, likely for historic reasons.
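For example, querying an OID directly makes it obvious (the second OID is deliberately made up):

$ sysctl net.inet.rss.enabled
net.inet.rss.enabled: 1
$ sysctl net.inet.rss.no_such_oid
sysctl: unknown oid 'net.inet.rss.no_such_oid'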


Cheers,
Franco

The current Suricata/Netmap implementation limits this re-injection to one thread only.
Work is underway to address this, since the new Netmap API (v14+) is capable of increasing that thread count.
Until then, no benefit is gained from RSS when using IPS.
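For context, the thread count in question is the per-interface netmap capture setting in suricata.yaml. A hypothetical hand-written equivalent (OPNsense generates this file itself, and the interface name is made up):

netmap:
  - interface: igb0     # hypothetical example interface
    threads: auto       # what the v14 API work should eventually let scale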


Any news on this? This plus RSS on lower-power multi-core devices sounds interesting.

The development release type has the suricata/netmap changes, but due to stability concerns it has been parked there.

We've made tests with Suricata 5 and the Netmap v14 API, and it seems to perform better in general, which would indicate there is at least one issue still in Suricata 6 that makes it unstable whether the v14 API is used (development release) or not (community release).


Cheers,
Franco

Hi,

I've been trying to get this working on ESXi with an OPNsense VM with 8 vCPUs. For some reason, if I run iperf with a single TCP stream I only get about 600 Mbit/s, and watching OPNsense with top -P I can see one of the 8 cores going to 0% idle. I'm using the VMXNET3 adapter.
root@OPNsense:~ # netstat -Q
Configuration:
Setting                        Current        Limit
Thread count                         8            8
Default queue limit                256        10240
Dispatch policy                 hybrid          n/a
Threads bound to CPUs          enabled          n/a

Protocols:
Name   Proto QLimit Policy Dispatch Flags
ip         1   1000    cpu   hybrid   C--
igmp       2    256 source  default   ---
rtsock     3    256 source  default   ---
arp        4    256 source  default   ---
ether      5    256    cpu   direct   C--
ip6        6   1000    cpu   hybrid   C--
ip_direct     9    256    cpu   hybrid   C--
ip6_direct    10    256    cpu   hybrid   C--

Workstreams:
WSID CPU   Name     Len WMark   Disp'd  HDisp'd   QDrops   Queued  Handled
   0   0   ip         0     0        0  1097010        0        0  1097010
   0   0   igmp       0     0        0        0        0        0        0
   0   0   rtsock     0     0        0        0        0        0        0
   0   0   arp        0     0        0        0        0        0        0
   0   0   ether      0     0  1173731        0        0        0  1173731
   0   0   ip6        0     0        0    63446        0        0    63446
   0   0   ip_direct     0     0        0        0        0        0        0
   0   0   ip6_direct     0     0        0        0        0        0        0
   1   1   ip         0    34        0   475830        0       38   475868
   1   1   igmp       0     0        0        0        0        0        0
   1   1   rtsock     0     0        0        0        0        0        0
   1   1   arp        0     1        0        0        0    12712    12712
   1   1   ether      0     0   539495        0        0        0   539495
   1   1   ip6        0     2        0    63626        0     4216    67842
   1   1   ip_direct     0     0        0        0        0        0        0
   1   1   ip6_direct     0     0        0        0        0        0        0
   2   2   ip         0     0        0   412891        0        0   412891
   2   2   igmp       0     0        0        0        0        0        0
   2   2   rtsock     0     0        0        0        0        0        0
   2   2   arp        0     1        0        0        0      510      510
   2   2   ether      0     0   420304        0        0        0   420304
   2   2   ip6        0     1        0     7412        0        1     7413
   2   2   ip_direct     0     0        0        0        0        0        0
   2   2   ip6_direct     0     0        0        0        0        0        0
   3   3   ip         0     0        0   653430        0        0   653430
   3   3   igmp       0     0        0        0        0        0        0
   3   3   rtsock     0     0        0        0        0        0        0
   3   3   arp        0     1        0        0        0       53       53
   3   3   ether      0     0   676969        0        0        0   676969
   3   3   ip6        0     0        0    23539        0        0    23539
   3   3   ip_direct     0     0        0        0        0        0        0
   3   3   ip6_direct     0     0        0        0        0        0        0
   4   4   ip         0    23        0   354980        0    11847   366827
   4   4   igmp       0     0        0        0        0        0        0
   4   4   rtsock     0     0        0        0        0        0        0
   4   4   arp        0     0        0        0        0        0        0
   4   4   ether      0     0   358176        0        0        0   358176
   4   4   ip6        0     1        0     3074        0        1     3075
   4   4   ip_direct     0     0        0        0        0        0        0
   4   4   ip6_direct     0     0        0        0        0        0        0
   5   5   ip         0     1        0   855737        0        2   855739
   5   5   igmp       0     0        0        0        0        0        0
   5   5   rtsock     0     3        0        0        0     4717     4717
   5   5   arp        0     0        0        0        0        0        0
   5   5   ether      0     0   859020        0        0        0   859020
   5   5   ip6        0     1        0     3281        0      159     3440
   5   5   ip_direct     0     0        0        0        0        0        0
   5   5   ip6_direct     0     0        0        0        0        0        0
   6   6   ip         0     0        0  1513336        0        0  1513336
   6   6   igmp       0     0        0        0        0        0        0
   6   6   rtsock     0     0        0        0        0        0        0
   6   6   arp        0     0        0        0        0        0        0
   6   6   ether      0     0  1517246        0        0        0  1517246
   6   6   ip6        0     1        0     3910        0        1     3911
   6   6   ip_direct     0     0        0        0        0        0        0
   6   6   ip6_direct     0     0        0        0        0        0        0
   7   7   ip         0     0        0   335859        0        0   335859
   7   7   igmp       0     0        0        0        0        0        0
   7   7   rtsock     0     0        0        0        0        0        0
   7   7   arp        0     0        0        0        0        0        0
   7   7   ether      0     0   341939        0        0        0   341939
   7   7   ip6        0     0        0     6080        0        0     6080
   7   7   ip_direct     0     0        0        0        0        0        0
   7   7   ip6_direct     0     0        0        0        0        0        0
root@OPNsense:~ # vmstat -i
interrupt                          total       rate
irq1: atkbd0                           2          0
irq17: mpt0                       314344          5
irq18: uhci0                      110225          2
cpu0:timer                       1335718         21
cpu1:timer                        569148          9
cpu2:timer                        597637          9
cpu3:timer                        592890          9
cpu4:timer                        590653          9
cpu5:timer                        593208          9
cpu6:timer                        592515          9
cpu7:timer                        609112         10
irq24: ahci0                       41838          1
irq26: vmx0:rxq0                  188954          3
irq27: vmx0:rxq1                  138552          2
irq28: vmx0:rxq2                   71792          1
irq29: vmx0:rxq3                  162662          3
irq30: vmx0:rxq4                  109552          2
irq31: vmx0:rxq5                  166029          3
irq32: vmx0:rxq6                  317057          5
irq33: vmx0:rxq7                   63136          1
irq43: vmx1:rxq0                    1759          0
irq44: vmx1:rxq1                    2393          0
irq45: vmx1:rxq2                    4260          0
irq46: vmx1:rxq3                     557          0
irq47: vmx1:rxq4                    1137          0
irq48: vmx1:rxq5                    3461          0
irq49: vmx1:rxq6                    4689          0
irq50: vmx1:rxq7                    1468          0
irq60: vmx2:rxq0                   73391          1
irq61: vmx2:rxq1                  153881          2
irq62: vmx2:rxq2                   54965          1
irq63: vmx2:rxq3                   75044          1
irq64: vmx2:rxq4                   98827          2
irq65: vmx2:rxq5                  277362          4
irq66: vmx2:rxq6                   63113          1
irq67: vmx2:rxq7                   69899          1
Total                            8051230        127


My current settings; I've played a lot with these but haven't gotten anything above 800 Mbit/s:

vmx0: Using MSI-X interrupts with 9 vectors
vmx1: Using MSI-X interrupts with 9 vectors
vmx2: Using MSI-X interrupts with 9 vectors
net.inet.rss.bucket_mapping: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
net.inet.rss.enabled: 1
net.inet.rss.debug: 0
net.inet.rss.basecpu: 0
net.inet.rss.buckets: 8
net.inet.rss.maxcpus: 64
net.inet.rss.ncpus: 8
net.inet.rss.maxbits: 7
net.inet.rss.mask: 7
net.inet.rss.bits: 3
net.inet.rss.hashalgo: 2
net.isr.numthreads: 8
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 1
net.isr.maxthreads: 8
net.isr.dispatch: hybrid
hw.vmd.max_msix: 3
hw.vmd.max_msi: 1
hw.sdhci.enable_msi: 1
hw.puc.msi_disable: 0
hw.pci.honor_msi_blacklist: 1
hw.pci.msix_rewrite_table: 0
hw.pci.enable_msix: 1
hw.pci.enable_msi: 1
hw.mfi.msi: 1
hw.malo.pci.msi_disable: 0
hw.ix.enable_msix: 1
hw.bce.msi_enable: 1
hw.aac.enable_msi: 1
machdep.disable_msix_migration: 0
machdep.num_msi_irqs: 2048
machdep.first_msi_irq: 24
dev.vmx.2.iflib.disable_msix: 0
dev.vmx.1.iflib.disable_msix: 0
dev.vmx.0.iflib.disable_msix: 0
iperf3 -c 10.0.0.4  -p 4999 -P 1
Connecting to host 10.0.0.4, port 4999
[  5] local 192.168.2.100 port 57384 connected to 10.0.0.4 port 4999
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  72.6 MBytes   609 Mbits/sec   20    618 KBytes       
[  5]   1.00-2.00   sec  83.8 MBytes   703 Mbits/sec    0    714 KBytes       
[  5]   2.00-3.00   sec  83.8 MBytes   703 Mbits/sec    9    584 KBytes       
[  5]   3.00-4.00   sec  83.8 MBytes   703 Mbits/sec    0    684 KBytes       
[  5]   4.00-5.00   sec  83.8 MBytes   703 Mbits/sec    1    556 KBytes       
[  5]   5.00-6.00   sec  80.0 MBytes   671 Mbits/sec    0    659 KBytes       
[  5]   6.00-7.00   sec  85.0 MBytes   713 Mbits/sec    1    525 KBytes       
[  5]   7.00-8.00   sec  83.8 MBytes   703 Mbits/sec    0    636 KBytes       
[  5]   8.00-9.00   sec  83.8 MBytes   703 Mbits/sec    0    732 KBytes       
[  5]   9.00-10.00  sec  85.0 MBytes   713 Mbits/sec    5    611 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   825 MBytes   692 Mbits/sec   36             sender
[  5]   0.00-10.00  sec   821 MBytes   689 Mbits/sec                  receiver
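(Worth noting: RSS hashes per flow, so a single TCP stream always maps to one bucket and one core by design; single-stream throughput is bounded by what one vCPU can push. A parallel run would show whether the aggregate scales, e.g.:

iperf3 -c 10.0.0.4 -p 4999 -P 8

Still, ~700 Mbit/s for a single stream seems low.)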


I have also played with VM advanced settings on the ESXi side. I tried these two settings, applied per vNIC, which I found on a Palo Alto site:
ethernet1.pnicFeatures = "4"
ethernet2.pnicFeatures = "4"
ethernet3.pnicFeatures = "4"
ethernet1.ctxPerDev = "1"
ethernet2.ctxPerDev = "1"
ethernet3.ctxPerDev = "1"


But no help. The driver inside ESXi is:
esxcli network nic get -n vmnic0
   Advertised Auto Negotiation: true
   Advertised Link Modes: Auto, 100BaseT/Full, 1000BaseT/Full, 10000BaseT/Full
   Auto Negotiation: true
   Cable Type: Twisted Pair
   Current Message Level: 0
   Driver Info:
         Bus Info: 0000:03:00:0
         Driver: ixgben
         Firmware Version: 0x8000038e
         Version: 1.8.7
   Link Detected: true
   Link Status: Up
   Name: vmnic0
   PHYAddress: 0
   Pause Autonegotiate: false
   Pause RX: false
   Pause TX: false
   Supported Ports: TP
   Supports Auto Negotiation: true
   Supports Pause: true
   Supports Wakeon: true
   Transceiver:
   Virtual Address: 00:50:56:50:0d:98
   Wakeon: MagicPacket(tm)

esxcli system module parameters list -m ixgben
Name     Type          Value  Description
-------  ------------  -----  --------------------------------------------------------------------------------------------------------------------------------
DRSS     array of int         DefQueue RSS state: 0 = disable, 1 = enable (default = 0; 4 queues if DRSS is enabled)
DevRSS   array of int         Device RSS state: 0 = disable, 1 = enable (default = 0; 16 queues but all virtualization features disabled if DevRSS is enabled)
QPair    array of int         Pair Rx & Tx Queue Interrupt: 0 = disable, 1 = enable (default)
RSS      array of int  1,1    NetQueue RSS state: 0 = disable, 1 = enable (default = 1; 4 queues if RSS is enabled)
RxITR    array of int         Default RX interrupt interval: 0 = disable, 1 = dynamic throttling, 2-1000 in microseconds (default = 50)
TxITR    array of int         Default TX interrupt interval: 0 = disable, 1 = dynamic throttling, 2-1000 in microseconds (default = 100)
VMDQ     array of int         Number of Virtual Machine Device Queues: 0/1 = disable, 2-16 enable (default = 8)
max_vfs  array of int         Maximum number of VFs to be enabled (0..63)



I've also been running a Palo Alto firewall and I don't see these problems there. I want to switch from Palo Alto to OPNsense, but I'd like to figure out this problem first. Any ideas what to try next? The OPNsense version is 22.1.3-amd64.


Do you have any good news concerning this topic?
Is it stable now?
Maybe in 22.7?

Quote from: franco on March 21, 2022, 07:54:04 AM
The development release type has the suricata/netmap changes, but due to stability concerns it has been parked there.

We've made tests with Suricata 5 and the Netmap v14 API, and it seems to perform better in general, which would indicate there is at least one issue still in Suricata 6 that makes it unstable whether the v14 API is used (development release) or not (community release).


Cheers,
Franco
I want all services to run at wire speed and therefore run this dedicated hardware configuration:

AMD Ryzen 7 9700x
ASUS Pro B650M-CT-CSM
64GB DDR5 ECC (2x KSM56E46BD8KM-32HA)
Intel XL710-BM1
Intel i350-T4
2x SSD with ZFS mirror
PiKVM for remote maintenance

private user, no business use

It's still not perfect from our end, so it's locked away in the development release (System: Firmware: Settings: release type).


Cheers,
Franco

Hi,

I tried enabling RSS and Suricata works: better spread of CPU load and better performance. However, HAProxy runs into issues. It can't connect to anything, not for health checks and not for live traffic. Based on an earlier comment about SO_REUSEPORT, I changed my config to simple binds and enabled noreuseport for HAProxy, but it still fails to connect.

Successes are very sporadic, ~10%, which is rare enough that a health check never clears. Since I have 8 RSS queues, it is almost as if HAProxy only gets traffic from one queue, which would amount to 12.5% success.

I've tried all combinations of net.inet.rss.enabled, noreuseport, with health checks, and without health checks, and success/failure depends entirely on net.inet.rss.enabled. The error reported by HAProxy is "Layer4 timeout".
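For reference, "simple binds" plus noreuseport amounts to something like this (a minimal hand-written sketch; OPNsense generates the actual config, and all names and addresses here are made up):

global
    noreuseport                        # one listening socket instead of one per thread

frontend fe_www
    bind 192.0.2.10:443                # plain bind, no per-CPU socket sharding
    default_backend be_www

backend be_www
    server web1 10.0.0.10:8080 check   # these checks hit the Layer4 timeout with RSS on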

driver: ix
NIC: Intel Xeon D-1500 SoC 10 GbE (X552)
OPNsense: 22.1.7_1

I'm more than happy to help with testing, but would appreciate any suggestions on where to start.

Having done some more research and Google-fu, I don't have a solution, but at least some more insight.

It seems FreeBSD doesn't add RSS support for outgoing connections, and that causes issues for HAProxy. I did find others who have experienced the same issue, with proposed patches for HAProxy, but unfortunately those patches weren't accepted:
https://www.mail-archive.com/haproxy@formilux.org/msg34548.html
https://lists.freebsd.org/pipermail/freebsd-transport/2019-June/000247.html

Both links have the same author. His thinking sounds sound, and his patches work for him, but my own five cents says that they rely on using a symmetric Toeplitz key, even though he doesn't mention that. If my hunch is right, the default Microsoft key won't work well. A symmetric key is also required for Suricata, to ensure each thread sees both directions of the same conversation. What he does is use the same RSS hash to assign outgoing ports to the same CPU and RSS queue that incoming traffic would hit, thereby not only assuring connectivity but also preventing CPU context switching even in HAProxy.
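To illustrate the symmetry point: the Toeplitz hash becomes direction-independent when the key is a repeating 16-bit pattern (0x6d5a repeated is the commonly cited choice). A quick self-contained Python sketch of the hash itself, nothing OPNsense-specific:

import socket
import struct

def toeplitz_hash(key: bytes, data: bytes) -> int:
    # For every set bit of the input (MSB first), XOR in the 32-bit
    # window of the key that starts at that bit position.
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        byte, bit = divmod(i, 8)
        if data[byte] & (0x80 >> bit):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result

# Repeating 16-bit pattern => hash is invariant when src/dst are swapped.
SYM_KEY = bytes([0x6D, 0x5A] * 20)           # 40 bytes, standard RSS key size

def tcp4_tuple(src, dst, sport, dport):
    # RSS input for IPv4/TCP: src ip, dst ip, src port, dst port
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack(">HH", sport, dport))

fwd = toeplitz_hash(SYM_KEY, tcp4_tuple("10.0.0.1", "10.0.0.2", 12345, 443))
rev = toeplitz_hash(SYM_KEY, tcp4_tuple("10.0.0.2", "10.0.0.1", 443, 12345))
assert fwd == rev                            # same bucket in both directions
print(hex(fwd))

Run it with the default Microsoft key instead and fwd and rev differ, so the two directions of one conversation land on different queues and CPUs.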

I tried limiting HAProxy to one process and one thread, hoping that could work as a very quick (if performance-limited) fix, but unfortunately it didn't.

You could argue that solving this within HAProxy is not the right place, as it intertwines the layers, but RSS awareness in HAProxy also prevents CPU context switches between the network stack and HAProxy. His patches also hard-code the default hash key. Instead, he should have asked the kernel for the key (I don't know if that's possible today) to make sure the same key is used. To satisfy Suricata's requirement, a wish would be the possibility of setting a symmetric key.

We also ran some tests enabling RSS on about 15 KVM/Proxmox OPNsense installations.
We used the latest Business Edition build of OPNsense for this.

It seems there are also still issues with the nginx reverse proxy, which is likewise unable to connect to its backend servers when RSS is enabled.

After disabling RSS and rebooting, nginx works flawlessly again...


I tried out the RSS feature for a few days on our OPNsense instance (23.1.9-amd64), and while the performance is great, we've run into some weird issues with internal system services not being able to resolve DNS. Disabling RSS seems to fix the issue. We are using the Intel ice driver with a 25GbE SoC-based NIC. The machine in question: https://www.supermicro.com/en/products/system/iot/1u/sys-110d-20c-fran8tp

Specifically, pkg update and the GeoIP feature in the firewall cannot connect, and I tracked this down to a DNS issue. When running pkg update via the CLI or the GUI it hangs in the fetching process, and when running it with debug you can see it's stuck at the DNS resolution stage. The GeoIP feature has similar issues.

Interestingly, if you use ping from the CLI or use the DNS diagnostics tool, the system resolves DNS requests totally fine. I enabled debug logging on Unbound and it doesn't appear to even receive the requests from pkg update or the GeoIP downloads. I would love to get this fixed so we can use RSS, since it handles our 10G symmetrical connection a lot better.

user@kappa:/usr/local/etc # pkg -ddddddd update
DBG(1)[34160]> pkg initialized
Updating OPNsense repository catalogue...
DBG(1)[34160]> PkgRepo: verifying update for OPNsense
DBG(1)[34160]> PkgRepo: need forced update of OPNsense
DBG(1)[34160]> Pkgrepo, begin update of '/var/db/pkg/repo-OPNsense.sqlite'
DBG(1)[34160]> Request to fetch pkg+https://pkg.opnsense.org/FreeBSD:13:amd64/23.1/latest/meta.conf
DBG(1)[34160]> opening libfetch fetcher
DBG(1)[34160]> Fetch > libfetch: connecting
DBG(1)[34160]> Fetch: fetching from: https://pkg.opnsense.org/FreeBSD:13:amd64/23.1/latest/meta.conf with opts "iv"
resolving server address: pkg.opnsense.org:443

^ hangs here for a while before retrying and effectively goes nowhere.
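(A way to double-check the Unbound observation while pkg sits there, assuming Unbound listens on localhost on this box, is to watch the loopback for the queries:

tcpdump -ni lo0 port 53

If they never show up there, the lookups are being lost before they ever reach the resolver.)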

I can't reproduce a hang with a standard igb(4) here, so it might be specific to ice(4). I'll try to monitor my main install to see whether this happens eventually.


Cheers,
Franco