Hi all,
In a future 21.7.x OPNsense release, in-kernel support for Receive Side Scaling will be included. The implementation of RSS is coupled with PCBGROUP – an implementation which introduces notions of CPU affinity for connections. While the latter will be of lesser importance for OPNsense, since it specifically applies to connections built up in userland using sockets (which is relevant to servers, not middleboxes), the idea of distributing work on a lower level with hardware support provides a myriad of benefits – especially with regard to multithreading in Suricata in our use case (please see the note at the end of this post).
Without going into too much technical detail, I'll provide a short description of the inner workings of RSS – as well as how to set up the correct tunables to ensure steady state operation. All of this will hopefully also serve as a troubleshooting guide.
Overview
RSS is used to distribute packets over CPU cores using a hashing function – either with support in the hardware, which offloads the hashing for you, or in software. The idea is to take as input the TCP 4-tuple (source address, source port, destination address, destination port) of a packet, hash this input using an in-kernel defined key, and use the least significant bits of the resulting value as an index into a user-configurable indirection table. The indirection table is loaded into the hardware during boot and is used by the NIC to decide which CPU to interrupt with a given packet. All of this allows packets of the same origin/destination (a.k.a. flows) to be queued consistently on the same CPU.
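Once RSS is active you can get a feel for this mapping on a running system. Purely as a read-only check (assuming the RSS-enabled kernel is booted), the following sysctls show the hash algorithm selector and the bucket-to-CPU mapping derived from the indirection table:
sysctl net.inet.rss.hashalgo
sysctl net.inet.rss.bucket_mapping
On a 4-core box the mapping will look something like '0:0 1:1 2:2 3:3 4:0 5:1 6:2 7:3', i.e. eight buckets spread over four CPUs.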
By default, RSS will be disabled since its impact is quite far-reaching. Only enable this feature if you're interested in testing it and seeing whether it increases your throughput under high load – such as when using IDS/IPS. Since I do not have every type of hardware available to me – nor the time to test all of it – no guarantee is given that a NIC driver will properly handle the kernel implementation or is even capable of using it.
The NIC/Driver
Assuming you are using a modern NIC which supports multiple hardware queues and RSS, the configuration of the NIC decides how and on which queue packets arrive on your system. This is hardware dependent and will not be the same on every NIC. Should your driver support the option to enable/disable RSS, a sysctl tunable will be available. You can search for it using
sysctl -a | grep rss
or (assuming, for example, you are using the axgbe driver)
sysctl dev.ax | grep rss
Sticking with the axgbe example, RSS can be enabled by setting
dev.ax.0.rss_enabled = 1
dev.ax.1.rss_enabled = 1
in the OPNsense System->Settings->Tunables interface.
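After a reboot you can verify the setting took effect – sticking with the axgbe example, something along the lines of:
sysctl dev.ax.0.rss_enabled dev.ax.1.rss_enabled
should now report 1 for both interfaces.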
It is also possible that a driver does not expose this ability to the user, in which case you'd want to look up whether the NIC/driver supports RSS at all – using online datasheets or a simple Google search. For example, igb (https://www.freebsd.org/cgi/man.cgi?query=igb&sektion=4&manpath=FreeBSD+7.4-RELEASE) enables RSS by default, but does not reflect this in any configuration parameter. However, since it uses multiple queues:
dmesg | grep vectors
igb0: Using MSI-X interrupts with 5 vectors
igb1: Using MSI-X interrupts with 5 vectors
igb2: Using MSI-X interrupts with 5 vectors
igb3: Using MSI-X interrupts with 5 vectors
it will most likely have some form of packet filtering to distribute packets over the hardware queues – in igb's case that is in fact RSS, enabled by default.
For most NICs, RSS is the primary method of deciding which CPU to interrupt with a packet. NICs that do not implement any other type of filter and whose RSS feature is missing or turned off will most likely interrupt only CPU 0 at all times – which will reduce potential throughput due to cache line migrations and lock contention. Please keep system-wide RSS disabled if this is the case.
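One quick way to check how interrupts are being spread (igb0 is just an example name, use your own interface) is to look at the per-queue interrupt counters and the per-CPU interrupt load:
vmstat -i | grep igb0
top -P
If only the first receive queue ever counts up, or only CPU 0 shows interrupt time in 'top -P', you are effectively running single-queue and system-wide RSS should stay off.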
Last but not least: driver support for the in-kernel RSS implementation is a must. Proper driver support ensures the correct key and indirection table are set in hardware. Drivers which support RSS according to the source code (but are mostly untested):
- em
- igb -> tested & working
- axgbe -> tested & working
- netvsc
- ixgbe
- ixl
- cxgbe
- lio
- mlx5
- sfxge
The Kernel
Internally, FreeBSD uses netisr (https://www.freebsd.org/cgi/man.cgi?format=html&query=netisr(9)) as an abstraction layer for dispatching packets to the upper protocols. Within the implementation, the default setting is to restrict packet processing to one thread only. Since RSS now provides a way to keep flows local to a CPU, the following sysctls should be set in System->Settings->Tunables:
net.isr.bindthreads = 1
causes threads to be bound to a CPU
net.isr.maxthreads = -1
assigns a workstream to each CPU core available.
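Both of these only take effect at boot. After a reboot you can verify them – and the resulting number of netisr workstreams – with:
sysctl net.isr.bindthreads net.isr.maxthreads net.isr.numthreads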
Furthermore, the RSS implementation also provides a few necessary sysctls:
net.inet.rss.enabled = 1
makes sure RSS is enabled. Disabled by default to prevent regressions on NICs that do not properly implement the RSS interface.
net.inet.rss.bits = X
This one depends on the number of cores you have. By default the number of bits corresponds to twice the number of cores (i.e. twice as many buckets as cores), which is intended to leave room for load balancing; since there is currently no implementation for that, I recommend setting this value to the number of bits needed to represent the number of CPU cores. This means we use the following values:
- for 4-core systems, use '2'
- for 8-core systems, use '3'
- for 16-core systems, use '4'
Etc.
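If you'd rather derive the value on the box itself, here is a minimal shell sketch (assuming hw.ncpu reflects the number of cores you want to cover – it counts logical CPUs, so on hyperthreaded boxes you may want to base it on the number of hardware queues instead):
ncpu=$(sysctl -n hw.ncpu); bits=0; n=$ncpu
while [ $n -gt 1 ]; do n=$((n / 2)); bits=$((bits + 1)); done
echo "suggested net.inet.rss.bits for $ncpu cores: $bits"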
If RSS is enabled with the 'enabled' sysctl, the packet dispatching policy will move from 'direct' to 'hybrid'. This will dispatch a packet directly in the current context when allowed, and otherwise queue the packet on the CPU it came in on. Please note that this will increase the interrupt load as seen in 'top -P'. This simply means that packets are being processed at the highest priority in the CPU scheduler – it does not mean the CPU is under more load than normal.
The correct working of netisr can be verified by running
netstat -Q
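With RSS active, the things to look for are the 'ip' and 'ip6' protocols listed with Policy 'cpu' and Dispatch 'hybrid', and the packet counters increasing on all workstreams rather than only on WSID 0. To watch just the IP-related rows, something like this works:
netstat -Q | grep ip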
Note regarding IPS
When Suricata is running in IPS mode, Netmap is utilized to fetch packets off the line for inspection. By default, OPNsense has configured Suricata in such a way that the packet which has passed inspection will be re-injected into the host networking stack for routing/firewalling purposes. The current Suricata/Netmap implementation limits this re-injection to one thread only. Work is underway to address this issue since the new Netmap API (V14+) is now capable of increasing this thread count. Until then, no benefit is gained from RSS when using IPS.
Preliminary testing
If you'd like to test RSS on your system before the release, a pre-made kernel is available from the OPNsense pkg repository. Please set the tunables as described in this post and update using:
opnsense-update -zkr 21.7.1-rss
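After the reboot you can double-check which kernel you are actually running with:
uname -v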
If you are doing performance tests, make sure to disable rx/tx flow control if the NIC in question supports disabling this.
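On drivers that expose it through iflib (em/igb and friends), flow control is typically a per-interface tunable – for example (dev.em.0 is only an example, adjust to your driver and unit number):
dev.em.0.fc = 0
set in System->Settings->Tunables and applied after a reboot.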
Feedback or questions regarding the use of RSS can be posted in this thread. Let me know your thoughts and whether you encounter any issues :)
Update
Please note that all tunables set in this tutorial require a reboot to properly apply.
Cheers,
Stephan
I can probably do some testing.
Can you help me understand what net.inet.rss.bits value I should use? I have 2 threads per core and 4 logical CPUs in total. Do I use a value of 2 or 4 for this?
root@OPNsense:~ # lscpu
Architecture: amd64
Byte Order: Little Endian
Total CPU(s): 4
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Vendor: GenuineIntel
CPU family: 6
Model: 142
Model name: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
Stepping: 9
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3M
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 cflsh ds acpi mmx fxsr sse sse2 ss htt tm pbe sse3 pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline aes xsave osxsave avx f16c rdrnd fsgsbase tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid fpcsds mpx rdseed adx smap clflushopt intel_pt syscall nx pdpe1gb rdtscp lm lahf_lm lzcnt
Ideally, I'd like to see some testing with both hyperthreading enabled and disabled. Systems with hyperthreading usually have one hardware queue per logical CPU - as such only half the cores in your systems can be used for interrupts.
In your case please try the value '2' if only 4 hardware queues are used, otherwise use '3' if 8 hardware queues are used.
To expand on this: the 'bits' value defines how many of the hash's least significant bits are used, i.e. the width of the resulting CPU/bucket mask, e.g.:
(net.inet.rss.bits = 2) -> mask 0b0011 = 3 (cores 0-3, thus 4 cores)
(net.inet.rss.bits = 3) -> mask 0b0111 = 7 (8 cores)
(net.inet.rss.bits = 4) -> mask 0b1111 = 15 (16 cores)
Cheers,
Stephan
Enabled it on the OPNsense built-in re (Realtek) driver with my RTL8125B.
Seems to be in use and is working just fine, I guess?
root@rauter:~ # netstat -Q
Configuration:
Setting Current Limit
Thread count 4 4
Default queue limit 256 10240
Dispatch policy deferred n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 256 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 24 0 6402 0 299134 305536
0 0 igmp 0 0 0 0 0 0 0
0 0 rtsock 0 2 0 0 0 100 100
0 0 arp 0 0 0 0 0 0 0
0 0 ether 0 0 21891 0 0 0 21891
0 0 ip6 0 2 0 3 0 272 275
0 0 ip_direct 0 0 0 0 0 0 0
0 0 ip6_direct 0 0 0 0 0 0 0
1 1 ip 0 10 0 222075 0 123441 345516
1 1 igmp 0 0 0 0 0 0 0
1 1 rtsock 0 0 0 0 0 0 0
1 1 arp 0 1 0 0 0 1 1
1 1 ether 0 0 674658 0 0 0 674658
1 1 ip6 0 4 0 30 0 327 357
1 1 ip_direct 0 0 0 0 0 0 0
1 1 ip6_direct 0 0 0 0 0 0 0
2 2 ip 0 14 0 79091 0 108867 187958
2 2 igmp 0 0 0 0 0 0 0
2 2 rtsock 0 0 0 0 0 0 0
2 2 arp 0 1 0 0 0 105 105
2 2 ether 0 0 420575 0 0 0 420575
2 2 ip6 0 1 0 204 0 36 240
2 2 ip_direct 0 0 0 0 0 0 0
2 2 ip6_direct 0 0 0 0 0 0 0
3 3 ip 1 13 0 5750 0 301312 307061
3 3 igmp 0 0 0 0 0 0 0
3 3 rtsock 0 0 0 0 0 0 0
3 3 arp 0 0 0 0 0 0 0
3 3 ether 0 0 25502 0 0 0 25502
3 3 ip6 0 3 0 7 0 283 290
3 3 ip_direct 0 0 0 0 0 0 0
3 3 ip6_direct 0 0 0 0 0 0 0
Thanks for the nice explanation, tuto2!
This sounds cool, I can surely test this.
I have ixl nics for LAN and WAN (passed through to the OPNsense VM in Proxmox, recognized as Intel(R) Ethernet Controller X710 for 10GbE SFP+), connected to a 10G switch and an old i3 2 core / 4 threads CPU (Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz)
Over in the Sensei forum, mb mentioned that Sensei would also benefit from RSS when it comes to reaching 10G speeds.
As I am not using Suricata on the ixl interfaces, but I am using Sensei on LAN, will it also benefit?
root@OPNsense:~ # sysctl -a | grep rss
hw.bxe.udp_rss: 0
hw.ix.enable_rss: 1
root@OPNsense:~ # dmesg | grep vectors
ixl0: Using MSI-X interrupts with 5 vectors
ixl1: Using MSI-X interrupts with 5 vectors
ixl0: Using MSI-X interrupts with 5 vectors
ixl1: Using MSI-X interrupts with 5 vectors
It should benefit Sensei in theory, but Sensei needs to support libnetmap API also, which will be added in version 21.7.2.
Cheers,
Franco
Quote from: franco on August 30, 2021, 07:56:21 PM
It should benefit Sensei in theory, but Sensei needs to support libnetmap API also, which will be added in version 21.7.2.
Great to hear this, thank you. 👍
In general, I am seeing a lot of those when booting up; also see the attached screenshot. They were there before, but not that often I think.
ixl0: Failed to remove 0/1 filters, error I40E_AQ_RC_ENOENT
Here are my results:
Set the variables, installed the kernel and rebooted.
root@OPNsense:~ # netstat -Q
Configuration:
Setting Current Limit
Thread count 4 4
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 256 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 366 0 3456666 0 1796298 5252964
0 0 igmp 0 0 0 0 0 0 0
0 0 rtsock 0 0 0 0 0 0 0
0 0 arp 0 0 1 0 0 0 1
0 0 ether 0 0 7630456 0 0 0 7630456
0 0 ip6 0 2 0 452 0 712 1164
0 0 ip_direct 0 0 0 0 0 0 0
0 0 ip6_direct 0 0 0 0 0 0 0
1 1 ip 0 674 0 5662158 0 233572 5895730
1 1 igmp 0 0 0 0 0 0 0
1 1 rtsock 0 4 0 0 0 212 212
1 1 arp 0 0 3280 0 0 0 3280
1 1 ether 0 0 13568727 0 0 0 13568727
1 1 ip6 0 4 0 1108 0 649 1757
1 1 ip_direct 0 0 0 0 0 0 0
1 1 ip6_direct 0 0 0 0 0 0 0
2 2 ip 0 538 0 3493297 0 2252147 5745444
2 2 igmp 0 0 0 0 0 0 0
2 2 rtsock 0 0 0 0 0 0 0
2 2 arp 0 0 2 0 0 0 2
2 2 ether 0 0 8776535 0 0 0 8776535
2 2 ip6 0 8 0 1538 0 987 2525
2 2 ip_direct 0 0 0 0 0 0 0
2 2 ip6_direct 0 0 0 0 0 0 0
3 3 ip 0 870 0 4571265 0 1898993 6470258
3 3 igmp 0 0 0 0 0 0 0
3 3 rtsock 0 0 0 0 0 0 0
3 3 arp 0 0 943 0 0 0 943
3 3 ether 0 0 10272150 0 0 0 10272150
3 3 ip6 0 4 0 446 0 391 837
3 3 ip_direct 0 0 0 0 0 0 0
3 3 ip6_direct 0 0 0 0 0 0 0
sysctl -a | grep rss
net.inet.rss.bucket_mapping: 0:0 1:1 2:2 3:3 4:0 5:1 6:2 7:3
net.inet.rss.enabled: 1
net.inet.rss.debug: 0
net.inet.rss.basecpu: 0
net.inet.rss.buckets: 8
net.inet.rss.maxcpus: 64
net.inet.rss.ncpus: 4
net.inet.rss.maxbits: 7
net.inet.rss.mask: 7
net.inet.rss.bits: 3
net.inet.rss.hashalgo: 2
hw.bxe.udp_rss: 0
hw.ix.enable_rss: 1
sysctl -a | grep isr
net.route.netisr_maxqlen: 256
net.isr.numthreads: 4
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 1
net.isr.maxthreads: 4
net.isr.dispatch: direct
I noticed that dispatch was still direct, so I set it and now it looks like this:
sysctl -w net.isr.dispatch=hybrid
sysctl -a | grep isr
net.route.netisr_maxqlen: 256
net.isr.numthreads: 4
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 1
net.isr.maxthreads: 4
net.isr.dispatch: hybrid
root@OPNsense:~ # netstat -Q
Configuration:
Setting Current Limit
Thread count 4 4
Default queue limit 256 10240
Dispatch policy hybrid n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 256 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 455 0 6256523 0 3911206 10167729
0 0 igmp 0 0 0 0 0 0 0
0 0 rtsock 0 0 0 0 0 0 0
0 0 arp 0 1 1 0 0 44 45
0 0 ether 0 0 14655754 0 0 0 14655754
0 0 ip6 0 2 0 575 0 923 1498
0 0 ip_direct 0 0 0 0 0 0 0
0 0 ip6_direct 0 0 0 0 0 0 0
1 1 ip 0 936 0 10885857 0 366966 11252823
1 1 igmp 0 0 0 0 0 0 0
1 1 rtsock 0 4 0 0 0 218 218
1 1 arp 0 1 4332 670 0 42 5044
1 1 ether 0 0 26865660 0 0 0 26865660
1 1 ip6 0 4 0 1306 0 833 2139
1 1 ip_direct 0 0 0 0 0 0 0
1 1 ip6_direct 0 0 0 0 0 0 0
2 2 ip 0 538 0 6589372 0 4728075 11317447
2 2 igmp 0 0 0 0 0 0 0
2 2 rtsock 0 0 0 0 0 0 0
2 2 arp 0 1 2 0 0 6 8
2 2 ether 0 0 16633885 0 0 0 16633885
2 2 ip6 0 8 0 2286 0 1388 3674
2 2 ip_direct 0 0 0 0 0 0 0
2 2 ip6_direct 0 0 0 0 0 0 0
3 3 ip 0 1000 0 8687234 71 4575663 13262897
3 3 igmp 0 0 0 0 0 0 0
3 3 rtsock 0 0 0 0 0 0 0
3 3 arp 0 1 1403 0 0 5 1408
3 3 ether 0 0 20808614 0 0 0 20808614
3 3 ip6 0 4 0 848 0 479 1327
3 3 ip_direct 0 0 0 0 0 0 0
3 3 ip6_direct 0 0 0 0 0 0 0
Do I have to set it explicitly and is RSS working OK? Does not seem to make a difference, throughput with Sensei enabled is always the same (stock kernel and this kernel, dispatch policy direct or hybrid). As long as there is no speed degradation, everything should be fine I guess, as neither Sensei nor Suricata benefit from RSS ATM.
I am getting around 1.5 to 1.9G from my local librespeed instance that lives in my WAN transport net. Testing from LAN with a MacBook Pro and a 2.5G adapter. Blindtest in LAN shows 2.5G with that adapter, so everything is working like it should.
here's mine:
#lscpu
Architecture: amd64
Byte Order: Little Endian
Total CPU(s): 8
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
Vendor: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 3201 8-Core Processor
Stepping: 2
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 16M
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 cflsh mmx fxsr sse sse2 htt sse3 pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave osxsave avx f16c rdrnd syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm lahf_lm cmp_legacy svm extapic cr8_legacy lzcnt sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb pcx_l2i
#dmesg | grep vector
igb0: Using MSI-X interrupts with 5 vectors
igb1: Using MSI-X interrupts with 5 vectors
igb2: Using MSI-X interrupts with 5 vectors
igb3: Using MSI-X interrupts with 5 vectors
ax0: Using MSI-X interrupts with 12 vectors
ax1: Using MSI-X interrupts with 12 vectors
# sysctl -a | grep rss
hw.bxe.udp_rss: 0
hw.ix.enable_rss: 1
# sysctl -a | grep isr
net.route.netisr_maxqlen: 256
net.isr.numthreads: 1
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 0
net.isr.maxthreads: 1
net.isr.dispatch: direct
It seems to work fine here on 2 systems (both igb network hardware, fairly basic setup + Sensei). Seems beneficial in distributing the load more evenly between different cores. Didn't do any performance testing though.
I followed the instructions, the only thing I noticed was that after setting the required sysctls, a final reboot was needed to activate RSS, as apparently not all of these settings are runtime-changeable. Perhaps this can be added to the tutorial in the first post, as it's currently not mentioned there.
Yes, we discussed that since and the tunable will be boot-writable only in the final implementation, see
https://github.com/opnsense/src/commit/3903650ce
Cheers,
Franco
I'm running quite a bit of OPNsense as a VM on Hyper-V and notice that the Hyper-V networking sees the network status as "OK" instead of "OK (VMQ active)".
I understand VMQ and RSS to be mutually exclusive technologies, and, as soon as I enable "MAC address spoofing" as required for CARP under Hyper-V, the Hyper-V network switch for the VM disables VMQ. That really loads a single core on the Hyper-V host for all network traffic for that single OPNsense VM.
I see that Hyper-V does support vRSS with FreeBSD on version 11.0, 11.1-11.3 and 12-12.1
https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/supported-freebsd-virtual-machines-on-hyper-v
Do you think this will be able to be enabled under Hyper-V?
ok!
I decided to have a go. It looks like vRSS is alive and well running as a Hyper-V VM.
1 . VM has 8192 MB and 4 vCPU.
2. Underlying real nic on the Hyper-V host is Intel X710
I too had to add the "net.isr.dispatch = hybrid" to
System > Settings > Tunables
This should dramatically increase the throughput if the firewall comes under heavy network traffic load.
As I understand it, without VMQ or RSS, you're limited to ~3-4 Gbps, and at that rate 1 CPU is saturated.
root@VM:~ # netstat -Q
Configuration:
Setting Current Limit
Thread count 4 4
Default queue limit 256 10240
Dispatch policy hybrid n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 256 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 3 0 2478 0 62 2540
0 0 igmp 0 0 0 1 0 0 1
0 0 rtsock 0 0 0 0 0 0 0
0 0 arp 0 1 0 38 0 40 78
0 0 ether 0 0 3703 0 0 0 3703
0 0 ip6 0 0 0 0 0 0 0
0 0 ip_direct 0 0 0 0 0 0 0
0 0 ip6_direct 0 0 0 0 0 0 0
1 1 ip 0 2 0 519 0 73 592
1 1 igmp 0 1 0 0 0 1 1
1 1 rtsock 0 0 0 0 0 0 0
1 1 arp 0 13 0 117 0 1042 1159
1 1 ether 0 0 873 0 0 0 873
1 1 ip6 0 0 0 0 0 0 0
1 1 ip_direct 0 0 0 0 0 0 0
1 1 ip6_direct 0 0 0 0 0 0 0
2 2 ip 0 2 0 527 0 77 604
2 2 igmp 0 1 0 0 0 1 1
2 2 rtsock 0 1 0 0 0 39 39
2 2 arp 0 1 0 7 0 58 65
2 2 ether 0 0 858 0 0 0 858
2 2 ip6 0 1 0 0 0 7 7
2 2 ip_direct 0 0 0 0 0 0 0
2 2 ip6_direct 0 0 0 0 0 0 0
3 3 ip 0 2 0 420 0 59 479
3 3 igmp 0 0 0 0 0 0 0
3 3 rtsock 0 0 0 0 0 0 0
3 3 arp 0 0 0 0 0 0 0
3 3 ether 0 0 739 0 0 0 739
3 3 ip6 0 0 0 0 0 0 0
3 3 ip_direct 0 0 0 0 0 0 0
3 3 ip6_direct 0 0 0 0 0 0 0
Best result I got across a layer 3 switch;
C:\iperf-3.1.3-win64>iperf3.exe -p 30718 -c 192.168.1.242
Connecting to host 192.168.1.242, port 30718
[ 4] local 10.1.1.16 port 63670 connected to 192.168.1.242 port 30718
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 83.4 MBytes 699 Mbits/sec
[ 4] 1.00-2.00 sec 190 MBytes 1.59 Gbits/sec
[ 4] 2.00-3.00 sec 269 MBytes 2.26 Gbits/sec
[ 4] 3.00-4.00 sec 262 MBytes 2.20 Gbits/sec
[ 4] 4.00-5.00 sec 260 MBytes 2.18 Gbits/sec
[ 4] 5.00-6.00 sec 263 MBytes 2.21 Gbits/sec
[ 4] 6.00-7.00 sec 214 MBytes 1.79 Gbits/sec
[ 4] 7.00-8.00 sec 246 MBytes 2.06 Gbits/sec
[ 4] 8.00-9.00 sec 250 MBytes 2.10 Gbits/sec
[ 4] 9.00-10.00 sec 208 MBytes 1.74 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 2.19 GBytes 1.88 Gbits/sec sender
[ 4] 0.00-10.00 sec 2.19 GBytes 1.88 Gbits/sec receiver
Will there be an updated test kernel for 21.7.2? I know it was just released today but figured I'd ask as I'm interested in testing this.
Sure, kernel is
# opnsense-update -zkr 21.7.2-rss
Make sure to set net.inet.rss.enabled to "1" from System: Settings: Tunables and reboot. As mentioned the sysctl cannot be changed at runtime anymore.
Cheers,
Franco
Quote from: franco on September 08, 2021, 10:30:14 AM
Sure, kernel is
# opnsense-update -zkr 21.7.2-rss
Make sure to set net.inet.rss.enabled to "1" from System: Settings: Tunables and reboot. As mentioned the sysctl cannot be changed at runtime anymore.
Cheers,
Franco
Thanks, updated to that kernel and set dispatch to hybrid manually again (do I have to?)
root@OPNsense:~ # netstat -Q
Configuration:
Setting Current Limit
Thread count 4 4
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 256 flow default ---
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 9 0 2531 0 1853 4384
0 0 igmp 0 0 0 0 0 0 0
0 0 rtsock 0 0 0 0 0 0 0
0 0 arp 0 0 2 0 0 0 2
0 0 ether 0 0 3519 0 0 0 3519
0 0 ip6 0 0 138 0 0 0 138
0 0 ip_direct 0 0 0 0 0 0 0
0 0 ip6_direct 0 0 0 0 0 0 0
1 1 ip 0 11 0 4706 0 4646 9352
1 1 igmp 0 0 0 0 0 0 0
1 1 rtsock 0 0 0 0 0 0 0
1 1 arp 0 0 4 0 0 0 4
1 1 ether 0 0 5731 0 0 0 5731
1 1 ip6 0 0 146 0 0 0 146
1 1 ip_direct 0 0 0 0 0 0 0
1 1 ip6_direct 0 0 0 0 0 0 0
2 2 ip 0 9 0 1794 0 5857 7651
2 2 igmp 0 0 0 0 0 0 0
2 2 rtsock 0 5 0 0 0 153 153
2 2 arp 0 0 0 0 0 0 0
2 2 ether 0 0 2374 0 0 0 2374
2 2 ip6 0 1 189 0 0 7 196
2 2 ip_direct 0 0 0 0 0 0 0
2 2 ip6_direct 0 0 0 0 0 0 0
3 3 ip 0 6 0 2706 0 4659 7365
3 3 igmp 0 0 0 0 0 0 0
3 3 rtsock 0 0 0 0 0 0 0
3 3 arp 0 0 250 0 0 0 250
3 3 ether 0 0 3526 0 0 0 3526
3 3 ip6 0 1 137 0 0 3 140
3 3 ip_direct 0 0 0 0 0 0 0
3 3 ip6_direct 0 0 0 0 0 0 0
root@OPNsense:~ # sysctl -w net.isr.dispatch=hybrid
net.isr.dispatch: direct -> hybrid
root@OPNsense:~ # netstat -Q
Configuration:
Setting Current Limit
Thread count 4 4
Default queue limit 256 10240
Dispatch policy hybrid n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 256 flow default ---
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 9 0 2797 0 5265 8062
0 0 igmp 0 0 0 0 0 0 0
0 0 rtsock 0 0 0 0 0 0 0
0 0 arp 0 0 3 0 0 0 3
0 0 ether 0 0 4026 0 0 0 4026
0 0 ip6 0 0 160 0 0 0 160
0 0 ip_direct 0 0 0 0 0 0 0
0 0 ip6_direct 0 0 0 0 0 0 0
1 1 ip 0 11 0 5042 0 4827 9869
1 1 igmp 0 0 0 0 0 0 0
1 1 rtsock 0 0 0 0 0 0 0
1 1 arp 0 1 4 0 0 2 6
1 1 ether 0 0 6351 0 0 0 6351
1 1 ip6 0 0 170 1 0 0 171
1 1 ip_direct 0 0 0 0 0 0 0
1 1 ip6_direct 0 0 0 0 0 0 0
2 2 ip 0 9 0 2109 0 8354 10463
2 2 igmp 0 0 0 0 0 0 0
2 2 rtsock 0 5 0 0 0 164 164
2 2 arp 0 0 0 0 0 0 0
2 2 ether 0 0 2829 0 0 0 2829
2 2 ip6 0 1 228 1 0 11 240
2 2 ip_direct 0 0 0 0 0 0 0
2 2 ip6_direct 0 0 0 0 0 0 0
3 3 ip 0 6 0 3034 0 6392 9426
3 3 igmp 0 0 0 0 0 0 0
3 3 rtsock 0 0 0 0 0 0 0
3 3 arp 0 0 385 0 0 0 385
3 3 ether 0 0 4073 0 0 0 4073
3 3 ip6 0 1 145 0 0 3 148
3 3 ip_direct 0 0 0 0 0 0 0
3 3 ip6_direct 0 0 0 0 0 0 0
Quote from: nzkiwi68 on September 08, 2021, 04:10:45 AM
I too had to add the "net.isr.dispatch = hybrid" to
System > Settings > Tunables
Quote from: athurdent on September 08, 2021, 10:47:39 AM
Thanks, updated to that kernel and set dispatch to hybrid manually again (do I have to?)
It seems the original post wasn't all too clear on what really happens with the dispatching policy when enabling RSS. While 'netstat -Q' and the relevant sysctl show that the dispatch policy is 'direct', this is a global setting which is overridden when enabling RSS. The output from netstat -Q actually shows this, as dispatch policies are also defined on a per-protocol basis. You will see that only ether is direct, which is correct for various technical reasons. From the IP layer onwards the policy will be hybrid.
TL;DR: there is no need to manually set the dispatch policy to hybrid.
Cheers,
Stephan
Quote from: nzkiwi68 on September 08, 2021, 04:34:52 AM
Best result I got across a layer 3 switch;
C:\iperf-3.1.3-win64>iperf3.exe -p 30718 -c 192.168.1.242
[...]
[ 4] 0.00-10.00 sec 2.19 GBytes 1.88 Gbits/sec sender
[ 4] 0.00-10.00 sec 2.19 GBytes 1.88 Gbits/sec receiver
This is single thread performance, since you're only starting one iperf thread.
RSS comes into play when lots of different connections are built up, since it uses the protocol 4-tuples to calculate the resulting CPU core. Try
iperf3.exe -P 4
to start up some parallel threads. Note that RSS does not guarantee perfect load-balancing, 1 or more connections might still end up on the same CPU.
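For example, mirroring the earlier test:
iperf3.exe -p 30718 -c 192.168.1.242 -P 4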
Cheers,
Stephan
Thanks @tuto2!
Wanted to post my iperf3 results, but somehow I cannot compare with the stock kernel, no idea how to revert. Turning off RSS does not work, looks like this after a reboot:
Cannot turn it off anymore:
root@OPNsense:~ # cat /boot/loader.conf | grep rss
net.inet.rss.enabled="0"
root@OPNsense:~ # sysctl net.inet.rss.enabled
net.inet.rss.enabled: 1
One thing I noticed between the two kernels is that ip6 went from hybrid to direct. Not sure why ip6 wouldn't have RSS enabled. Do we need to have a tunable set for this as well?
Quote from: madj42 on September 08, 2021, 02:02:06 PM
One thing I noticed between the two kernels is that ip6 went from hybrid to direct. Not sure why ip6 wouldn't have RSS enabled. Do we need to have a tunable set for this as well?
I'm not sure why this is the case for you. All protocols are dispatched using the default policy (which is direct) when NOT using RSS, unless you specify otherwise using a sysctl.
When using RSS, both ip and ip6 are switched to hybrid mode, which is correct.
Case in point:
No RSS:
root@OPNsense:~ # netstat -Q
Configuration:
Setting Current Limit
Thread count 8 8
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 flow default ---
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 source direct ---
ip6 6 256 flow default ---
WITH RSS:
Configuration:
Setting Current Limit
Thread count 8 8
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 256 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Disable after reboot works for me. I think this is the default also when nothing was specified.
Cheers,
Franco
Quote from: franco on September 08, 2021, 02:19:10 PM
Disable after reboot works for me. I think this is the default also when nothing was specified.
Cheers,
Franco
Thanks, it seems I had to delete the tunable instead of setting it to 0. After a reboot it was gone.
But I think, even after removing all of the tunables & rebooting, my ixl network card still balances the connections. There are 4 very active if_io_tqg_* when iperf'ing -P8 through the firewall.
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
0 root -76 - 0 656K CPU3 3 0:44 91.01% [kernel{if_io_tqg_3}]
0 root -76 - 0 656K CPU1 1 0:58 90.08% [kernel{if_io_tqg_1}]
0 root -76 - 0 656K - 2 0:51 82.79% [kernel{if_io_tqg_2}]
11 root 155 ki31 0 64K RUN 0 268:36 72.36% [idle{idle: cpu0}]
11 root 155 ki31 0 64K RUN 2 269:19 15.18% [idle{idle: cpu2}]
0 root -76 - 0 656K - 0 0:53 11.80% [kernel{if_io_tqg_0}]
11 root 155 ki31 0 64K RUN 1 255:37 9.32% [idle{idle: cpu1}]
11 root 155 ki31 0 64K RUN 3 269:09 6.96% [idle{idle: cpu3}]
ix also seems to have support for RSS, passed through my other 10G card to OPNsense.
ix0: <Intel(R) X520 82599ES (SFI/SFP+)> port 0xf020-0xf03f mem 0xfd600000-0xfd67ffff,0xfd680000-0xfd683fff irq 10 at device 17.0 on pci0
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 4 RX queues 4 TX queues
ix0: Using MSI-X interrupts with 5 vectors
ix0: allocated for 4 queues
ix0: allocated for 4 rx queues
ix0: Ethernet address: ***
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix0: Error 2 setting up SR-IOV
ix0: netmap queues/slots: TX 4/2048, RX 4/2048
root@OPNsense:~ # sysctl -a | grep rss
net.inet.rss.bucket_mapping: 0:0 1:1 2:2 3:3 4:0 5:1 6:2 7:3
net.inet.rss.enabled: 1
net.inet.rss.debug: 0
net.inet.rss.basecpu: 0
net.inet.rss.buckets: 8
net.inet.rss.maxcpus: 64
net.inet.rss.ncpus: 4
net.inet.rss.maxbits: 7
net.inet.rss.mask: 7
net.inet.rss.bits: 3
net.inet.rss.hashalgo: 2
hw.bxe.udp_rss: 0
hw.ix.enable_rss: 1
root@OPNsense:~ # netstat -Q
Configuration:
Setting Current Limit
Thread count 4 4
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 256 flow default ---
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 4 0 664 0 6779 7443
0 0 igmp 0 0 0 0 0 0 0
0 0 rtsock 0 0 0 0 0 0 0
0 0 arp 0 0 415 0 0 0 415
0 0 ether 0 0 2429 0 0 0 2429
0 0 ip6 0 1 39 0 0 6 45
0 0 ip_direct 0 0 0 0 0 0 0
0 0 ip6_direct 0 0 0 0 0 0 0
1 1 ip 0 6 0 688 0 6492 7180
1 1 igmp 0 0 0 0 0 0 0
1 1 rtsock 0 7 0 0 0 338 338
1 1 arp 0 0 188 0 0 0 188
1 1 ether 0 0 1955 0 0 0 1955
1 1 ip6 0 2 114 0 0 31 145
1 1 ip_direct 0 0 0 0 0 0 0
1 1 ip6_direct 0 0 0 0 0 0 0
2 2 ip 0 5 0 1341 0 2715 4056
2 2 igmp 0 0 0 0 0 0 0
2 2 rtsock 0 0 0 0 0 0 0
2 2 arp 0 0 73 0 0 0 73
2 2 ether 0 0 4118 0 0 0 4118
2 2 ip6 0 0 782 0 0 0 782
2 2 ip_direct 0 0 0 0 0 0 0
2 2 ip6_direct 0 0 0 0 0 0 0
3 3 ip 0 16 0 353 0 4932 5285
3 3 igmp 0 0 0 0 0 0 0
3 3 rtsock 0 0 0 0 0 0 0
3 3 arp 0 0 0 0 0 0 0
3 3 ether 0 0 568 0 0 0 568
3 3 ip6 0 1 26 0 0 1 27
3 3 ip_direct 0 0 0 0 0 0 0
3 3 ip6_direct 0 0 0 0 0 0 0
Quote from: athurdent on September 10, 2021, 04:31:38 PM
ix also seems to have support for RSS, passed through my other 10G card to OPNsense.
[...]
ip6 6 256 flow default ---
[...]
I also have an ix based card and it seems you're also having the same issue I'm having in regards to ip6 not having RSS enabled. Not sure why this would be but it was enabled for me with the previous kernel version.
Quote from: madj42 on September 11, 2021, 02:11:48 AM
I also have an ix based card and it seems you're also having the same issue I'm having in regards to ip6 not having RSS enabled. Not sure why this would be but it was enabled for me with the previous kernel version.
Good find, seems that the .2 kernel has changes for IPv6. Just took a look at my ixl output from the previous kernel, and compared to the .2 one ixl also lost IPv6 hybrid there.
So, it's gone for ix and ixl at least.
By default, don't use core 0 for RSS
As I understand from my Windows Server RSS tuning, it's best never to assign core 0 to RSS, because core 0 gets used by all sorts of other base OS processes and you ultimately get better performance if:
- 4 core CPU (0,1,2,3) - you only set 3 cores for RSS - cores 1,2,3
- 8 core CPU (0,1,2,3,4,5,6,7) - you only set 7 cores for RSS - cores 1,2,3,4,5,6,7
I imagine FreeBSD is no different to Windows in regards to certain base processes and interrupts using core 0 exclusively.
You could also make a GUI setting or tunable option we can set, and, allow us to;
- select the first CPU core to use
- how many cores to use
That way, if I have a 16 core CPU (e.g. AMD EPYC), I would start at a higher core and only allocate a maximum of 8 cores.
References
https://www.darrylvanderpeijl.com/windows-server-2016-networking-optimizing-network-settings/
https://www.broadcom.com/support/knowledgebase/1211161326328/rss-and-vmq-tuning-on-windows-servers
https://www.itprotoday.com/server-virtualization/why-you-skip-first-core-when-configuring-rss-and-vmq
As for deviating from FreeBSD further we need to take this step forward first on common ground.
For now it doesn't matter whether packet processing is all on CPU 0 or spread across more than one CPU including CPU 0. It's also a bit difficult since RSS is tied to multi queue setup in NICs and not populating a queue (3 to be used vs. 4 existing) can bring other side effects and packet distribution is probably even less optimal.
But in any case if there are easy improvements we will make them happen as we progress and try to upstream to FreeBSD.
Cheers,
Franco
Managed to get SR-IOV working on my Proxmox host.
Seems the iavf driver also supports RSS. No speed difference to the non RSS kernel version though.
Here's an iperf3 test through the firewall with RSS enabled.
With Sensei:
iperf3 -c192.168.178.4 -R -P10
.
[SUM] 0.00-10.00 sec 5.66 GBytes 4.86 Gbits/sec 5533 sender
[SUM] 0.00-10.00 sec 5.64 GBytes 4.85 Gbits/sec receiver
Sensei off:
iperf3 -c192.168.178.4 -R -P10
.
[SUM] 0.00-10.00 sec 7.18 GBytes 6.16 Gbits/sec 122 sender
[SUM] 0.00-10.00 sec 7.12 GBytes 6.12 Gbits/sec receiver
I am seeing something odd here. If I have RSS enabled, unbound is no longer working on all threads. I have a 2 core machine, unbound starts with 2 threads, but thread 0 is not doing any work. When I disable RSS using the sysctl and reboot, unbound is working normally again.
Are any other people seeing this?
RSS enabled:
root@jdjdehaan:~ # unbound-control -c /var/unbound/unbound.conf stats
thread0.num.queries=0
thread0.num.queries_ip_ratelimited=0
thread0.num.cachehits=0
thread0.num.cachemiss=0
thread0.num.prefetch=0
thread0.num.expired=0
thread0.num.recursivereplies=0
thread0.requestlist.avg=0
thread0.requestlist.max=0
thread0.requestlist.overwritten=0
thread0.requestlist.exceeded=0
thread0.requestlist.current.all=0
thread0.requestlist.current.user=0
thread0.recursion.time.avg=0.000000
thread0.recursion.time.median=0
thread0.tcpusage=0
thread1.num.queries=38
thread1.num.queries_ip_ratelimited=0
thread1.num.cachehits=5
thread1.num.cachemiss=33
thread1.num.prefetch=0
thread1.num.expired=0
thread1.num.recursivereplies=33
thread1.requestlist.avg=0.606061
thread1.requestlist.max=3
thread1.requestlist.overwritten=0
thread1.requestlist.exceeded=0
thread1.requestlist.current.all=0
thread1.requestlist.current.user=0
thread1.recursion.time.avg=0.057025
thread1.recursion.time.median=0.0354987
thread1.tcpusage=0
total.num.queries=38
total.num.queries_ip_ratelimited=0
total.num.cachehits=5
total.num.cachemiss=33
total.num.prefetch=0
total.num.expired=0
total.num.recursivereplies=33
total.requestlist.avg=0.606061
total.requestlist.max=3
total.requestlist.overwritten=0
total.requestlist.exceeded=0
total.requestlist.current.all=0
total.requestlist.current.user=0
total.recursion.time.avg=0.057025
total.recursion.time.median=0.0177493
total.tcpusage=0
time.now=1631829579.788081
time.up=18.811257
time.elapsed=18.811257
RSS disabled:
root@jdjdehaan:~ # unbound-control -c /var/unbound/unbound.conf stats
thread0.num.queries=11
thread0.num.queries_ip_ratelimited=0
thread0.num.cachehits=1
thread0.num.cachemiss=10
thread0.num.prefetch=0
thread0.num.expired=0
thread0.num.recursivereplies=10
thread0.requestlist.avg=2.4
thread0.requestlist.max=6
thread0.requestlist.overwritten=0
thread0.requestlist.exceeded=0
thread0.requestlist.current.all=0
thread0.requestlist.current.user=0
thread0.recursion.time.avg=0.681863
thread0.recursion.time.median=0.436907
thread0.tcpusage=0
thread1.num.queries=16
thread1.num.queries_ip_ratelimited=0
thread1.num.cachehits=2
thread1.num.cachemiss=14
thread1.num.prefetch=0
thread1.num.expired=0
thread1.num.recursivereplies=14
thread1.requestlist.avg=0.571429
thread1.requestlist.max=2
thread1.requestlist.overwritten=0
thread1.requestlist.exceeded=0
thread1.requestlist.current.all=0
thread1.requestlist.current.user=0
thread1.recursion.time.avg=0.362820
thread1.recursion.time.median=0.131072
thread1.tcpusage=0
total.num.queries=27
total.num.queries_ip_ratelimited=0
total.num.cachehits=3
total.num.cachemiss=24
total.num.prefetch=0
total.num.expired=0
total.num.recursivereplies=24
total.requestlist.avg=1.33333
total.requestlist.max=6
total.requestlist.overwritten=0
total.requestlist.exceeded=0
total.requestlist.current.all=0
total.requestlist.current.user=0
total.recursion.time.avg=0.495755
total.recursion.time.median=0.283989
total.tcpusage=0
time.now=1631829178.752625
time.up=124.904324
time.elapsed=57.175103
Does anyone know if there is a way to change the policy on the ip6 protocol from flow to cpu? That is the difference I notice in some of the previous posts. Thinking this may be why I'm not getting RSS on ip6.
Also decided to test this out with a Protectli FW6B.
- WAN is PPPoE 1G/1G
- LAN is split between LAN and 4 VLANs
- Sensei running on LAN.
I set net.inet.rss.bits = 2. Not sure if this is correct on a 2 core/2HT processor. Seems to work fine on speedtest with reasonable CPU usage.
Also something to keep in mind, if you upgrade to a hotfix release it will replace the RSS kernel. Be sure to reinstall the RSS kernel if you upgrade to the hotfix released yesterday.
> Also something to keep in mind, if you upgrade to a hotfix release it will replace the RSS kernel. Be sure to reinstall the RSS kernel if you upgrade to the hotfix released yesterday.
That's why you can lock any package to prevent it from upgrading. ;)
I found the lock in the UI. That's handy. ;D
Quote from: dinguz on September 17, 2021, 12:06:38 AM
I am seeing something odd here. If I have RSS enabled, unbound is no longer working on all threads. I have a 2 core machine, unbound starts with 2 threads, but thread 0 is not doing any work. When I disable RSS using the sysctl and reboot, unbound is working normally again.
Are any other people seeing this?
I have reported this a few weeks ago, and I'm wondering if I'm the only one experiencing this. Are other people seeing this as well, or have been able to reproduce?
It's quite easy to check: just enable RSS, reboot, go to the Unbound stats page in the GUI, and you'll probably see all zeroes for thread 0.
BTW: how do I revert to the stock kernel? If I do
opnsense-update -kr 21.7.2
it keeps trying to download a RSS version of the kernel.
Same thing here with the kernel. No matter what you type in, it will append -rss to the name. Looking at the verbose output it's saying that it's trying to download from the sets folder but when I look on every mirror, these kernels are in the snapshots folder. Not sure if they were removed? Due to this I'm getting a no valid signature error.
It's trying to stick to the kernel device designation which is correct historically speaking. We rarely do have different kernel builds and only really needed them with ARM which is where this feature comes from.
This should force it back to the generic device configuration (the -D '' bit is important)
# opnsense-update -kf -D ''
Cheers,
Franco
PS: https://github.com/opnsense/tools/commit/6fa3d553a -- this only affects testing kernels really and no production kernel can exhibit this issue.
Another observation, this is part of the 'netstat -Q' output from a stock kernel:
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 flow default ---
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 source direct ---
ip6 6 256 flow default ---
Below is the 'netstat -Q' output from a system with the RSS kernel, but RSS disabled (net.inet.rss.enabled=0):
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 2048 flow default ---
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 source direct ---
ip6 6 256 cpu hybrid C--
Notice the difference for ip6? Apparently the sysctl does not completely enable/disable RSS for ip6.
Let me check when I have some spare time. Stephan is not available at the moment to look into it.
Cheers,
Franco
Stephan found it, small issue with a refactor from my side:
# opnsense-update -zfkr 21.7.2-rss
The sticky device issue was also fixed, the kernel should report:
% uname -v
FreeBSD 12.1-RELEASE-p20-HBSD b9b6960472e(master) SMP
Cheers,
Franco
Thanks for the quick reply. Has anyone been able to have a look at the interaction with unbound?
Quote from: franco on October 07, 2021, 09:53:56 AM
Stephan found it, small issue with a refactor from my side:
# opnsense-update -zfkr 21.7.2-rss
The sticky device issue was also fixed, the kernel should report:
% uname -v
FreeBSD 12.1-RELEASE-p20-HBSD b9b6960472e(master) SMP
Cheers,
Franco
Not available to download, no signature found.
Quote from: MartB on October 07, 2021, 12:19:24 PM
Not available to download, no signature found.
Make sure:
1. Default mirror is set
2. "-z" is being used as indicated
It looks like everything is there: https://pkg.opnsense.org/FreeBSD:12:amd64/snapshots/sets/
I can look at Unbound issue too as soon as I can finish my work on the upcoming 21.10 business update.
Cheers,
Franco
Quote from: dinguz on October 07, 2021, 12:07:51 PM
Thanks for the quick reply. Has anyone been able to have a look at the interaction with unbound?
I will take a look at it, but as Franco noted I have a limited amount of time at hand right now. Will update as soon as I can.
Cheers,
Stephan
Quote from: franco on October 07, 2021, 01:17:11 PM
Quote from: MartB on October 07, 2021, 12:19:24 PM
Not available to download, no signature found.
Make sure:
1. Default mirror is set
2. "-z" is being used as indicated
It looks like everything is there: https://pkg.opnsense.org/FreeBSD:12:amd64/snapshots/sets/
I double checked (copied your command before), no dice, can't get it to work. Mirror is also set to default.
Do I need to go back to the base kernel and then install the new RSS version maybe?
Fetching kernel-21.7.2-rss-amd64-RSS.txz: .+ mkdir -p /var/cache/opnsense-update/41781
+ opnsense-fetch -a -T 30 -q -o /var/cache/opnsense-update/41781/kernel-21.7.2-rss-amd64-RSS.txz.sig https://pkg.opnsense.org/FreeBSD:12:amd64/snapshots/sets/kernel-21.7.2-rss-amd64-RSS.txz.sig
.+ exit_msg ' failed, no signature found'
+ [ -n ' failed, no signature found' ]
+ echo ' failed, no signature found'
failed, no signature found
+ exit 1
Maybe you're having that other device type issue mentioned here. Helping is difficult if you don't paste the error, but try this one then:
# opnsense-update -zfkr 21.7.2-rss -D ''
Cheers,
Franco
Yup, it was the additional RSS device designation breaking things – solved with the -D ''. I only saw it after running the update command with verbose output.
Just curious as it's been a while since this was initially posted for testing. Will this eventually be rolled into an upcoming point release soon or an updated kernel with the latest changes in the repo?
Thank you so much for the effort and work you guys do!
The issue with RSS in IPv6 was fixed and the Unbound situation looks like a problem in the kernel regarding SO_REUSEPORT/RSS enabled combination: Unbound is using this by default, but it could be made conditional upon RSS flag for example.
Nevertheless we will ship the RSS-capable kernel with 21.7.4 defaulting to off like in this test series and will continue to improve it and do further testing to see if it can be enabled by default later on.
Cheers,
Franco
I just tested this and indeed the problem is gone when I disable so-reuseport in unbound.
Are there any other applications in general use with OPNsense that are known to use this socket option, and thus are possibly affected?
Hi,
Will RSS and multiqueue benefit PPPoE when used as the WAN connection with your ISP?
Thanks
I added the RSS conditional for so-reuseport:
https://github.com/opnsense/core/commit/84d6b2acd5
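For reference, the Unbound directive in question is the server-level so-reuseport option; set by hand it would look like this in unbound.conf (shown for illustration only – with the change above the generated config toggles it for you based on the RSS sysctl):
server:
  so-reuseport: no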
After a bit of research: PPPoE does not accelerate with RSS, since the IP packets are higher up the stack and RSS will hash these to "0", so the distribution is one core only. I assume that also includes using the
net.isr.dispatch=deferred
tunable so for PPPoE users RSS would likely have to be kept off to get at least some performance increase from their connection. Linux does have RPS to deal with PPPoE acceleration but FreeBSD has no equivalent.
Cheers,
Franco
PS: Thanks all for testing this, highly appreciated! We will be adding a note in the release notes for Suricata speedup testing in the development version for 21.7.4.
@Franco, thanks for the reply regarding PPPoE
Removed the question. Sorry, lack of caffeine. It's working great on 21.7.4. Thank you for the work as always!
Quote from: franco on October 22, 2021, 03:40:32 PM
The issue with RSS in IPv6 was fixed and the Unbound situation looks like a problem in the kernel regarding SO_REUSEPORT/RSS enabled combination: Unbound is using this by default, but it could be made conditional upon RSS flag for example.
I would consider this more of a workaround than a proper fix. How complicated would it be to achieve a 'real' fix?
Sure, did not claim it was a fix -- more of an opportunistic safeguard to prevent misconfiguring unbound when using RSS since it's not possible to turn so-reuseport off from the GUI.
Digging through the kernel code is going to take a lot more time than a two line change and Stephan is currently pushing the RSS modifications to our upcoming 22.1 branch on FreeBSD 13 so we can have that datapoint as well (it could have been fixed already?).
Cheers,
Franco
I wanted to thank you guys for getting this into the 21.7.x release lately. After setting the mentioned tunables, I now finally get the full internet speed I should have (1000/500). Before these changes the download side throttled at around 500, so I ended up with a 500/500 line.
so THANK YOU!
Has anyone noticed a significant slowdown (a full order of magnitude or greater) in the average recursion time in Unbound after enabling this option?
I tried with the latest stable build 21.7.5.
Cheers,
Unbound's so-reuseport is buggy with RSS enabled, so we removed it to avoid further problems. It might just be that the outcome is the same speed-wise whether so-reuseport is disabled or RSS is enabled – the two are mutually exclusive now.
We will be looking into it, but with the beta just out it's better to concentrate on more urgent issues.
Cheers,
Franco
Quote from: Koldnitz on November 13, 2021, 08:16:51 PM
Has anyone noticed significant (a full magnitude or greater slow down) on the average recursion time in unbound after enabling this option?
I tried with the latest stable build 21.7.5.
Cheers,
It does seem more sluggish; recursion time has gone up. Could also be other factors. I'm running 21.7.6.
Quote from: franco on October 26, 2021, 08:40:52 AM
Linux does have RPS to deal with PPPoE acceleration but FreeBSD has no equivalent.
Franco, thanks for clearing this up. Is there any discussion going on about implementing RPS (or RFS: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/network-rfs) in FreeBSD? Is someone already working on it?
PPPoE is heavily used by many ISPs, at least here in Italy (and I think in Europe in general), it would be nice to have these kind of optimizations also for these kind of connections.
UPDATE: I just found out in this post (https://freebsd.markmail.org/message/5kuryot256poxznv) on freebsd-net mailing list that it has been implemented as "option RSS". Is that correct?
Thanks a lot,
Alessandro
I have to disagree with Alexander's conclusion in that mailing list. I mean, we talk about RSS here and already established it's not RPS. Note that the mailing list question doesn't say anything about the PPPoE use case either, which makes the RSS = RPS assumption slightly less wrong, but it still mostly is.
I'm not having any hopes for RPS/PPPoE inside FreeBSD. The state is either deemed good enough by the developers or it's not a hindrance for the bulk of FreeBSD consumers (not users).
Cheers,
Franco
The post I linked was from a user that needed RPS on FreeBSD, not specifically for PPPoE but for another use-case. The answer of Alexander was pretty clear, that's why I was surprised and reported it here, to have a confirmation from you. :)
Right now I'm testing RSS as per the instructions in the 1st post, and as far as I can see, it's not using only core 0 as I had read it would (for PPPoE). What should I monitor specifically to check whether it is spreading the workload or sticking to the first core? I'm checking with netstat -Q:
Configuration:
Setting Current Limit
Thread count 4 4
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 256 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 10 0 533094 0 11564 544658
0 0 igmp 0 0 2 0 0 0 2
0 0 rtsock 0 2 0 0 0 36 36
0 0 arp 0 0 1625 0 0 0 1625
0 0 ether 0 0 2350239 0 0 0 2350239
0 0 ip6 0 0 0 14 0 0 14
0 0 ip_direct 0 0 0 0 0 0 0
0 0 ip6_direct 0 0 0 0 0 0 0
1 1 ip 0 11 0 0 0 335277 335277
1 1 igmp 0 0 0 0 0 0 0
1 1 rtsock 0 0 0 0 0 0 0
1 1 arp 0 0 0 0 0 0 0
1 1 ether 0 0 0 0 0 0 0
1 1 ip6 0 1 0 0 0 8 8
1 1 ip_direct 0 0 0 0 0 0 0
1 1 ip6_direct 0 0 0 0 0 0 0
2 2 ip 0 14 0 1235 0 478622 479857
2 2 igmp 0 0 0 0 0 0 0
2 2 rtsock 0 0 0 0 0 0 0
2 2 arp 0 0 0 0 0 0 0
2 2 ether 0 0 333485 0 0 0 333485
2 2 ip6 0 1 0 0 0 1 1
2 2 ip_direct 0 0 0 0 0 0 0
2 2 ip6_direct 0 0 0 0 0 0 0
3 3 ip 0 13 0 0 0 475546 475546
3 3 igmp 0 0 0 0 0 0 0
3 3 rtsock 0 0 0 0 0 0 0
3 3 arp 0 0 0 0 0 0 0
3 3 ether 0 0 0 0 0 0 0
3 3 ip6 0 1 0 0 0 1 1
3 3 ip_direct 0 0 0 0 0 0 0
3 3 ip6_direct 0 0 0 0 0 0 0
I have a 1000/300 FTTH connection, here's a quick test with RSS enabled:
speedtest -s 4302
Speedtest by Ookla
Server: Vodafone IT - Milan (id = 4302)
ISP: Tecno General S.r.l
Latency: 8.26 ms (0.16 ms jitter)
Download: 937.00 Mbps (data used: 963.1 MB)
Upload: 281.64 Mbps (data used: 141.4 MB)
Packet Loss: Not available.
Result URL: https://www.speedtest.net/result/c/0e691806-5212-4fc3-b199-2b2e92660367
Thanks for the support.
I assume that is just the topic being lost in translation.
The point is: RSS works on incoming IP packets, mostly done in hardware. PPPoE comes in as non-IP traffic crossing the same hardware. RSS doesn't work here. And pre-decapsulation RSS can't be applied. If you apply it post-decapsulation it's called RPS. And we don't have RPS.
Cheers,
Franco
Quote from: Tupsi on November 13, 2021, 05:01:42 PM
I wanted to thank you guys for getting this into the 21.7.x release lately. After setting the mentioned tunables, I now finally get the full internet speed I should have (1000/500). Before these changes the download side throttled at around 500, so I ended up with a 500/500 line.
so THANK YOU!
Seems to make a big difference to me as well - very well done - I am just using a lowly Qotom J1900 box, which has probably been in use for 6-7 years now.
I was just about to pull the trigger on an upgraded box, as I have just upgraded to Gig broadband, but I was seeing 'kernel{if_io_tqg' pegged to 100% at around 700-750Mbps - less, 400-500Mbps, with the shaper enabled.
I have the following set using 21.7.6
hw.pci.enable_msix="1"
machdep.hyperthreading_allowed="0"
hw.em.rx_process_limit="-1"
net.link.ifqmaxlen="8192"
net.isr.numthreads=4
net.isr.defaultqlimit=4096
net.isr.bindthreads=1
net.isr.maxthreads=4
net.inet.rss.enabled=1
net.inet.rss.bits=2
dev.em.3.iflib.override_nrxds="4096"
dev.em.3.iflib.override_ntxds="4096"
dev.em.3.iflib.override_qs_enable="1"
dev.em.3.iflib.override_nrxqs="4"
dev.em.3.iflib.override_ntxqs="4"
dev.em.2.iflib.override_nrxds="4096"
dev.em.2.iflib.override_ntxds="4096"
dev.em.2.iflib.override_qs_enable="1"
dev.em.2.iflib.override_nrxqs="4"
dev.em.2.iflib.override_ntxqs="4"
dev.em.1.iflib.override_nrxds="4096"
dev.em.1.iflib.override_ntxds="4096"
dev.em.1.iflib.override_qs_enable="1"
dev.em.1.iflib.override_nrxqs="4"
dev.em.1.iflib.override_ntxqs="4"
dev.em.0.iflib.override_nrxds="4096"
dev.em.0.iflib.override_ntxds="4096"
dev.em.0.iflib.override_qs_enable="1"
dev.em.0.iflib.override_nrxqs="4"
dev.em.0.iflib.override_ntxqs="4"
dev.em.0.fc="0"
dev.em.1.fc="0"
dev.em.2.fc="0"
dev.em.3.fc="0"
And I can now achieve 940Mbps raw throughput.
Also I can now happily run the shaper, set to 900Mbps with FQ Codel, which (only) brings it down to around 890Mbps on the WaveForm BufferBloat Speedtest, A+ grade results as well. Looks like the J1900 gets to live a little longer :)
I'll let it run for a few days, just to make sure there are no issues with the various UDP tunnels etc I have setup, then try with HT (as I see there was a request for data in the earlier posts).
Quote from: franco on November 14, 2021, 09:13:29 AM
Unbound's so-reuseport is buggy with RSS enabled, so we removed it to avoid further problems. It might just be that the speed outcome is the same either way, with so-reuseport disabled or with RSS enabled; the two are mutually exclusive.
We will be looking into it, but with the beta just out it's better to concentrate on more urgent issues.
Cheers,
Franco
I'm planning on testing RSS during my Xmas break from work. I use Unbound, and this message leaves me unclear on whether testing should be done while I'm using Unbound.
The previous message regarding a commit to make so_reuseport conditional on the sysctl value made me think it was OK. Then this later message made me ask.
What's the current status of RSS + Unbound?
If I try and test it, what is the expected behaviour?
I am on 21.7.5, OpenSSL, hardware APU4. IDS on LAN.
Thanks.
I'm using Unbound and RSS at home and I don't notice any difference. The situation needs some sort of fix in the kernel, but for day to day use it's good enough.
Cheers,
Franco
Quote from: franco on December 15, 2021, 09:07:59 AM
I'm using Unbound and RSS at home and I don't notice any difference. The situation needs some sort of fix in the kernel, but for day to day use it's good enough.
Cheers,
Franco
Perfect, thanks Franco. I might just take the jump today.
I applied the changes to enable RSS yesterday, rebooted and so far no adverse effects noticed, the only exception being that Unbound seems to utilise only one thread.
penguin@OPNsense:~ % sudo sysctl -a | grep -i 'isr.bindthreads\|isr.maxthreads\|inet.rss.enabled\|inet.rss.bits'
net.inet.rss.enabled: 1
net.inet.rss.bits: 2
net.isr.bindthreads: 1
net.isr.maxthreads: 4
penguin@OPNsense:~ % sudo netstat -Q
Configuration:
Setting Current Limit
Thread count 4 4
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 256 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
OPN 21.7.5, OpenSSL, Hardware is APU4. IDS on LAN.
Upgraded BIOS beforehand, coreboot v 4.14.0.6 .
Network interfaces on this system are igb.
Thanks for this development.
Hi,
My goal is allowing PPPoE to be shared across multiple CPUs, something like PPPoE per-queue load distribution. This is to allow me to get greater than gigabit speed through my OPNsense router when using PPPoE.
I use proxmox to virtualise my OPNsense instance for many reasons, so additional to the above, I would like to see the improvements be applied to a virtual instance.
My question is, can RSS work with the vtnet virtio driver?
From my research, this was being worked on through making use of eBPF
https://qemu.readthedocs.io/en/latest/devel/ebpf_rss.html
This requires eBPF being implemented into FreeBSD (And of course in OPNsense)
https://ebpf.io/what-is-ebpf
https://wiki.freebsd.org/SummerOfCode2020Projects/eBPFXDPHooks
Can anyone help with whether this has been implemented and whether it's possible to test it?
I can test but I'm not sure where to start.
Thanks for any advice on this
Updated - Perhaps it was never finally completed and implemented?
https://www.freebsd.org/status/report-2020-07-2020-09.html#Google-Summer-of-Code%E2%80%9920-Project---eBPF-XDP-Hooks
That's a pity, as it sounds like it would greatly benefit the use of FreeBSD/OPNsense when a tunnel-type connection like PPPoE is needed.
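Not an answer to the eBPF part, but regarding "can RSS work with the vtnet virtio driver": one thing worth checking first is whether the guest's vtnet even gets multiple queues, since without that there is nothing to spread. A minimal sketch, assuming the vtnet(4) multiqueue loader tunables are present on your kernel and that the hypervisor's NIC multiqueue setting (in Proxmox, the 'Multiqueue' field) has been raised to the vCPU count:
# how many MSI-X vectors / queues did the virtio NIC get?
dmesg | grep -i vtnet | grep -i msi
# vtnet(4) multiqueue tunables: mq_disable should be 0, mq_max_pairs at least the vCPU count
sysctl hw.vtnet.mq_disable hw.vtnet.mq_max_pairs
If these OIDs are missing or only a single vector shows up, multiqueue (and with it any per-queue distribution) isn't active in the guest.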
Would rss be included in the 22.1 RC kernel?
It appears so!
Has anyone got this working with a Mellanox ConnectX-3?
Does this work with a VMware VMX NIC Interface VMXNet3?
Running OPNsense in a VM on ESXi 7.0 U3
Repost of my earlier (more generic) questions (https://forum.opnsense.org/index.php?topic=26481.0) about the use of RSS (22.1-RC2):
Following the tutorial I can choose between 3 (8-core) or 4 (16-core) for the rss bits tunable:
net.inet.rss.bits = X
Using a 'real' 12-core system (no Hyper-Threading) this results in the following bucket mapping; I guess the same result would be true for 6-core systems _with_ HT.
net.inet.rss.bucket_mapping: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7 8:8 9:9 10:10 11:11 12:0 13:1 14:2 15:3
Being unaware of the RSS scheduling magic 'under the hood', it looks like cores 0, 1, 2 and 3 are getting double the buckets
... 12:0 13:1 14:2 15:3
Is this just visual, and will the 12-15 mappings never be used with 12 cores?
The rss bucket map tunable is read-only. If having 16 buckets is the result of how the rss bits tunable works, is there a way to "remap" the buckets? I would rather have something like this (sparing core 0):
net.inet.rss.bucket_mapping: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7 8:8 9:9 10:10 11:11 12:8 13:9 14:10 15:11
12:8 13:9 14:10 15:11
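For what it's worth, the read-only mapping above looks like plain round-robin: 2^net.inet.rss.bits buckets assigned to CPUs in order, wrapping around (bucket mod ncpus). A small sketch that reproduces it under that assumption:
bits=$(sysctl -n net.inet.rss.bits)
ncpus=$(sysctl -n net.inet.rss.ncpus)
buckets=$((1 << bits))
# print the same bucket:cpu pairs as net.inet.rss.bucket_mapping, assuming round-robin assignment
b=0
while [ "$b" -lt "$buckets" ]; do
  printf '%d:%d ' "$b" "$((b % ncpus))"
  b=$((b + 1))
done; echo
With 12 CPUs and bits=4 (16 buckets) that produces exactly the 0:0 ... 11:11 12:0 13:1 14:2 15:3 shown above.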
My rss enabled netstat output:
# netstat -Q
Configuration:
Setting Current Limit
Thread count 12 12
Default queue limit 256 10240
Dispatch policy deferred n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 3000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 1000 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 48 0 1186 0 127492 128678
0 0 igmp 0 0 0 0 0 0 0
0 0 rtsock 0 0 0 0 0 0 0
0 0 arp 0 0 0 0 0 0 0
0 0 ether 0 0 70140 0 0 0 70140
0 0 ip6 0 1 0 0 0 203 203
0 0 ip_direct 0 0 0 0 0 0 0
0 0 ip6_direct 0 0 0 0 0 0 0
1 1 ip 0 193 0 107 0 694180 694287
1 1 igmp 0 0 0 0 0 0 0
1 1 rtsock 0 0 0 0 0 0 0
1 1 arp 0 2 0 0 0 288 288
1 1 ether 0 0 14552 0 0 0 14552
1 1 ip6 0 1 0 0 0 207 207
1 1 ip_direct 0 0 0 0 0 0 0
1 1 ip6_direct 0 0 0 0 0 0 0
2 2 ip 0 32 0 80 0 205258 205338
2 2 igmp 0 0 0 0 0 0 0
2 2 rtsock 0 2 0 0 0 175 175
2 2 arp 0 1 0 0 0 84 84
2 2 ether 0 0 142672 0 0 0 142672
2 2 ip6 0 1 0 0 0 74 74
2 2 ip_direct 0 0 0 0 0 0 0
2 2 ip6_direct 0 0 0 0 0 0 0
3 3 ip 0 463 0 69 0 380352 380421
3 3 igmp 0 0 0 0 0 0 0
3 3 rtsock 0 0 0 0 0 0 0
3 3 arp 0 0 0 0 0 0 0
3 3 ether 0 0 135986 0 0 0 135986
3 3 ip6 0 1 0 0 0 145 145
3 3 ip_direct 0 0 0 0 0 0 0
3 3 ip6_direct 0 0 0 0 0 0 0
4 4 ip 0 11 0 0 0 177655 177655
4 4 igmp 0 0 0 0 0 0 0
4 4 rtsock 0 0 0 0 0 0 0
4 4 arp 0 0 0 0 0 0 0
4 4 ether 0 0 128748 0 0 0 128748
4 4 ip6 0 2 0 0 0 48 48
4 4 ip_direct 0 0 0 0 0 0 0
4 4 ip6_direct 0 0 0 0 0 0 0
5 5 ip 0 9 0 9 0 73864 73873
5 5 igmp 0 0 0 0 0 0 0
5 5 rtsock 0 0 0 0 0 0 0
5 5 arp 0 0 0 0 0 0 0
5 5 ether 0 0 165365 0 0 0 165365
5 5 ip6 0 1 0 0 0 14 14
5 5 ip_direct 0 0 0 0 0 0 0
5 5 ip6_direct 0 0 0 0 0 0 0
6 6 ip 0 26 0 0 0 306593 306593
6 6 igmp 0 0 0 0 0 0 0
6 6 rtsock 0 0 0 0 0 0 0
6 6 arp 0 0 0 0 0 0 0
6 6 ether 0 0 71276 0 0 0 71276
6 6 ip6 0 2 0 0 0 141 141
6 6 ip_direct 0 0 0 0 0 0 0
6 6 ip6_direct 0 0 0 0 0 0 0
7 7 ip 0 475 0 0 0 169312 169312
7 7 igmp 0 0 0 0 0 0 0
7 7 rtsock 0 0 0 0 0 0 0
7 7 arp 0 1 0 0 0 34 34
7 7 ether 0 0 309366 0 0 0 309366
7 7 ip6 0 2 0 1 0 199 200
7 7 ip_direct 0 0 0 0 0 0 0
7 7 ip6_direct 0 0 0 0 0 0 0
8 8 ip 0 34 0 64406 0 75441 139847
8 8 igmp 0 0 0 0 0 0 0
8 8 rtsock 0 0 0 0 0 0 0
8 8 arp 0 2 0 0 0 1002 1002
8 8 ether 0 0 3487916 0 0 0 3487916
8 8 ip6 0 1 0 23 0 4 27
8 8 ip_direct 0 0 0 0 0 0 0
8 8 ip6_direct 0 0 0 0 0 0 0
9 9 ip 0 13 0 80 0 408429 408509
9 9 igmp 0 0 0 0 0 0 0
9 9 rtsock 0 0 0 0 0 0 0
9 9 arp 0 3 0 0 0 499 499
9 9 ether 0 0 15999 0 0 0 15999
9 9 ip6 0 4 0 0 0 268 268
9 9 ip_direct 0 0 0 0 0 0 0
9 9 ip6_direct 0 0 0 0 0 0 0
10 10 ip 0 970 0 61 0 167884 167945
10 10 igmp 0 0 0 0 0 0 0
10 10 rtsock 0 0 0 0 0 0 0
10 10 arp 0 0 0 0 0 0 0
10 10 ether 0 0 74687 0 0 0 74687
10 10 ip6 0 1 0 0 0 43 43
10 10 ip_direct 0 0 0 0 0 0 0
10 10 ip6_direct 0 0 0 0 0 0 0
11 11 ip 0 450 0 104 0 170014 170118
11 11 igmp 0 0 0 0 0 0 0
11 11 rtsock 0 0 0 0 0 0 0
11 11 arp 0 2 0 0 0 2262 2262
11 11 ether 0 0 140111 0 0 0 140111
11 11 ip6 0 1 0 1 0 37 38
11 11 ip_direct 0 0 0 0 0 0 0
11 11 ip6_direct 0 0 0 0 0 0 0
Quote from: DanMc85 on January 25, 2022, 09:11:36 PM
Does this work with a VMware VMX NIC Interface VMXNet3?
Running OPNsense in a VM on ESXi 7.0 U3
I have RSS enabled on my OPNsense VM running on ESXi 6.7 U3, so I suspect it should run fine on ESXi 7.0 U3 as well.
I notice a lot of latency with RSS enabled, especially when a website makes many XHR requests. Loading takes a long time compared to the default setup without RSS.
intel quad core, no HT
x710t2l 8.00 firmware
opnsense 22.1
What sort of page load times are we talking about overall in the on and off case? It's a little hard to tell if this is expected or not.
In general, requests to the same server should still land on the same CPU...
Cheers,
Franco
Hi,
I'm shooting in the dark here. I activated RSS yesterday on my system with an AMD Ryzen CPU. Since the multi-threading technology used by AMD is not the same as Intel's hyper-threading, should net.inet.rss.bits be based on the number of cores or the number of threads for AMD? If I understand it correctly, AMD threads can write to the core simultaneously and are not limited like Intel's hyper-threading?
Any expert out there who knows? Thanks!
Hello,
I would also like to test this improvement with RSS. I have read through all the forum posts and believe I need to make these changes. I have a protectli with ix (ixl?) NIC driver and also 2 cores, 4 threads.
The values I believe I need to set are as following under the gui tunables section:
net.isr.maxthreads = 4
net.isr.dispatch = deferred
hw.ix.enable_rss = 1
net.isr.bindthreads = 1
net.inet.rss.enabled = 1
net.inet.rss.bits = 2
Do the above 6 tunables seem to make sense? Sorry for asking, it's hard to follow, but I believe they should work correctly.
Also I am on OPNsense 22.1-amd64 .
Do I still need to run this command?
opnsense-update -zfkr 21.7.2-rss -D
Or can I just change to a dev build or something?
Many thanks in advance. I will do some performance testing before and after if someone can confirm my changes :)
Kind regards
Pete
You do not want to install the 21.7 kernel if you are on 22.1.
My Atom firewall: 8 cores, 16 GB.
net.inet.rss.bits 2
net.inet.rss.enabled 1
net.isr.bindthreads 1
net.isr.dispatch hybrid
net.isr.maxthreads -1
net.isr.numthreads 2
results via netstat -Q
Setting Current Limit
Thread count 8 8
Default queue limit 256 10240
Dispatch policy hybrid n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 1000 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Thank you zz00mm.
Based on what you said I did this in my screenshot.
I tested before and after things like latency, speed upload download etc from opnsense itself and from a client machine and also checked cpu etc.
Before and after I could detect no discernible difference. I was expecting some sort of speed boost but apparently not much changed.
Here is my output of the commands. I guess it's working, although I don't see any benefit at all from it. Not sure if that's expected? Everything is the same, even the temperatures of the unit.
My understanding is that enabling RSS will only show an improvement under heavy load. A speedtest doesn't really count as heavy load; you need multiple devices running to really stress the firewall. Also, all the software components running within the FW need to be multi-threaded (not single-threaded).
Ok cool, guess it works then. Or at least no problems. Unsure if anyone is seeing some major benefit. Guess I don't hammer my firewall enough.
I looked back a few pages looking for answers now that we're on 22.1.x; I don't think I've missed it.
I am on OPNsense 22.1.1_3-amd64 and had the RSS tunables set for RSS testing during 21.7 and didn't remove them prior to the firewall upgrade to 22.1. I also had some non-RSS tunables for igb performance.
My tunables now show a few that are unsupported. I like that the upgrade routines found them and labelled them nicely, devs are awesome!
Can I just remove them?
Name Type Value
hw.em.tx_process_limit unsupported -1
hw.uart.console unsupported default (io:0x3f8,br:115200)
legal.intel_igb.license_ack unsupported 1
net.inet.rss.bits unsupported 2
net.inet.rss.enabled unsupported 1
net.isr.bindthreads unsupported 1
net.isr.maxthreads unsupported -1
Update to 22.1.2 first before removing the unsupported ones. At least hw.uart.console is not really unsupported, just hidden from the "sysctl" framework.
But yes, unsupported tunables are no longer in the kernel so they can be safely removed and don't do anything anymore.
Cheers,
Franco
I'll do that then. Thank you franco.
I have the DEC850 with 8 cores and yet maxthreads -1 does not seem to take...
net.inet.rss.bucket_mapping: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
net.inet.rss.enabled: 0
net.inet.rss.debug: 0
net.inet.rss.basecpu: 0
net.inet.rss.buckets: 8
net.inet.rss.maxcpus: 64
net.inet.rss.ncpus: 8
net.inet.rss.maxbits: 7
net.inet.rss.mask: 7
net.inet.rss.bits: 3
net.inet.rss.hashalgo: 2
net.isr.numthreads: 1
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 0
net.isr.maxthreads: 1
net.isr.dispatch: direct
netstat -Q
Configuration:
Setting Current Limit
Thread count 1 1
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs disabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 flow default ---
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 source direct ---
ip6 6 1000 flow default ---
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 8 121180 0 0 22651 143831
0 0 igmp 0 0 0 0 0 0 0
0 0 rtsock 0 2 0 0 0 54 54
0 0 arp 0 0 602 0 0 0 602
0 0 ether 0 0 285907 0 0 0 285907
0 0 ip6 0 1 0 0 0 39 39
I only see
net.inet.rss.enabled: 0
net.isr.maxthreads: 1
;)
May I ask what command is being executed to show the unsupported ones? When I use sysctl -a I don't see anything reported as unsupported.
Well, I don't know why I had to execute a restart from the web UI. I updated and saved the tunables and applied them; nothing happened, so I rebooted from the shell. That didn't work, so I rebooted again, this time via the web management, and now it works.
└─[$]> sudo sysctl -a | grep -i "net.isr\|net.rss"
net.inet.rss.bucket_mapping: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7 8:0 9:1 10:2 11:3 12:4 13:5 14:6 15:7
net.inet.rss.enabled: 1
net.inet.rss.debug: 0
net.inet.rss.basecpu: 0
net.inet.rss.buckets: 16
net.inet.rss.maxcpus: 64
net.inet.rss.ncpus: 8
net.inet.rss.maxbits: 7
net.inet.rss.mask: 15
net.inet.rss.bits: 4
net.inet.rss.hashalgo: 2
net.isr.numthreads: 8
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 1
net.isr.maxthreads: 8
net.isr.dispatch: direct
└─[$]> netstat -Q
Configuration:
Setting Current Limit
Thread count 8 8
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 1000 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
AND WOW!!! I am 7F cooler when using RSS! 8)
Quote from: zz00mm on March 04, 2022, 12:16:23 AM
May I ask what command is being executed to show the unsupported? when I use sysctl -a I dont see anything reported as unsupported.
I'm also interested in knowing this. :)
Quote from: fadern on March 13, 2022, 09:04:20 PM
Quote from: zz00mm on March 04, 2022, 12:16:23 AM
May I ask what command is being executed to show the unsupported? when I use sysctl -a I dont see anything reported as unsupported.
I'm also interested in knowing this. :)
What's not returned by "sysctl -a" is marked as unsupported. It's a relatively safe bet, except for some edge-case hidden stuff in the boot loader, likely for historic reasons.
Cheers,
Franco
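For the question about how "unsupported" is detected: going by Franco's hint, there is no dedicated command; an OID the running kernel does not answer for is treated as unsupported. A rough shell equivalent (the OID list here is only an example):
# an OID the running kernel doesn't know will make sysctl fail
for oid in net.inet.rss.enabled net.inet.rss.bits net.isr.bindthreads net.isr.maxthreads; do
    sysctl -n "$oid" > /dev/null 2>&1 && echo "$oid: known to this kernel" || echo "$oid: not present (would be flagged unsupported)"
done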
The current Suricata/Netmap implementation limits this re-injection to one thread only.
Work is underway to address this issue since the new Netmap API (V14+) is now capable of increasing this thread count.
Until then, no benefit is gained from RSS when using IPS.
Any news on this? This plus RSS on lower power multi-cored devices sounds interesting.
The development release type has the suricata/netmap changes, but due to stability concerns it has been parked there.
We've made tests with Suricata 5 and the Netmap v14 API and it seems to perform better in general, which would indicate there is still at least one issue in Suricata 6 that makes it unstable whether the v14 API is used (development release) or not (community release).
Cheers,
Franco
Hi,
I've been trying to get this working on ESXi with an OPNsense VM with 8 vCPUs. For some reason, if I run iperf with a single TCP stream I only get about 600 Mbit/s, and watching on OPNsense with top -P I can see 1 of the 8 cores going to 0% idle. Using the VMXNET3 adapter.
root@OPNsense:~ # netstat -Q
Configuration:
Setting Current Limit
Thread count 8 8
Default queue limit 256 10240
Dispatch policy hybrid n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 1000 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 0 0 1097010 0 0 1097010
0 0 igmp 0 0 0 0 0 0 0
0 0 rtsock 0 0 0 0 0 0 0
0 0 arp 0 0 0 0 0 0 0
0 0 ether 0 0 1173731 0 0 0 1173731
0 0 ip6 0 0 0 63446 0 0 63446
0 0 ip_direct 0 0 0 0 0 0 0
0 0 ip6_direct 0 0 0 0 0 0 0
1 1 ip 0 34 0 475830 0 38 475868
1 1 igmp 0 0 0 0 0 0 0
1 1 rtsock 0 0 0 0 0 0 0
1 1 arp 0 1 0 0 0 12712 12712
1 1 ether 0 0 539495 0 0 0 539495
1 1 ip6 0 2 0 63626 0 4216 67842
1 1 ip_direct 0 0 0 0 0 0 0
1 1 ip6_direct 0 0 0 0 0 0 0
2 2 ip 0 0 0 412891 0 0 412891
2 2 igmp 0 0 0 0 0 0 0
2 2 rtsock 0 0 0 0 0 0 0
2 2 arp 0 1 0 0 0 510 510
2 2 ether 0 0 420304 0 0 0 420304
2 2 ip6 0 1 0 7412 0 1 7413
2 2 ip_direct 0 0 0 0 0 0 0
2 2 ip6_direct 0 0 0 0 0 0 0
3 3 ip 0 0 0 653430 0 0 653430
3 3 igmp 0 0 0 0 0 0 0
3 3 rtsock 0 0 0 0 0 0 0
3 3 arp 0 1 0 0 0 53 53
3 3 ether 0 0 676969 0 0 0 676969
3 3 ip6 0 0 0 23539 0 0 23539
3 3 ip_direct 0 0 0 0 0 0 0
3 3 ip6_direct 0 0 0 0 0 0 0
4 4 ip 0 23 0 354980 0 11847 366827
4 4 igmp 0 0 0 0 0 0 0
4 4 rtsock 0 0 0 0 0 0 0
4 4 arp 0 0 0 0 0 0 0
4 4 ether 0 0 358176 0 0 0 358176
4 4 ip6 0 1 0 3074 0 1 3075
4 4 ip_direct 0 0 0 0 0 0 0
4 4 ip6_direct 0 0 0 0 0 0 0
5 5 ip 0 1 0 855737 0 2 855739
5 5 igmp 0 0 0 0 0 0 0
5 5 rtsock 0 3 0 0 0 4717 4717
5 5 arp 0 0 0 0 0 0 0
5 5 ether 0 0 859020 0 0 0 859020
5 5 ip6 0 1 0 3281 0 159 3440
5 5 ip_direct 0 0 0 0 0 0 0
5 5 ip6_direct 0 0 0 0 0 0 0
6 6 ip 0 0 0 1513336 0 0 1513336
6 6 igmp 0 0 0 0 0 0 0
6 6 rtsock 0 0 0 0 0 0 0
6 6 arp 0 0 0 0 0 0 0
6 6 ether 0 0 1517246 0 0 0 1517246
6 6 ip6 0 1 0 3910 0 1 3911
6 6 ip_direct 0 0 0 0 0 0 0
6 6 ip6_direct 0 0 0 0 0 0 0
7 7 ip 0 0 0 335859 0 0 335859
7 7 igmp 0 0 0 0 0 0 0
7 7 rtsock 0 0 0 0 0 0 0
7 7 arp 0 0 0 0 0 0 0
7 7 ether 0 0 341939 0 0 0 341939
7 7 ip6 0 0 0 6080 0 0 6080
7 7 ip_direct 0 0 0 0 0 0 0
7 7 ip6_direct 0 0 0 0 0 0 0
root@OPNsense:~ # vmstat -i
interrupt total rate
irq1: atkbd0 2 0
irq17: mpt0 314344 5
irq18: uhci0 110225 2
cpu0:timer 1335718 21
cpu1:timer 569148 9
cpu2:timer 597637 9
cpu3:timer 592890 9
cpu4:timer 590653 9
cpu5:timer 593208 9
cpu6:timer 592515 9
cpu7:timer 609112 10
irq24: ahci0 41838 1
irq26: vmx0:rxq0 188954 3
irq27: vmx0:rxq1 138552 2
irq28: vmx0:rxq2 71792 1
irq29: vmx0:rxq3 162662 3
irq30: vmx0:rxq4 109552 2
irq31: vmx0:rxq5 166029 3
irq32: vmx0:rxq6 317057 5
irq33: vmx0:rxq7 63136 1
irq43: vmx1:rxq0 1759 0
irq44: vmx1:rxq1 2393 0
irq45: vmx1:rxq2 4260 0
irq46: vmx1:rxq3 557 0
irq47: vmx1:rxq4 1137 0
irq48: vmx1:rxq5 3461 0
irq49: vmx1:rxq6 4689 0
irq50: vmx1:rxq7 1468 0
irq60: vmx2:rxq0 73391 1
irq61: vmx2:rxq1 153881 2
irq62: vmx2:rxq2 54965 1
irq63: vmx2:rxq3 75044 1
irq64: vmx2:rxq4 98827 2
irq65: vmx2:rxq5 277362 4
irq66: vmx2:rxq6 63113 1
irq67: vmx2:rxq7 69899 1
Total 8051230 127
My current settings; I've played a lot with these but haven't gotten anything above 800 Mbits/sec:
vmx0: Using MSI-X interrupts with 9 vectors
vmx1: Using MSI-X interrupts with 9 vectors
vmx2: Using MSI-X interrupts with 9 vectors
net.inet.rss.bucket_mapping: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
net.inet.rss.enabled: 1
net.inet.rss.debug: 0
net.inet.rss.basecpu: 0
net.inet.rss.buckets: 8
net.inet.rss.maxcpus: 64
net.inet.rss.ncpus: 8
net.inet.rss.maxbits: 7
net.inet.rss.mask: 7
net.inet.rss.bits: 3
net.inet.rss.hashalgo: 2
net.isr.numthreads: 8
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 1
net.isr.maxthreads: 8
net.isr.dispatch: hybrid
hw.vmd.max_msix: 3
hw.vmd.max_msi: 1
hw.sdhci.enable_msi: 1
hw.puc.msi_disable: 0
hw.pci.honor_msi_blacklist: 1
hw.pci.msix_rewrite_table: 0
hw.pci.enable_msix: 1
hw.pci.enable_msi: 1
hw.mfi.msi: 1
hw.malo.pci.msi_disable: 0
hw.ix.enable_msix: 1
hw.bce.msi_enable: 1
hw.aac.enable_msi: 1
machdep.disable_msix_migration: 0
machdep.num_msi_irqs: 2048
machdep.first_msi_irq: 24
dev.vmx.2.iflib.disable_msix: 0
dev.vmx.1.iflib.disable_msix: 0
dev.vmx.0.iflib.disable_msix: 0
iperf3 -c 10.0.0.4 -p 4999 -P 1
Connecting to host 10.0.0.4, port 4999
[ 5] local 192.168.2.100 port 57384 connected to 10.0.0.4 port 4999
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 72.6 MBytes 609 Mbits/sec 20 618 KBytes
[ 5] 1.00-2.00 sec 83.8 MBytes 703 Mbits/sec 0 714 KBytes
[ 5] 2.00-3.00 sec 83.8 MBytes 703 Mbits/sec 9 584 KBytes
[ 5] 3.00-4.00 sec 83.8 MBytes 703 Mbits/sec 0 684 KBytes
[ 5] 4.00-5.00 sec 83.8 MBytes 703 Mbits/sec 1 556 KBytes
[ 5] 5.00-6.00 sec 80.0 MBytes 671 Mbits/sec 0 659 KBytes
[ 5] 6.00-7.00 sec 85.0 MBytes 713 Mbits/sec 1 525 KBytes
[ 5] 7.00-8.00 sec 83.8 MBytes 703 Mbits/sec 0 636 KBytes
[ 5] 8.00-9.00 sec 83.8 MBytes 703 Mbits/sec 0 732 KBytes
[ 5] 9.00-10.00 sec 85.0 MBytes 713 Mbits/sec 5 611 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 825 MBytes 692 Mbits/sec 36 sender
[ 5] 0.00-10.00 sec 821 MBytes 689 Mbits/sec receiver
I have also played with the VM advanced settings on the ESXi side and tried these two, which I found on a Palo Alto site:
ethernet1.pnicFeatures = "4"
ethernet2.pnicFeatures = "4"
ethernet3.pnicFeatures = "4"
ethernet1.ctxPerDev = "1"
ethernet2.ctxPerDev = "1"
ethernet3.ctxPerDev = "1"
But no help, driver inside ESXi is:
esxcli network nic get -n vmnic0
Advertised Auto Negotiation: true
Advertised Link Modes: Auto, 100BaseT/Full, 1000BaseT/Full, 10000BaseT/Full
Auto Negotiation: true
Cable Type: Twisted Pair
Current Message Level: 0
Driver Info:
Bus Info: 0000:03:00:0
Driver: ixgben
Firmware Version: 0x8000038e
Version: 1.8.7
Link Detected: true
Link Status: Up
Name: vmnic0
PHYAddress: 0
Pause Autonegotiate: false
Pause RX: false
Pause TX: false
Supported Ports: TP
Supports Auto Negotiation: true
Supports Pause: true
Supports Wakeon: true
Transceiver:
Virtual Address: 00:50:56:50:0d:98
Wakeon: MagicPacket(tm)
esxcli system module parameters list -m ixgben
Name Type Value Description
------- ------------ ----- --------------------------------------------------------------------------------------------------------------------------------
DRSS array of int DefQueue RSS state: 0 = disable, 1 = enable (default = 0; 4 queues if DRSS is enabled)
DevRSS array of int Device RSS state: 0 = disable, 1 = enable (default = 0; 16 queues but all virtualization features disabled if DevRSS is enabled)
QPair array of int Pair Rx & Tx Queue Interrupt: 0 = disable, 1 = enable (default)
RSS array of int 1,1 NetQueue RSS state: 0 = disable, 1 = enable (default = 1; 4 queues if RSS is enabled)
RxITR array of int Default RX interrupt interval: 0 = disable, 1 = dynamic throttling, 2-1000 in microseconds (default = 50)
TxITR array of int Default TX interrupt interval: 0 = disable, 1 = dynamic throttling, 2-1000 in microseconds (default = 100)
VMDQ array of int Number of Virtual Machine Device Queues: 0/1 = disable, 2-16 enable (default = 8)
max_vfs array of int Maximum number of VFs to be enabled (0..63)
I've been running a Palo Alto firewall as well and I don't seem to have these problems with it. I want to switch from Palo Alto to OPNsense but want to figure out this problem first. Any ideas what to try next? OPNsense version is 22.1.3-amd64.
Hi @muunisti,
Check out the following and see if it helps:
https://forum.opnsense.org/index.php?topic=19175.msg91450#msg91450
Do you have any good news concerning this topic?
Is it stable now?
Maybe in 22.7?
Quote from: franco on March 21, 2022, 07:54:04 AM
The development release type has the suricata/netmap changes, but due to stability concerns it has been parked there.
We've made tests with Suricata 5 and the Netmap v14 API and it seems to perform better in general which would indicate there is at least 1 issue still in Suricata 6 that makes it unstable whether or not v14 API (development release) is used or not (community release).
Cheers,
Franco
It's still not perfect from our end so it's locked away in the development release (System: Firmware: Settings: release type).
Cheers,
Franco
Hi,
I tried enabling RSS and Suricata works: better spread of CPU load and better performance. However, haproxy runs into issues. HAProxy can't connect to anything, not for health checks and not for live traffic. Based on an earlier comment on so_reuseport, I changed my config to simple binds and enabled noreuseport for haproxy, but haproxy still fails to connect.
It gets very sporadic successes, ~10%, but that's rare enough for a health check not to clear. Since I have 8 RSS queues it is almost as if haproxy only gets traffic from 1 queue, which would amount to 12.5% success.
I've tried all combos of net.inet.rss.enable, noreuseport, with health checks, w/o health checks and success/failure depends completely on net.inet.rss.enable. The error reported from haproxy is "Layer4 timeout"
driver: ix
NIC: Intel D-1500 soc 10 gbe, (X552)
Opnsense: 22.1.7_1
I'm more than happy to help with testing but would appreciate any suggestions on what direction to start in.
Having done some more research and Google-fu, I don't have a solution, but at least some more insight.
It seems FreeBSD doesn't apply RSS to outgoing connections, and that causes issues for HAProxy. I did find others who have experienced the same issue, with proposed patches for HAProxy, but unfortunately those patches weren't accepted:
https://www.mail-archive.com/haproxy@formilux.org/msg34548.html (https://www.mail-archive.com/haproxy@formilux.org/msg34548.html)
https://lists.freebsd.org/pipermail/freebsd-transport/2019-June/000247.html (https://lists.freebsd.org/pipermail/freebsd-transport/2019-June/000247.html)
Both links have the same author. His thinking sounds sound, and his patches work for him, but my two cents is that they rely on using a symmetric Toeplitz key, even though he doesn't mention that. If my hunch is right, then the default Microsoft key doesn't work well. A symmetric key is also required for Suricata to ensure each thread sees both the inbound and outbound flows of the same conversation. What he does is use the same RSS hash to assign outgoing ports to the same CPU and RSS queue that incoming traffic would hit, thereby not only assuring the connection but also preventing CPU context switching even in HAProxy.
I tried limiting HAProxy to 1 process and 1 thread, hoping that could work as a very quick but performance-limited fix, but unfortunately not.
You could argue that HAProxy is not the right place to solve this, as it intertwines the layers, but HAProxy RSS awareness also prevents CPU context switches between net.inet and HAProxy. His patches also hard-code the default hash key; instead he should have asked the kernel for the key (I don't know if that's possible today) to make sure the same key is used. To satisfy Suricata's requirement, a wish would be the possibility of setting a symmetric key.
We also did some tests enabling RSS on about 15 KVM/Proxmox OPNsense installations.
We used the latest "Business Edition" OPNsense build for this.
It seems there are also still issues with the nginx reverse proxy, which is likewise unable to connect to the backend servers when RSS is enabled.
After disabling RSS and rebooting, nginx works flawlessly again...
Tried out the RSS feature for a few days on our OPNsense instance (23.1.9-amd64) and while the performance is great, we've run into some weird issues with internal system services not being able to resolve DNS requests. Disabling RSS seems to fix the issue. We are using the Intel ice driver with a 25GbE SOC based NIC. Machine in question: https://www.supermicro.com/en/products/system/iot/1u/sys-110d-20c-fran8tp
Specifically, pkg update and the GeoIP feature in the firewall cannot connect, and I tracked this down to a DNS issue. When running pkg update via CLI or the GUI it hangs on the fetching process, and when running it with debug you can see its stuck at the resolving DNS stage. The GeoIP feature has similar issues.
Interestingly, if you use ping from CLI or use the DNS diagnostic tool, the system resolves DNS requests totally fine. I enabled debug on Unbound and it doesn't appear to even receive the requests from pkg update or GeoIP downloads. Would love to get this fixed so we can use RSS since it handles our 10G symmetrical connection a lot better.
user@kappa:/usr/local/etc # pkg -ddddddd update
DBG(1)[34160]> pkg initialized
Updating OPNsense repository catalogue...
DBG(1)[34160]> PkgRepo: verifying update for OPNsense
DBG(1)[34160]> PkgRepo: need forced update of OPNsense
DBG(1)[34160]> Pkgrepo, begin update of '/var/db/pkg/repo-OPNsense.sqlite'
DBG(1)[34160]> Request to fetch pkg+https://pkg.opnsense.org/FreeBSD:13:amd64/23.1/latest/meta.conf
DBG(1)[34160]> opening libfetch fetcher
DBG(1)[34160]> Fetch > libfetch: connecting
DBG(1)[34160]> Fetch: fetching from: https://pkg.opnsense.org/FreeBSD:13:amd64/23.1/latest/meta.conf with opts "iv"
resolving server address: pkg.opnsense.org:443
^ hangs here for a while before retrying and effectively goes nowhere.
I can't reproduce a hang with a standard igb(4) here so it might be specific to ice(4). I'll try to monitor from my main install to see if this happens eventually or not.
Cheers,
Franco
PS: perhaps make sure Unbound does not run into issues as we know of some strange thing with RSS/SO_REUSEPORT combo. Switch to Dnsmasq to verify.
Quote from: sepahewe on May 23, 2022, 07:11:57 PM
Hi,
I tried enabling RSS and Suricata works. Better spread of CPU load and better performance. However, haproxy runs into issues. HAProxy can't connect to anything, not for health checks and not for live traffic. Based on earlier comment on so_reuseport, I changed my config to simple binds and enabled noreuseport for haproxy, but haproxy still fails to connect.
It gets very sporadic, ~10%, successes but that's rare enough for a health check not to clear. Since I have 8 RSS queues it is almost like haproxy only gets traffic from 1 queue which would amount to 12.5% success.
I have an X520 (ix) and that does not support RSS to my knowledge. Running this will confirm:
sysctl dev.ix | grep rss
No results means the driver/NIC is unsupported; mine returns nothing.
I've tried all combos of net.inet.rss.enable, noreuseport, with health checks, w/o health checks and success/failure depends completely on net.inet.rss.enable. The error reported from haproxy is "Layer4 timeout"
driver: ix
NIC: Intel D-1500 soc 10 gbe, (X552)
Opnsense: 22.1.7_1
I more than happy to help testing but would appreciate any suggestions in what direction to start.
Hi,
I had some time to kill so I reran my tests. One difference I see is that with RSS enabled the firewall closes its own outgoing connections with an RST.
My firewall is 192.168.192.1 and in my test I ran curl http://192.168.192.30:8123 while capturing packets.
RSS disabled:
19 2.395679 192.168.192.1 192.168.192.30 TCP 74 52726 → 8123 [SYN] Seq=0 Win=65228 Len=0 MSS=1460 WS=128 SACK_PERM TSval=2821030256 TSecr=0
20 2.395947 192.168.192.30 192.168.192.1 TCP 74 8123 → 52726 [SYN, ACK] Seq=0 Ack=1 Win=65160 Len=0 MSS=1460 SACK_PERM TSval=2913189312 TSecr=2821030256 WS=128
21 2.396029 192.168.192.1 192.168.192.30 TCP 66 52726 → 8123 [ACK] Seq=1 Ack=1 Win=65792 Len=0 TSval=2821030256 TSecr=2913189312
22 2.396311 192.168.192.1 192.168.192.30 HTTP 148 GET / HTTP/1.1
RSS enabled:
68 24.248066 192.168.192.1 192.168.192.30 TCP 74 19224 → 8123 [SYN] Seq=0 Win=65228 Len=0 MSS=1460 WS=128 SACK_PERM TSval=187982256 TSecr=0
69 24.248327 192.168.192.30 192.168.192.1 TCP 74 8123 → 19224 [SYN, ACK] Seq=0 Ack=1 Win=65160 Len=0 MSS=1460 SACK_PERM TSval=2911919337 TSecr=187982256 WS=128
70 24.248375 192.168.192.1 192.168.192.30 TCP 66 19224 → 8123 [ACK] Seq=1 Ack=1 Win=65792 Len=0 TSval=187982256 TSecr=2911919337
71 24.248517 192.168.192.1 192.168.192.30 TCP 66 19224 → 8123 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0 TSval=187982257 TSecr=2911919337
With RSS disabled curl works, TCP is established.
With RSS enabled, the firewall itself kills TCP with an RST. I don't understand why...
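For anyone wanting to reproduce the comparison, the capture can be taken on the firewall itself; a sketch, where the interface name is a placeholder for whatever interface faces 192.168.192.30 in your setup:
# capture the curl test traffic (replace igb1 with your LAN-side interface)
tcpdump -ni igb1 host 192.168.192.30 and tcp port 8123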
Franco,
Using the latest 23.3.1_3 release and a set of Mellanox Connectx-3 NICs. Netstat -Q reports:
Configuration:
Setting Current Limit
Thread count 12 12
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs enabled n/a
Using an Intel i5-10400, 6 cores, 12 threads, so I set "net.inet.rss.bits = 2".
I had to disable RSS as DNS resolution was taking upwards of 30 seconds before connecting to any website. I am using Unbound and Crowdsec at the moment.
Thanks.
Pat
What happens if you disable Crowdsec instead ?
newsense,
That did the trick, thank you! But a bummer on CrowdSec, as I liked the low overhead, etc. I did not originally think it would be an issue either, as I thought it was focused on inbound IPs and not outbound requests...
I've started looking to see if there is a setting for CrowdSec that will work, or I'll report to them as the case evolves.
Best,
Pat
No bummer; CrowdSec needs to be set up properly, else it will start blocking on everything it triggers on.
https://docs.crowdsec.net/docs/whitelist/create/ (https://docs.crowdsec.net/docs/whitelist/create/)
Well, I did not install CrowdSec again and do the whitelisting, as I just noticed that System: Firmware just spins away with RSS running:
2023-08-21T19:03:10-04:00 Error configd.py Timeout (120) executing : firmware tiers
I reverted back to the system without RSS and the whole firmware section works fine now...
Hello everybody, I have been following this thread with great interest, since I have been experimenting with RSS as well. This is my experience so far.
SETUP
- OPNSense 23.7.3 & all packages up to date
- Intel Pentium Silver N6005 (4 cores, 4 threads) with AES-NI hardware acceleration
Cryptography acceleration set to Intel QAT (QuickAssist Technology) (System --> Settings --> Miscellaneous), see newsense post (https://forum.opnsense.org/index.php?topic=24409.msg174056#msg174056)
- i226-V with igc driver
- OPNSense tunables set like this:
- net.isr.bindthreads = 1
- net.isr.maxthreads = -1
- net.inet.rss.enabled = 1
- net.inet.rss.bits = 2
- ZenArmor installed with native Netmap driver and SQLite db (2 days of log + max 100 devices)
- CrowdSec with defaults
- Unbound set to fully recursive
netstat -Q output shows:
Configuration:
Setting Current Limit
Thread count 4 4
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 1000 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
I have been running my OPNSense rig like this for the last 4 months with absolutely zero problems, and I have yet to run into any significant issue, except for a small hiccup with ZenArmor (https://forum.opnsense.org/index.php?topic=34305.msg174040#msg174040) that I recently solved thanks to their support. In this thread I initially asked for ways to test my setup, so if anyone has some idea I'll be more than willing to give it a spin.
Thanks for all the useful information.
You have no QAT on that CPU, either Xeon D or Atom C and P series are QAT capable -- or a dedicated card.
https://ark.intel.com/content/www/us/en/ark/products/212327/intel-pentium-silver-n6005-processor-4m-cache-up-to-3-30-ghz.html (https://ark.intel.com/content/www/us/en/ark/products/212327/intel-pentium-silver-n6005-processor-4m-cache-up-to-3-30-ghz.html)
https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html (https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html)
Quote from: newsense on September 04, 2023, 11:41:00 PM
You have no QAT on that CPU, either Xeon D or Atom C and P series are QAT capable -- or a dedicated card.
https://ark.intel.com/content/www/us/en/ark/products/212327/intel-pentium-silver-n6005-processor-4m-cache-up-to-3-30-ghz.html (https://ark.intel.com/content/www/us/en/ark/products/212327/intel-pentium-silver-n6005-processor-4m-cache-up-to-3-30-ghz.html)
https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html (https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html)
Thanks for noticing, I overlooked this detail. Among the four options available in System --> Settings --> Miscellaneous --> Cryptography settings --> Hardware acceleration I picked the Intel one since I did not recognize the other options. What is an alternative value for my CPU? There is no disable option, so I'm not sure which one to pick.
EDIT: nevermind, found the answer:
"If you do not have a crypto chip in your system, this option will have no effect." So no harm done.
Enabled RSS, working beautifully! CrowdSec and Unbound seem to be working fine, contrary to some people's experience in this thread.
I will continue monitoring for weird behavior. Is there anything I should look for in particular?
OPNSense 23.7.3 & all packages up to date
Celeron J4125 with 6x i225-V B3 NICs.
Configuration:
Setting Current Limit
Thread count 4 4
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 1000 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Hi, how can I install dynamic DNS via the command line in the latest update?
Unfortunately, it doesn't work in the GUI :(
Sorry, I have it but I can't find the setting below -> net.inet.rss.enabled
Sorry, I'm still a beginner :(
Hi,
Having had a few posts in this thread, here's how I resolved my issues.
TLDR; RTFM and the mlx4 driver doesn't support RSS on FreeBSD.
I have two dual port cards that are involved: Intel x552 (driver ixgbe) and Mellanox ConnectX-3 Pro (driver mlx4). When I replaced ConnectX-3 with a ConnectX-5 (driver mlx5), everything works.
What fooled me is that when reading up on the capabilities of mlx4, the information provided, even by Mellanox, isn't always clear about whether it refers to Linux or FreeBSD, so I got the understanding that it should work. The NIC supports RSS, and mlx4 on Linux supports RSS; however, on FreeBSD mlx4 does not support RSS.
mlx5 supports RSS on FreeBSD, so changing the NIC to a ConnectX-5 solved it.
Hi,
I am using an HP T740 (AMD Ryzen Embedded V1756B with Radeon Vega Gfx (4 cores, 8 threads)) with an Intel X710 NIC. I enabled RSS. In the netstat -Q output, the QDrops for ip6 are increasing slowly. I already tried to increase net.isr.defaultqlimit from 2048 to 8192. That still did not increase the QLimit for ip6.
I'm trying to find the tunables to adjust these; the following look like the candidates:
net.inet.ip.intr_queue_maxlen
net.inet6.ip6.intr_queue_maxlen
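If those two OIDs are really what backs the ip/ip6 QLimit column (I have not verified that on this kernel, so treat it as an assumption), they could be raised on the fly or as tunables, e.g.:
# raise the per-protocol netisr queue limits (the values are only an example)
sysctl net.inet.ip.intr_queue_maxlen=4096
sysctl net.inet6.ip6.intr_queue_maxlen=4096
and then re-check netstat -Q to see whether the ip6 QLimit and the QDrops counters react.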
root@opnsense:/home/root # netstat -Q
Configuration:
Setting Current Limit
Thread count 8 8
Default queue limit 8192 10240
Dispatch policy direct n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 3000 cpu hybrid C--
igmp 2 8192 source default ---
rtsock 3 8192 source default ---
arp 4 8192 source default ---
ether 5 8192 cpu direct C--
ip6 6 1000 cpu hybrid C--
ip_direct 9 8192 cpu hybrid C--
ip6_direct 10 8192 cpu hybrid C--
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 12 0 499 0 3039 3538
0 0 igmp 0 0 0 0 0 0 0
0 0 rtsock 0 0 0 0 0 0 0
0 0 arp 0 0 501 0 0 0 501
0 0 ether 0 0 2271227 0 0 0 2271227
0 0 ip6 0 1000 0 93596 202 1083995 1177591
0 0 ip_direct 0 0 0 0 0 0 0
0 0 ip6_direct 0 0 0 0 0 0 0
1 1 ip 0 10 0 375 0 5571 5946
1 1 igmp 0 0 2 0 0 0 2
1 1 rtsock 0 0 0 0 0 0 0
1 1 arp 0 0 0 0 0 0 0
1 1 ether 0 0 865820 0 0 0 865820
1 1 ip6 0 1000 0 1071 517 1508206 1509277
1 1 ip_direct 0 0 0 0 0 0 0
1 1 ip6_direct 0 0 0 0 0 0 0
2 2 ip 0 7 0 425 0 1902 2327
2 2 igmp 0 0 0 0 0 0 0
2 2 rtsock 0 0 0 0 0 0 0
2 2 arp 0 0 0 0 0 0 0
2 2 ether 0 0 3891441 0 0 0 3891441
2 2 ip6 0 1000 0 89837 154 786449 876286
2 2 ip_direct 0 0 0 0 0 0 0
2 2 ip6_direct 0 0 0 0 0 0 0
3 3 ip 0 49 0 117 0 30043 30160
3 3 igmp 0 0 1 0 0 0 1
3 3 rtsock 0 0 0 0 0 0 0
3 3 arp 0 0 0 0 0 0 0
3 3 ether 0 0 2731942 0 0 0 2731942
3 3 ip6 0 1000 0 570 160 355308 355878
3 3 ip_direct 0 0 0 0 0 0 0
3 3 ip6_direct 0 0 0 0 0 0 0
4 4 ip 0 10 0 273 0 1897 2170
4 4 igmp 0 0 0 0 0 0 0
4 4 rtsock 0 2 0 0 0 115 115
4 4 arp 0 0 0 0 0 0 0
4 4 ether 0 0 597119 0 0 0 597119
4 4 ip6 0 1000 0 66457 1360 988018 1054475
4 4 ip_direct 0 0 0 0 0 0 0
4 4 ip6_direct 0 0 0 0 0 0 0
5 5 ip 0 33 0 671 0 6113 6784
5 5 igmp 0 0 3 0 0 0 3
5 5 rtsock 0 0 0 0 0 0 0
5 5 arp 0 0 0 0 0 0 0
5 5 ether 0 0 626430 0 0 0 626430
5 5 ip6 0 1000 0 1430 1226 1361493 1362923
5 5 ip_direct 0 0 0 0 0 0 0
5 5 ip6_direct 0 0 0 0 0 0 0
6 6 ip 0 59 0 331 0 4234 4565
6 6 igmp 0 0 0 0 0 0 0
6 6 rtsock 0 0 0 0 0 0 0
6 6 arp 0 0 0 0 0 0 0
6 6 ether 0 0 725274 0 0 0 725274
6 6 ip6 0 1000 0 94895 3162 1398256 1493151
6 6 ip_direct 0 0 0 0 0 0 0
6 6 ip6_direct 0 0 0 0 0 0 0
7 7 ip 0 25 0 644 0 8798 9442
7 7 igmp 0 0 0 0 0 0 0
7 7 rtsock 0 0 0 0 0 0 0
7 7 arp 0 0 0 0 0 0 0
7 7 ether 0 0 445253 0 0 0 445253
7 7 ip6 0 936 0 236 0 909138 909374
7 7 ip_direct 0 0 0 0 0 0 0
7 7 ip6_direct 0 0 0 0 0 0 0
root@opnsense:/home/root # sysctl -a | grep isr
net.route.netisr_maxqlen: 8192
net.isr.numthreads: 8
net.isr.maxprot: 16
net.isr.defaultqlimit: 8192
net.isr.maxqlimit: 10240
net.isr.bindthreads: 1
net.isr.maxthreads: 8
net.isr.dispatch: direct
We tested RSS on a slow PC Engines APU2 device in combination with an IPsec Site to Site VPN.
With RSS enabled, there are some issues with unbound receiving DNS packets through the VPN tunnel:
- unbound has a DNS override for a specific domain; the IP of the authoritative server for that override is set to an IP inside the IPsec P2 remote network.
- The outgoing network interface in unbound is set to the interface the IPsec P2 local network resides in.
With a packet capture we can see the DNS answer packets arriving at the IPsec P2 local network, but unbound does not see them.
With RSS disabled, this setup works without any issues.
Anyone with numbers? I haven't seen a performance difference. I actually tried every possible tuning value for the igc NIC, incl. RSS, and there was no difference. RSS caused more dropped packets. The only settings that were slightly impactful were Interface\Disable * and Firewall\Optimization.
Finally had some time to play with it
Setup:
OPNsense 24.1.2_1-amd64
Intel Xeon D-2733NT
32 Gb Ram
I was testing only my 2 25Gb/s ports so far, NIC are "ICE" and they are based on intel e823-c
Ubuntu VM with PCIe passthrough of a Mellanox ConnectX-4 25Gb -> ice0 /OPNsense\ ice1 -> Windows 11 with a Mellanox ConnectX-4 25Gb
MTU 1500
Performance numbers based on iperf3
Performance Before:
With Zenarmor: 2.7Gb/s (2 cores loaded to max)
Without Zenarmor: 5Gb/s (1 core loaded to max)
Without Zenarmor few streams (different ports): 7Gb/s (1 core loaded to max)
Performance after:
With Zenarmor: 3.9Gb/s (3 cores loaded to 70-80%)
Without Zenarmor: 5.2Gb/s (1 core loaded to ~80%)
Without Zenarmor few streams (different ports): 8Gb/s (1 core loaded to max)
It definitely helped with Zenarmor, but I expected better performance without Zenarmor.
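For context, the numbers above are iperf3 runs between the hosts on either side of the firewall; "few streams (different ports)" simply means several client/server pairs in parallel, since one iperf3 server instance handles a single test at a time. Roughly like this, with the server address and ports being placeholders:
# single stream
iperf3 -c 10.0.0.10 -t 30
# several streams on different ports (one iperf3 -s -p <port> per port on the other side)
iperf3 -c 10.0.0.10 -p 5202 -t 30 &
iperf3 -c 10.0.0.10 -p 5203 -t 30 &
wait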
root@opr01:~ # netstat -Q
Configuration:
Setting Current Limit
Thread count 16 16
Default queue limit 256 10240
Dispatch policy hybrid n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 4 256 source default ---
ether 5 256 cpu direct C--
ip6 6 1000 cpu hybrid C--
ip_direct 9 256 cpu hybrid C--
ip6_direct 10 256 cpu hybrid C--
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 481 0 4292176 0 25295849 29588025
0 0 igmp 0 0 0 0 0 0 0
0 0 rtsock 0 0 0 0 0 0 0
0 0 arp 0 1 0 0 0 29 29
0 0 ether 0 0 27531623 0 0 0 27531623
0 0 ip6 0 3 0 0 0 13344 13344
0 0 ip_direct 0 0 0 0 0 0 0
0 0 ip6_direct 0 0 0 0 0 0 0
1 1 ip 0 36 0 4 0 1070042 1070046
1 1 igmp 0 0 0 0 0 0 0
1 1 rtsock 0 0 0 0 0 0 0
1 1 arp 0 1 0 0 0 8 8
1 1 ether 0 0 14 0 0 0 14
1 1 ip6 0 2 0 0 0 1125 1125
1 1 ip_direct 0 0 0 0 0 0 0
1 1 ip6_direct 0 0 0 0 0 0 0
2 2 ip 0 200 0 1 0 9914864 9914865
2 2 igmp 0 0 0 0 0 0 0
2 2 rtsock 0 0 0 0 0 0 0
2 2 arp 0 0 0 0 0 0 0
2 2 ether 0 0 959020 0 0 0 959020
2 2 ip6 0 2 0 0 0 1225 1225
2 2 ip_direct 0 0 0 0 0 0 0
2 2 ip6_direct 0 0 0 0 0 0 0
3 3 ip 0 7 0 0 0 428639 428639
3 3 igmp 0 0 0 0 0 0 0
3 3 rtsock 0 0 0 0 0 0 0
3 3 arp 0 1 0 0 0 7 7
3 3 ether 0 0 0 0 0 0 0
3 3 ip6 0 3 0 0 0 1117 1117
3 3 ip_direct 0 0 0 0 0 0 0
3 3 ip6_direct 0 0 0 0 0 0 0
4 4 ip 0 8 0 5545 0 45152 50697
4 4 igmp 0 0 0 0 0 0 0
4 4 rtsock 0 0 0 0 0 0 0
4 4 arp 0 0 0 0 0 0 0
4 4 ether 0 0 389516 0 0 0 389516
4 4 ip6 0 3 0 0 0 1152 1152
4 4 ip_direct 0 0 0 0 0 0 0
4 4 ip6_direct 0 0 0 0 0 0 0
5 5 ip 0 136 0 2229 0 35988 38217
5 5 igmp 0 0 0 0 0 0 0
5 5 rtsock 0 0 0 0 0 0 0
5 5 arp 0 1 0 0 0 27 27
5 5 ether 0 0 12837078 0 0 0 12837078
5 5 ip6 0 3 0 0 0 1155 1155
5 5 ip_direct 0 0 0 0 0 0 0
5 5 ip6_direct 0 0 0 0 0 0 0
6 6 ip 0 165 0 133686 0 1893259 2026945
6 6 igmp 0 0 0 0 0 0 0
6 6 rtsock 0 0 0 0 0 0 0
6 6 arp 0 0 0 0 0 0 0
6 6 ether 0 0 379341 0 0 0 379341
6 6 ip6 0 2 0 0 0 1175 1175
6 6 ip_direct 0 0 0 0 0 0 0
6 6 ip6_direct 0 0 0 0 0 0 0
7 7 ip 0 448 0 29 0 224705 224734
7 7 igmp 0 0 0 0 0 0 0
7 7 rtsock 0 0 0 0 0 0 0
7 7 arp 0 1 0 0 0 26 26
7 7 ether 0 0 295 0 0 0 295
7 7 ip6 0 2 0 0 0 1168 1168
7 7 ip_direct 0 0 0 0 0 0 0
7 7 ip6_direct 0 0 0 0 0 0 0
8 8 ip 0 0 0 0 0 0 0
8 8 igmp 0 0 0 0 0 0 0
8 8 rtsock 0 0 0 0 0 0 0
8 8 arp 0 1 0 0 0 4 4
8 8 ether 0 0 469 0 0 0 469
8 8 ip6 0 0 0 0 0 0 0
8 8 ip_direct 0 0 0 0 0 0 0
8 8 ip6_direct 0 0 0 0 0 0 0
9 9 ip 0 0 0 0 0 0 0
9 9 igmp 0 0 0 0 0 0 0
9 9 rtsock 0 2 0 0 0 279 279
9 9 arp 0 0 0 0 0 0 0
9 9 ether 0 0 901 0 0 0 901
9 9 ip6 0 0 0 0 0 0 0
9 9 ip_direct 0 0 0 0 0 0 0
9 9 ip6_direct 0 0 0 0 0 0 0
10 10 ip 0 0 0 0 0 0 0
10 10 igmp 0 0 0 0 0 0 0
10 10 rtsock 0 0 0 0 0 0 0
10 10 arp 0 0 0 0 0 0 0
10 10 ether 0 0 5235 0 0 0 5235
10 10 ip6 0 0 0 0 0 0 0
10 10 ip_direct 0 0 0 0 0 0 0
10 10 ip6_direct 0 0 0 0 0 0 0
11 11 ip 0 0 0 0 0 0 0
11 11 igmp 0 0 0 0 0 0 0
11 11 rtsock 0 0 0 0 0 0 0
11 11 arp 0 0 0 0 0 0 0
11 11 ether 0 0 0 0 0 0 0
11 11 ip6 0 0 0 0 0 0 0
11 11 ip_direct 0 0 0 0 0 0 0
11 11 ip6_direct 0 0 0 0 0 0 0
12 12 ip 0 0 0 0 0 0 0
12 12 igmp 0 0 0 0 0 0 0
12 12 rtsock 0 0 0 0 0 0 0
12 12 arp 0 0 0 0 0 0 0
12 12 ether 0 0 1058473 0 0 0 1058473
12 12 ip6 0 0 0 0 0 0 0
12 12 ip_direct 0 0 0 0 0 0 0
12 12 ip6_direct 0 0 0 0 0 0 0
13 13 ip 0 0 0 0 0 0 0
13 13 igmp 0 0 0 0 0 0 0
13 13 rtsock 0 0 0 0 0 0 0
13 13 arp 0 0 0 0 0 0 0
13 13 ether 0 0 0 0 0 0 0
13 13 ip6 0 0 0 0 0 0 0
13 13 ip_direct 0 0 0 0 0 0 0
13 13 ip6_direct 0 0 0 0 0 0 0
14 14 ip 0 0 0 0 0 0 0
14 14 igmp 0 0 0 0 0 0 0
14 14 rtsock 0 0 0 0 0 0 0
14 14 arp 0 1 0 0 0 8 8
14 14 ether 0 0 5734 0 0 0 5734
14 14 ip6 0 0 0 0 0 0 0
14 14 ip_direct 0 0 0 0 0 0 0
14 14 ip6_direct 0 0 0 0 0 0 0
15 15 ip 0 0 0 0 0 0 0
15 15 igmp 0 0 0 0 0 0 0
15 15 rtsock 0 0 0 0 0 0 0
15 15 arp 0 0 0 0 0 0 0
15 15 ether 0 0 0 0 0 0 0
15 15 ip6 0 0 0 0 0 0 0
15 15 ip_direct 0 0 0 0 0 0 0
15 15 ip6_direct 0 0 0 0 0 0 0
I have a somewhat similar setup with a Mellanox ConnectX-4 Lx passed through from a Proxmox host to OPNsense on an AMD Ryzen 7 7745HX system. I have the LAN interface attached to a 25Gb port on my switch but my LAN clients are only 10Gb. I'd be happy to run any iperf tests that may be helpful for you, but my WAN link is only 2Gb/1Gb. Inter-VLAN routing through the firewall does reach 9-10Gb with iperf3 without the need for jumbo frames.
One quick question for you. You seem to be the only result for someone with a ConnectX-4 running Zenarmor/netmap on FreeBSD/OPNsense. This used to be completely impossible, if older search results are to be believed, due to netmap compatibility issues. I'm guessing netmap is functional on your machine with the ConnectX-4? I have not tried it yet and was unaware of this potential issue.
(https://i.ibb.co/m6bvgNt/Windows-Terminal-LKY4-O7e-BWG.png)
@routetastic I think you misunderstood my post. I have Mellanox cards in my desktop and one server, but the second server that is running OPNsense doesn't have a Mellanox card in it; it has 2x25Gb ports served by an Intel E823-C.
If we are talking performance, what's your CPU usage (and which specific cores, and how many of them, are used) when you push 10Gb?
Even though this is an old topic for an EoL version, I will post here for completeness.
Tested on my current test setup:
N100
i226-V (igc)
ZenArmor with Native netmap driver
LAN 2x1G LAGG
netstat -Q
Configuration:
Setting Current Limit
Thread count 4 4
Default queue limit 2048 10240
Dispatch policy hybrid n/a
Threads bound to CPUs enabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1000 cpu hybrid C--
igmp 2 2048 source default ---
rtsock 3 2048 source default ---
arp 4 2048 source default ---
ether 5 2048 cpu direct C--
ip6 6 1000 cpu hybrid C--
ip_direct 9 2048 cpu hybrid C--
ip6_direct 10 2048 cpu hybrid C--
Tried RSS as well together with Zenarmor, and one observation:
in order for Zenarmor to perform better you need to enable the
"Do not pin engine packet processors to dedicated CPU cores" option.
Regards,
S.
Quote from: pata on February 10, 2022, 10:20:29 AM
Hi,
I'm shooting in the dark here. I activated RSS yesterday on my system with AMD Ryzen CPU. Since the multi-threading technology used by AMD is not the same as Intels hyper-threading should the net.inet.rss.bits be per core or per number of threads for AMD? If I understand it correctly AMD can write simultaneously to the core from the threads and are not limited like Intels hyper-threading?
Any expert out there who knows? Thanks!
I would also love to know the answer to this..
I just ran into this issue with nginx on 24.1 after enabling RSS on a Xeon D-1747NTE. Then I upgraded to OPNsense 24.7 and tried it again. It works so far. Maybe they made some changes in FreeBSD 14.1.
Can anybody else confirm this behaviour?
Quote from: jakkuh on June 27, 2023, 08:16:22 PM
Tried out the RSS feature for a few days on our OPNsense instance (23.1.9-amd64) and while the performance is great, we've run into some weird issues with internal system services not being able to resolve DNS requests. Disabling RSS seems to fix the issue. We are using the Intel ice driver with a 25GbE SOC based NIC. Machine in question: https://www.supermicro.com/en/products/system/iot/1u/sys-110d-20c-fran8tp
Specifically, pkg update and the GeoIP feature in the firewall cannot connect, and I tracked this down to a DNS issue. When running pkg update via CLI or the GUI it hangs on the fetching process, and when running it with debug you can see its stuck at the resolving DNS stage. The GeoIP feature has similar issues.
Interestingly, if you use ping from CLI or use the DNS diagnostic tool, the system resolves DNS requests totally fine. I enabled debug on Unbound and it doesn't appear to even receive the requests from pkg update or GeoIP downloads. Would love to get this fixed so we can use RSS since it handles our 10G symmetrical connection a lot better.
user@kappa:/usr/local/etc # pkg -ddddddd update
DBG(1)[34160]> pkg initialized
Updating OPNsense repository catalogue...
DBG(1)[34160]> PkgRepo: verifying update for OPNsense
DBG(1)[34160]> PkgRepo: need forced update of OPNsense
DBG(1)[34160]> Pkgrepo, begin update of '/var/db/pkg/repo-OPNsense.sqlite'
DBG(1)[34160]> Request to fetch pkg+https://pkg.opnsense.org/FreeBSD:13:amd64/23.1/latest/meta.conf
DBG(1)[34160]> opening libfetch fetcher
DBG(1)[34160]> Fetch > libfetch: connecting
DBG(1)[34160]> Fetch: fetching from: https://pkg.opnsense.org/FreeBSD:13:amd64/23.1/latest/meta.conf with opts "iv"
resolving server address: pkg.opnsense.org:443
^ hangs here for a while before retrying and effectively goes no where.