[Tutorial/Call for Testing] Enabling Receive Side Scaling on OPNsense

Started by tuto2, August 16, 2021, 02:13:24 PM

PS: It may be worth making sure Unbound does not run into issues, since we know of some odd behaviour with the RSS/SO_REUSEPORT combination. Switch to Dnsmasq to verify.

Quote from: sepahewe on May 23, 2022, 07:11:57 PM
Hi,

I tried enabling RSS, and Suricata works: better spread of CPU load and better performance. However, HAProxy runs into issues. It can't connect to anything, neither for health checks nor for live traffic. Based on the earlier comment about SO_REUSEPORT, I changed my config to simple binds and enabled noreuseport for HAProxy, but it still fails to connect.

Connections succeed only sporadically, about 10% of the time, which is too rare for a health check to clear. Since I have 8 RSS queues, it is almost as if HAProxy only gets traffic from 1 queue, which would amount to a 12.5% success rate.
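
The 12.5% figure is simply one queue out of eight. A minimal Python sketch of that arithmetic follows. It assumes flows are spread evenly across the queues and that only one queue actually reaches HAProxy; both are guesses from the symptoms, not confirmed behaviour, and the function name is made up for illustration.

```python
def expected_success_rate(total_queues: int, working_queues: int = 1) -> float:
    """Fraction of connections that succeed if flows are spread evenly over
    total_queues but only working_queues of them deliver traffic (assumption)."""
    if total_queues <= 0 or not 0 <= working_queues <= total_queues:
        raise ValueError("need total_queues > 0 and 0 <= working_queues <= total_queues")
    return working_queues / total_queues


# 8 RSS queues, 1 of them assumed to work
print(f"{expected_success_rate(8):.1%}")  # prints 12.5%
```

That is close enough to the observed ~10% to make the single-queue guess plausible, but it does not prove it.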

I have an X520 (ix), which does not support RSS to my knowledge. Running this will confirm:

sysctl dev.ix | grep rss

No results means the driver/NIC is unsupported; mine returns nothing.

I've tried all combinations of net.inet.rss.enabled and noreuseport, with and without health checks, and success or failure depends entirely on net.inet.rss.enabled. The error reported by HAProxy is "Layer4 timeout".

driver: ix
NIC: Intel D-1500 soc 10 gbe, (X552)
Opnsense: 22.1.7_1

I'm more than happy to help with testing, but would appreciate any suggestions on which direction to start in.

Hi,

I had some time to kill so I reran my tests. One difference I see is that with RSS enabled the firewall closes its own outgoing connections with an RST.

My firewall is 192.168.192.1 and in my test I ran curl http://192.168.192.30:8123 while capturing packets.

RSS disabled:
19 2.395679 192.168.192.1 192.168.192.30 TCP 74 52726 → 8123 [SYN] Seq=0 Win=65228 Len=0 MSS=1460 WS=128 SACK_PERM TSval=2821030256 TSecr=0
20 2.395947 192.168.192.30 192.168.192.1 TCP 74 8123 → 52726 [SYN, ACK] Seq=0 Ack=1 Win=65160 Len=0 MSS=1460 SACK_PERM TSval=2913189312 TSecr=2821030256 WS=128
21 2.396029 192.168.192.1 192.168.192.30 TCP 66 52726 → 8123 [ACK] Seq=1 Ack=1 Win=65792 Len=0 TSval=2821030256 TSecr=2913189312
22 2.396311 192.168.192.1 192.168.192.30 HTTP 148 GET / HTTP/1.1


RSS enabled:
68 24.248066 192.168.192.1 192.168.192.30 TCP 74 19224 → 8123 [SYN] Seq=0 Win=65228 Len=0 MSS=1460 WS=128 SACK_PERM TSval=187982256 TSecr=0
69 24.248327 192.168.192.30 192.168.192.1 TCP 74 8123 → 19224 [SYN, ACK] Seq=0 Ack=1 Win=65160 Len=0 MSS=1460 SACK_PERM TSval=2911919337 TSecr=187982256 WS=128
70 24.248375 192.168.192.1 192.168.192.30 TCP 66 19224 → 8123 [ACK] Seq=1 Ack=1 Win=65792 Len=0 TSval=187982256 TSecr=2911919337
71 24.248517 192.168.192.1 192.168.192.30 TCP 66 19224 → 8123 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0 TSval=187982257 TSecr=2911919337


With RSS disabled curl works, TCP is established.

With RSS enabled, the firewall itself kills TCP with an RST. I don't understand why...
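
To make the difference between the two captures explicit, here is a small Python sketch that flags a "handshake completes, then the client immediately resets" pattern. The addresses and flags are taken from the captures above; the `Packet` type and function names are invented for illustration, and the flags of the HTTP GET packet are not shown in the capture, so PSH+ACK is an assumption.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str
    flags: frozenset


def client_resets_after_handshake(packets, client):
    """Return True if `client` completes a TCP three-way handshake and the
    next packet it sends carries RST."""
    state = "idle"
    for p in packets:
        from_client = p.src == client
        if state == "idle" and from_client and p.flags == {"SYN"}:
            state = "syn_sent"
        elif state == "syn_sent" and not from_client and p.flags == {"SYN", "ACK"}:
            state = "syn_ack_seen"
        elif state == "syn_ack_seen" and from_client and p.flags == {"ACK"}:
            state = "established"
        elif state == "established" and from_client:
            # The first client packet after the handshake decides the outcome.
            return "RST" in p.flags
    return False


def make_packet(src, dst, *flags):
    return Packet(src, dst, frozenset(flags))


FIREWALL, SERVER = "192.168.192.1", "192.168.192.30"
HANDSHAKE = [
    make_packet(FIREWALL, SERVER, "SYN"),
    make_packet(SERVER, FIREWALL, "SYN", "ACK"),
    make_packet(FIREWALL, SERVER, "ACK"),
]
# Flags of the HTTP GET are not visible in the capture; PSH+ACK is assumed.
RSS_DISABLED = HANDSHAKE + [make_packet(FIREWALL, SERVER, "PSH", "ACK")]
RSS_ENABLED = HANDSHAKE + [make_packet(FIREWALL, SERVER, "RST", "ACK")]

print(client_resets_after_handshake(RSS_DISABLED, FIREWALL))  # False
print(client_resets_after_handshake(RSS_ENABLED, FIREWALL))   # True
```

This only describes the symptom. It says nothing about why the firewall sends the RST.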

Franco,

  Using the latest 23.3.1_3 release and a set of Mellanox ConnectX-3 NICs. netstat -Q reports:

Configuration:
Setting                        Current        Limit
Thread count                        12           12
Default queue limit                256        10240
Dispatch policy                 direct          n/a
Threads bound to CPUs          enabled          n/a

Using an Intel i5-10400 (6 cores, 12 threads), so I set "net.inet.rss.bits = 2".
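
As a rough cross-check of that value: to my understanding, FreeBSD's RSS code uses 2 to the power of net.inet.rss.bits as the number of buckets, so 2 bits gives 4 buckets. The Python sketch below works that out. The helper names are hypothetical, and the rule "largest power of two that does not exceed the core count" is a rule of thumb assumed here, not an official formula.

```python
def rss_buckets(bits: int) -> int:
    """Number of RSS buckets for a given net.inet.rss.bits value (2**bits)."""
    if bits < 0:
        raise ValueError("bits must be non-negative")
    return 1 << bits


def bits_for_cores(cores: int) -> int:
    """Largest bits value such that 2**bits <= cores (assumed rule of thumb)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores.bit_length() - 1


print(bits_for_cores(6), rss_buckets(2))  # prints: 2 4
```

Under that assumption, 2 bits matches the 6 physical cores; counting all 12 threads instead would give 3 bits (8 buckets).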

I had to disable RSS, as DNS resolution was taking upwards of 30 seconds when connecting to any website. I am using Unbound and CrowdSec at the moment.

Thanks.

Pat


newsense,

  That did the trick, thank you! Bummer about CrowdSec, though, as I liked the low overhead, etc. I did not originally think it would be an issue either, as I thought it was focused on inbound IPs and not outbound requests...

  I've started looking to see if there is a CrowdSec setting that will work, or I will report it to them as the case evolves.

Best,

Pat

No bummer; CrowdSec needs to be set up properly, or else it will start blocking on everything it triggers on.


https://docs.crowdsec.net/docs/whitelist/create/

Well, I did not install CrowdSec again and set up whitelisting, as I just noticed that System Firmware just spins away with RSS running:

2023-08-21T19:03:10-04:00   Error   configd.py   Timeout (120) executing : firmware tiers

Reverted to a system without RSS, and the whole firmware section works fine now...

Hello everybody, I have been following this thread with great interest, since I have been experimenting with RSS as well. This is my experience so far.

SETUP

  • OPNSense 23.7.3 & all packages up to date
  • Intel Pentium Silver N6005 (4 cores, 4 threads) with AES-NI hardware acceleration
  • Cryptography acceleration set to Intel QAT (QuickAssist Technology) (System --> Settings --> Miscellaneous) see newsense post
  • i226-V with igc driver
  • OPNSense tunables set like this:

    • net.isr.bindthreads = 1
    • net.isr.maxthreads = -1
    • net.inet.rss.enabled = 1
    • net.inet.rss.bits = 2
  • ZenArmor installed with native Netmap driver and SQLite db (2 days of log + max 100 devices)
  • CrowdSec with defaults
  • Unbound set to fully recursive

netstat -Q output shows:
Configuration:
Setting                        Current        Limit
Thread count                         4            4
Default queue limit                256        10240
Dispatch policy                 direct          n/a
Threads bound to CPUs          enabled          n/a

Protocols:
Name       Proto QLimit Policy Dispatch Flags
ip             1   1000    cpu   hybrid   C--
igmp           2    256 source  default   ---
rtsock         3    256 source  default   ---
arp            4    256 source  default   ---
ether          5    256    cpu   direct   C--
ip6            6   1000    cpu   hybrid   C--
ip_direct      9    256    cpu   hybrid   C--
ip6_direct    10    256    cpu   hybrid   C--
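
For anyone who wants to compare outputs like this across machines, here is a small Python sketch that parses the "Protocols:" table of netstat -Q and lists the protocols whose policy is cpu. The sample text is the table posted above; the function name is made up, and the parser assumes the six-column layout shown here.

```python
def parse_netisr_protocols(text):
    """Parse the 'Protocols:' table of `netstat -Q` output into a list of dicts.

    Assumes the header row follows the 'Protocols:' line directly and that
    the table ends at the first line with a different number of columns."""
    lines = iter(text.splitlines())
    for line in lines:
        if line.strip() == "Protocols:":
            break
    header = next(lines, "").split()
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != len(header):
            break
        rows.append(dict(zip(header, fields)))
    return rows


SAMPLE = """
Protocols:
Name   Proto QLimit Policy Dispatch Flags
ip         1   1000    cpu   hybrid   C--
igmp       2    256 source  default   ---
rtsock     3    256 source  default   ---
arp        4    256 source  default   ---
ether      5    256    cpu   direct   C--
ip6        6   1000    cpu   hybrid   C--
ip_direct     9    256    cpu   hybrid   C--
ip6_direct    10    256    cpu   hybrid   C--
"""

cpu_policy = [row["Name"] for row in parse_netisr_protocols(SAMPLE) if row["Policy"] == "cpu"]
print(cpu_policy)  # ['ip', 'ether', 'ip6', 'ip_direct', 'ip6_direct']
```

On a live system the text would come from running netstat -Q; that part is left out so the sketch runs anywhere.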


I have been running my OPNsense rig like this for the last 4 months with absolutely zero problems, and I have yet to run into any significant issue, except for a small hiccup with ZenArmor that I recently solved thanks to their support. In this thread, I initially asked for ways to test my setup, so if anyone has an idea, I'll be more than willing to give it a spin.

Thanks for all the useful information.
HUNSN RJ03m (Intel N6005 | 32GB DDR4 | Intel 2.5GbE I226-V)
OPNSense 23.7.3
AdGuard Home | CrowdSec | ZenArmor | Unbound | DNS over VPN


Quote from: newsense on September 04, 2023, 11:41:00 PM
You have no QAT on that CPU, either Xeon D or Atom C and P series are QAT capable -- or a dedicated card.

https://ark.intel.com/content/www/us/en/ark/products/212327/intel-pentium-silver-n6005-processor-4m-cache-up-to-3-30-ghz.html

https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html

Thanks for noticing, I overlooked this detail. Among the four options available in System --> Settings --> Miscellaneous --> Cryptography settings --> Hardware acceleration, I picked the Intel one since I did not recognize the other options. What is an alternative value for my CPU? There is no disable option, so I'm not sure which one to pick.

EDIT: Never mind, found the answer: "If you do not have a crypto chip in your system, this option will have no effect." So no harm done.

Enabled RSS, working beautifully! CrowdSec and Unbound seem to be working fine, contrary to what some people in this thread have reported.

I will continue monitoring for weird behavior. Is there anything I should look for in particular?

OPNSense 23.7.3 & all packages up to date
Celeron J4125 with 6x i225-V B3 NICs.

Configuration:
Setting                        Current        Limit
Thread count                         4            4
Default queue limit                256        10240
Dispatch policy                 direct          n/a
Threads bound to CPUs          enabled          n/a

Protocols:
Name       Proto QLimit Policy Dispatch Flags
ip             1   1000    cpu   hybrid   C--
igmp           2    256 source  default   ---
rtsock         3    256 source  default   ---
arp            4    256 source  default   ---
ether          5    256    cpu   direct   C--
ip6            6   1000    cpu   hybrid   C--
ip_direct      9    256    cpu   hybrid   C--
ip6_direct    10    256    cpu   hybrid   C--


Hi, how can I install dynamic DNS via the command line in the latest update?
Unfortunately, it doesn't work in the GUI  :(

Sorry, I have it, but I can't find the setting below -> net.inet.rss.enabled
Sorry, I'm still a beginner  :(

Hi,

Having posted a few times in this thread, here's how I resolved my issues.

TL;DR: RTFM, and the mlx4 driver doesn't support RSS on FreeBSD.

Two dual-port cards are involved: an Intel X552 (driver ixgbe) and a Mellanox ConnectX-3 Pro (driver mlx4). When I replaced the ConnectX-3 with a ConnectX-5 (driver mlx5), everything worked.

What fooled me is that, when reading up on mlx4 capabilities, the information provided, even by Mellanox, isn't always clear about whether it refers to Linux or BSD, so I got the impression it should work. The NIC supports RSS and mlx4 on Linux supports RSS, but mlx4 on FreeBSD does not.

mlx5 supports RSS on FreeBSD, so changing the NIC to a ConnectX-5 solved it.