[Tutorial/Call for Testing] Enabling Receive Side Scaling on OPNsense

Started by tuto2, August 16, 2021, 02:13:24 PM

Unbound's so-reuseport is buggy with RSS enabled, so we removed it to avoid further problems. It might just be that the outcome is the same speed-wise either way, with so-reuseport disabled or with RSS enabled, since the two are now mutually exclusive.

We will be looking into it, but with the beta just out it's better to concentrate on more urgent issues.


Cheers,
Franco
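
For context, the option in question is Unbound's standard so-reuseport setting; a minimal sketch of the relevant unbound.conf stanza, disabling it explicitly (the file OPNsense actually generates may differ):

server:
    # so-reuseport lets each Unbound worker thread bind its own listening socket;
    # this is the combination that misbehaves with RSS enabled
    so-reuseport: no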

Quote from: Koldnitz on November 13, 2021, 08:16:51 PM
Has anyone noticed a significant slowdown (a full order of magnitude or greater) in the average recursion time in Unbound after enabling this option?

I tried with the latest stable build 21.7.5.

Cheers,

It does seem more sluggish; recursion time has gone up. It could also be other factors. I'm running 21.7.6.

Quote from: franco on October 26, 2021, 08:40:52 AM
Linux does have RPS to deal with PPPoE acceleration but FreeBSD has no equivalent.

Franco, thanks for clearing this up. Is there any discussion going on about implementing RPS (or RFS: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/network-rfs) in FreeBSD? Is someone already working on it?

PPPoE is heavily used by many ISPs, at least here in Italy (and I think in Europe in general), so it would be nice to have this kind of optimization for these connections as well.

UPDATE: I just found out in this post on the freebsd-net mailing list that it has been implemented as "option RSS". Is that correct?

Thanks a lot,

Alessandro

I have to disagree with Alexander's conclusion in that mailing list thread. We are talking about RSS here and have already established it's not RPS. Note that the mailing list question doesn't mention the PPPoE use case either, which makes the RSS = RPS assumption slightly less wrong, but it still mostly is.

I don't hold out any hope for RPS/PPPoE inside FreeBSD. The state is either deemed good enough by the developers, or it's not a hindrance for the bulk of FreeBSD consumers (as opposed to users).


Cheers,
Franco

The post I linked was from a user who needed RPS on FreeBSD, not specifically for PPPoE but for another use case. Alexander's answer was pretty clear; that's why I was surprised and reported it here, to get a confirmation from you. :)

Right now I'm testing RSS as per the instructions in the first post, and as far as I can see it's not using only core 0 as I had read (for PPPoE). What should I monitor specifically to check whether it is spreading the workload or sticking to the first core? I'm checking with netstat -Q:


Configuration:
Setting                        Current        Limit
Thread count                         4            4
Default queue limit                256        10240
Dispatch policy                 direct          n/a
Threads bound to CPUs          enabled          n/a

Protocols:
Name   Proto QLimit Policy Dispatch Flags
ip         1   1000    cpu   hybrid   C--
igmp       2    256 source  default   ---
rtsock     3    256 source  default   ---
arp        4    256 source  default   ---
ether      5    256    cpu   direct   C--
ip6        6    256    cpu   hybrid   C--
ip_direct     9    256    cpu   hybrid   C--
ip6_direct    10    256    cpu   hybrid   C--

Workstreams:
WSID CPU   Name     Len WMark   Disp'd  HDisp'd   QDrops   Queued  Handled
   0   0   ip         0    10        0   533094        0    11564   544658
   0   0   igmp       0     0        2        0        0        0        2
   0   0   rtsock     0     2        0        0        0       36       36
   0   0   arp        0     0     1625        0        0        0     1625
   0   0   ether      0     0  2350239        0        0        0  2350239
   0   0   ip6        0     0        0       14        0        0       14
   0   0   ip_direct     0     0        0        0        0        0        0
   0   0   ip6_direct     0     0        0        0        0        0        0
   1   1   ip         0    11        0        0        0   335277   335277
   1   1   igmp       0     0        0        0        0        0        0
   1   1   rtsock     0     0        0        0        0        0        0
   1   1   arp        0     0        0        0        0        0        0
   1   1   ether      0     0        0        0        0        0        0
   1   1   ip6        0     1        0        0        0        8        8
   1   1   ip_direct     0     0        0        0        0        0        0
   1   1   ip6_direct     0     0        0        0        0        0        0
   2   2   ip         0    14        0     1235        0   478622   479857
   2   2   igmp       0     0        0        0        0        0        0
   2   2   rtsock     0     0        0        0        0        0        0
   2   2   arp        0     0        0        0        0        0        0
   2   2   ether      0     0   333485        0        0        0   333485
   2   2   ip6        0     1        0        0        0        1        1
   2   2   ip_direct     0     0        0        0        0        0        0
   2   2   ip6_direct     0     0        0        0        0        0        0
   3   3   ip         0    13        0        0        0   475546   475546
   3   3   igmp       0     0        0        0        0        0        0
   3   3   rtsock     0     0        0        0        0        0        0
   3   3   arp        0     0        0        0        0        0        0
   3   3   ether      0     0        0        0        0        0        0
   3   3   ip6        0     1        0        0        0        1        1
   3   3   ip_direct     0     0        0        0        0        0        0
   3   3   ip6_direct     0     0        0        0        0        0        0

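For reference, one way to see whether the load keeps spreading is to watch the per-CPU "Handled" counters grow over time. A minimal sketch, assuming the Workstreams column layout shown above:

# print the per-CPU "ip" workstream counters every 5 seconds
while true; do
    netstat -Q | awk '$3 == "ip" && $2 ~ /^[0-9]+$/ { print "CPU", $2, "handled", $NF }'
    echo "---"
    sleep 5
done

If all four CPUs keep increasing their "Handled" totals during a transfer, the work is being spread; if only CPU 0 moves, it is sticking to the first core.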

I have a 1000/300 FTTH connection; here's a quick test with RSS enabled:


speedtest -s 4302
   Speedtest by Ookla

     Server: Vodafone IT - Milan (id = 4302)
        ISP: Tecno General S.r.l
    Latency:     8.26 ms   (0.16 ms jitter)
   Download:   937.00 Mbps (data used: 963.1 MB)
     Upload:   281.64 Mbps (data used: 141.4 MB)
Packet Loss: Not available.
Result URL: https://www.speedtest.net/result/c/0e691806-5212-4fc3-b199-2b2e92660367


Thanks for the support.

I assume that is just the topic being lost in translation.

The point is: RSS works on incoming IP packets and is mostly done in hardware. PPPoE traffic arrives as non-IP frames crossing the same hardware, so RSS doesn't work here, and pre-decapsulation RSS can't be applied. If you apply it post-decapsulation it's called RPS, and we don't have RPS.


Cheers,
Franco

Quote from: Tupsi on November 13, 2021, 05:01:42 PM
I wanted to thank you guys for getting this into the 21.7.x release lately. After doing the work with the mentioned tunables, I now finally get the full internet speed I should have (1000/500). Before these changes the download side throttled at around 500, so I ended up with a 500/500 line.

so THANK YOU!

Seems to make a big difference to me as well - very well done - I am just using a lowly Qotom J1900 box, which has probably been in use for 6-7 years now.

I was just about to pull the trigger on an upgraded box, as I have just upgraded to gigabit broadband, but I was seeing 'kernel{if_io_tqg' pegged at 100% at around 700-750 Mbps, and less (400-500 Mbps) with the shaper enabled.

I have the following set on 21.7.6:

hw.pci.enable_msix="1"
machdep.hyperthreading_allowed="0"
hw.em.rx_process_limit="-1"
net.link.ifqmaxlen="8192"
net.isr.numthreads=4
net.isr.defaultqlimit=4096
net.isr.bindthreads=1
net.isr.maxthreads=4
net.inet.rss.enabled=1
net.inet.rss.bits=2
dev.em.3.iflib.override_nrxds="4096"
dev.em.3.iflib.override_ntxds="4096"
dev.em.3.iflib.override_qs_enable="1"
dev.em.3.iflib.override_nrxqs="4"
dev.em.3.iflib.override_ntxqs="4"
dev.em.2.iflib.override_nrxds="4096"
dev.em.2.iflib.override_ntxds="4096"
dev.em.2.iflib.override_qs_enable="1"
dev.em.2.iflib.override_nrxqs="4"
dev.em.2.iflib.override_ntxqs="4"
dev.em.1.iflib.override_nrxds="4096"
dev.em.1.iflib.override_ntxds="4096"
dev.em.1.iflib.override_qs_enable="1"
dev.em.1.iflib.override_nrxqs="4"
dev.em.1.iflib.override_ntxqs="4"
dev.em.0.iflib.override_nrxds="4096"
dev.em.0.iflib.override_ntxds="4096"
dev.em.0.iflib.override_qs_enable="1"
dev.em.0.iflib.override_nrxqs="4"
dev.em.0.iflib.override_ntxqs="4"
dev.em.0.fc="0"
dev.em.1.fc="0"
dev.em.2.fc="0"
dev.em.3.fc="0"


And I can now achieve 940Mbps raw throughput.

I can also now happily run the shaper, set to 900 Mbps with FQ-CoDel, which (only) brings it down to around 890 Mbps on the Waveform Bufferbloat speed test, with A+ grade results as well. Looks like the J1900 gets to live a little longer :)

I'll let it run for a few days, just to make sure there are no issues with the various UDP tunnels etc. I have set up, then try with hyperthreading enabled (as I see there was a request for that data in earlier posts).
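
For anyone reproducing this, a quick sanity check that the overrides took effect after a reboot could look like the following (a sketch for em NICs; the sysctl names differ for other drivers):

# RSS should be enabled and hashing across 2^2 = 4 buckets
sysctl net.inet.rss.enabled net.inet.rss.bits
# each em interface should report the overridden queue counts
sysctl dev.em.0.iflib.override_nrxqs dev.em.0.iflib.override_ntxqs
# with MSI-X active there is typically one interrupt line per RX queue
vmstat -i | grep em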

Quote from: franco on November 14, 2021, 09:13:29 AM
Unbound's so-reuseport is buggy with RSS enabled, so we removed it to avoid further problems. It might just be that the outcome is the same speed-wise either way, with so-reuseport disabled or with RSS enabled, since the two are now mutually exclusive.

We will be looking into it, but with the beta just out it's better to concentrate on more urgent issues.


Cheers,
Franco
I'm planning on testing RSS during my Xmas break from work. I use Unbound, and this message makes me unsure whether I should test at all while using Unbound.
The previous message regarding a commit to make so_reuseport conditional on the sysctl value made me think it was OK, but this later message makes me ask.
What's the current status of RSS + Unbound?
If I try and test it, what is the expected behaviour?
I am on 21.7.5, OpenSSL, hardware APU4. IDS on LAN.
Thanks.

I'm using Unbound and RSS at home and I don't notice any difference. The situation needs some sort of fix in the kernel, but for day to day use it's good enough.


Cheers,
Franco

Quote from: franco on December 15, 2021, 09:07:59 AM
I'm using Unbound and RSS at home and I don't notice any difference. The situation needs some sort of fix in the kernel, but for day to day use it's good enough.

Cheers,
Franco
Perfect, thanks Franco. I might just take the jump today.

I applied the changes to enable RSS yesterday and rebooted; so far I've noticed no adverse effects, the only exception being that Unbound seems to utilise only one thread.

penguin@OPNsense:~ % sudo sysctl -a | grep -i 'isr.bindthreads\|isr.maxthreads\|inet.rss.enabled\|inet.rss.bits'
net.inet.rss.enabled: 1
net.inet.rss.bits: 2
net.isr.bindthreads: 1
net.isr.maxthreads: 4
penguin@OPNsense:~ % sudo netstat -Q
Configuration:
Setting                        Current        Limit
Thread count                         4            4
Default queue limit                256        10240
Dispatch policy                 direct          n/a
Threads bound to CPUs          enabled          n/a

Protocols:
Name   Proto QLimit Policy Dispatch Flags
ip         1   1000    cpu   hybrid   C--
igmp       2    256 source  default   ---
rtsock     3    256 source  default   ---
arp        4    256 source  default   ---
ether      5    256    cpu   direct   C--
ip6        6    256    cpu   hybrid   C--
ip_direct     9    256    cpu   hybrid   C--
ip6_direct    10    256    cpu   hybrid   C--


OPN 21.7.5, OpenSSL, Hardware is APU4. IDS on LAN.
Upgraded BIOS beforehand, coreboot v4.14.0.6.
Network interfaces on this system are igb.

Thanks for this development.
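
If you want to check how many worker threads Unbound actually starts, and whether so-reuseport ended up in the generated config, something like this should do it (a sketch; /var/unbound/unbound.conf is the usual OPNsense location, but the path may differ):

# show the relevant options from the generated config
grep -E 'num-threads|so-reuseport' /var/unbound/unbound.conf
# list the threads of the running daemon (one line per thread, plus a header)
procstat -t $(pgrep -x unbound)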

Hi,
My goal is to allow PPPoE processing to be shared across multiple CPUs, something like per-queue load distribution for PPPoE. This would allow me to get greater than gigabit speed through my OPNsense router when using PPPoE.

I use Proxmox to virtualise my OPNsense instance for many reasons, so in addition to the above I would like to see these improvements apply to a virtual instance.

My question is, can RSS work with the vtnet virtio driver?

From my research, this was being worked on by making use of eBPF:
https://qemu.readthedocs.io/en/latest/devel/ebpf_rss.html

This requires eBPF being implemented in FreeBSD (and of course in OPNsense):
https://ebpf.io/what-is-ebpf
https://wiki.freebsd.org/SummerOfCode2020Projects/eBPFXDPHooks

Can anyone tell me whether this has been implemented and whether it's possible to test it?
I can test, but I'm not sure where to start.

Thanks for any advice on this
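
In case it helps as a starting point, checking what the virtio NIC negotiated inside the guest could look like this (a sketch; the dev.vtnet sysctl names are an assumption and may vary by FreeBSD version):

# virtqueue pairs the device offers vs. what is actually in use
sysctl dev.vtnet.0.max_vq_pairs dev.vtnet.0.act_vq_pairs
# with multiqueue active there is typically one interrupt line per queue
vmstat -i | grep vtnet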

UPDATE: Perhaps it was never completed and implemented?
https://www.freebsd.org/status/report-2020-07-2020-09.html#Google-Summer-of-Code%E2%80%9920-Project---eBPF-XDP-Hooks

That's a pity, as it sounds like it would greatly benefit FreeBSD/OPNsense when a tunnel-type connection like PPPoE is needed.

Will RSS be included in the 22.1 RC kernel?

It appears so!


Does this work with a VMware vmx NIC interface (VMXNET3)?

Running OPNsense in a VM on ESXi 7.0 U3