OPNsense Forum
English Forums => Zenarmor (Sensei) => Topic started by: athurdent on April 01, 2021, 07:25:31 pm
-
The new Deciso fanless appliances look very tempting. It would be veeeery cool to have Sensei throughput tested and listed for those. :)
-
Yes :) If anyone can share the ubench -cs output I can provide a throughput estimate.
-
Thanks, @mb!
I found someone very helpful who was kind enough to provide the data for a DEC3850:
Ubench Single CPU: 544725 (0.40s)
-
Always a pleasure.
This score looks great. I *guess* this can easily do over 1 Gbps (maybe 1.5-2 Gbps), provided that the ethernet driver does not have problems with netmap.
Reading the product specs from here https://shop.opnsense.com/wp-content/uploads/2021/02/BROCHURE-DEC840_50.pdf, I see that IPS throughput is around 2 Gbps. IPS and Sensei use the same packet interface so I'd guess that with 16GB RAM you can easily serve a school campus with around 1000 devices and 1+ Gbps WAN.
-
Nice, thank you for your estimates!
Curious though, about the Sensei mid-term roadmap item "10 Gbps in Hosted Mode". I believe this means 10G throughput for systems with 10G interfaces (not sure about the "hosted mode" part)?
Will this require special CPUs, or an optimised Sensei engine that goes easier on the CPU, or one that leverages multiple cores for a single connection, or something completely different?
-
10 Gbps throughput support does not require specialized hardware.
If your 10GbE adapter supports RSS (Receive Side Scaling) - and almost all 10GbE adapters already do - you're good to go. Just make sure that your adapter plays well with netmap.
Having said that, since default RSS configurations do not provide "symmetric" hashing, that is something we need to talk about with the OPNsense team. It's a relatively straightforward development, but still means a bit of deviation from the upstream kernel source.
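To illustrate the symmetric-hashing point: a Toeplitz RSS hash is not symmetric with an arbitrary key, but with the well-known key built from the repeating byte pair 0x6d,0x5a, swapping source and destination yields the same hash, so both directions of a flow land on the same queue/CPU. A minimal Python sketch (illustrative only, not driver code; the addresses and ports are made up):

```python
# Toeplitz RSS hash over a TCP/IPv4 4-tuple, to show why a "symmetric" key
# matters: with the 0x6d,0x5a-repeating key, hash(A->B) == hash(B->A).
import socket
import struct

SYMMETRIC_KEY = bytes([0x6D, 0x5A]) * 20  # 40-byte key with a 16-bit period

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Standard Toeplitz hash: for every set bit of the input, XOR in the
    32-bit window of the key starting at that bit offset."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result

def flow_hash(src_ip: str, dst_ip: str, sport: int, dport: int) -> int:
    # Hash input is src IP, dst IP, src port, dst port, in network byte order.
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack(">HH", sport, dport))
    return toeplitz_hash(SYMMETRIC_KEY, data)

fwd = flow_hash("192.0.2.1", "198.51.100.2", 12345, 443)
rev = flow_hash("198.51.100.2", "192.0.2.1", 443, 12345)
assert fwd == rev  # both directions of the flow pick the same RSS bucket
```

The symmetry works because swapping the IPs (32-bit fields) and ports (16-bit fields) shifts every bit by a multiple of 16, and the key repeats with a 16-bit period, so each set bit XORs in the same key window either way.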
Hosted mode basically means L3 routed mode, where Sensei sits on the main firewall; this is the default deployment mode :)
-
Thanks @mb, really appreciate the insight!
-
I do have an OPNsense DEC3860, connected to symmetrical 1Gb fiber for internet access, with the LAN side connected to a 10Gb core switch. The FW has IDP enabled with all major and critical rules, Sensei with two policies (one blocking nearly everything for one subnet), a handful of VLANs and a few dozen firewall rules. The device idles at around 8% CPU and 18% memory. Doing load tests with the speedtest CLI from a server directly connected to the core switch at 10Gb (so a clear 10Gb path from server to firewall), the results are always around 920Mb up and down. CPU spikes briefly to around 40-50% and memory to maybe 30%. I'm pretty sure that with the same configuration the appliance could handle the 1.5 to 2Gb mentioned above if the internet link supported it.
Adding more IDP rules/policies will raise memory consumption as well. So CPU and appropriate interfaces are only one success factor; enough memory is the second and, from my point of view, the more important one. Compared to a standard PC, the appliance also seems to have a much more performant backplane, which must be why CPU and memory consumption look so relaxed :)
-
Hi @mb, there is an RSS kernel to test now; will Sensei already benefit from that? I can test it and report back here, if you like. Got 2 ixl ports (Intel(R) Ethernet Controller X710 for 10GbE SFP+) connected to a 10G switch and can measure throughput locally.
-
Hi @athurdent, we had an email exchange about this with the OPNsense team a while ago.
And it looks like the team is very close to getting RSS in the kernel :)
https://twitter.com/opnsense/status/1425402746602246150
After it hits one of the OPNsense releases, we'll go ahead and enable multi-core support for Sensei.
-
Hi @mb, that sounds great! So if all works out, the "single core high performance CPU" limit should be lifted, I guess, and we can use multi-core CPUs to gain more speed?
Here are my results, maybe you could take an educated look at those? I think it might be working OK but I am not sure... ;)
https://forum.opnsense.org/index.php?topic=24409.msg117836#msg117836
-
Hi @mb, it's official now.
src: include RSS kernel support defaulting to off
Would be preeetty cool to see multi-core for Zenarmor. :)
RSS works fine here it seems, my VM has 4 cores assigned:
root@OPNsense:~ # netstat -Q
Configuration:
Setting                Current  Limit
Thread count                 4      4
Default queue limit        256  10240
Dispatch policy         direct    n/a
Threads bound to CPUs  enabled    n/a
Protocols:
Name        Proto  QLimit  Policy  Dispatch  Flags
ip              1    1000  cpu     hybrid    C--
igmp            2     256  source  default   ---
rtsock          3     256  source  default   ---
arp             4     256  source  default   ---
ether           5     256  cpu     direct    C--
ip6             6     256  cpu     hybrid    C--
ip_direct       9     256  cpu     hybrid    C--
ip6_direct     10     256  cpu     hybrid    C--
Workstreams:
WSID  CPU  Name         Len  WMark   Disp'd  HDisp'd  QDrops  Queued  Handled
   0    0  ip             0    360        0   671367       0  198087   869454
   0    0  igmp           0      0        0        0       0       0        0
   0    0  rtsock         0      0        0        0       0       0        0
   0    0  arp            0      0        0        0       0       0        0
   0    0  ether          0      0   818270        0       0       0   818270
   0    0  ip6            0      2        0      175       0     344      519
   0    0  ip_direct      0      0        0        0       0       0        0
   0    0  ip6_direct     0      0        0        0       0       0        0
   1    1  ip             0    188        0  1120895       0   23110  1144005
   1    1  igmp           0      0        0        0       0       0        0
   1    1  rtsock         0      0        0        0       0       0        0
   1    1  arp            0      0     1670        0       0       0     1670
   1    1  ether          0      0  1209891        0       0       0  1209891
   1    1  ip6            0      2        0      763       0     359     1122
   1    1  ip_direct      0      0        0        0       0       0        0
   1    1  ip6_direct     0      0        0        0       0       0        0
   2    2  ip             0    298        0   833862       0   21968   855830
   2    2  igmp           0      0        0        0       0       0        0
   2    2  rtsock         0      0        0        0       0       0        0
   2    2  arp            0      0        6        0       0       0        6
   2    2  ether          0      0   841523        0       0       0   841523
   2    2  ip6            0      2        0      248       0     715      963
   2    2  ip_direct      0      0        0        0       0       0        0
   2    2  ip6_direct     0      0        0        0       0       0        0
   3    3  ip             0    921        0  2282494       0  121993  2404487
   3    3  igmp           0      0        0        0       0       0        0
   3    3  rtsock         0      5        0        0       0     186      186
   3    3  arp            0      0      537        0       0       0      537
   3    3  ether          0      0  2359454        0       0       0  2359454
   3    3  ip6            0      2        0     1348       0     418     1766
   3    3  ip_direct      0      0        0        0       0       0        0
   3    3  ip6_direct     0      0        0        0       0       0        0
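For reference, these are the RSS-related tunables OPNsense documents for enabling this (as I understand the docs; net.inet.rss.bits is 2 here to match this VM's 4 cores, so the right value will differ on other hardware):

net.isr.bindthreads = 1
net.isr.maxthreads = -1
net.inet.rss.enabled = 1
net.inet.rss.bits = 2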
-
@athurdent, thanks for the heads-up. I've just confirmed this with Franco. We'll be running our tests in the following week.
Note: the team's agenda is quite filled with Shaping and TLS work, so expect this to land in a production release later on. (We'll send you a test binary, though ;)
-
Hi @mb, thank you for your swift reply! This sounds awesome, especially the test binary part! :)
So, thank you very much in advance, really looking forward to beta-test throughput and stability!
-
here's mine on 850:
Ubench Single CPU: 507674 (0.40s)
-
Kudos to the OPNsense team. We've tested this with L2 Bridge mode and RSS work seems to be running great.
@athurdent, are you on L3 or L2 mode? If L2, we can send you a test binary right away. For L3 mode, stay tuned.
@lilsense, with that CPU score, you should be able to attain 1.5-2 Gbps per CPU core. Multiply that with the number of CPU cores. Scalability should be close to linear.
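The arithmetic above can be sketched as a back-of-the-envelope estimator. The 1.75 Gbps midpoint per ~500k ubench and the linear scaling are assumptions drawn from this thread, not measurements:

```python
# Rough Zenarmor/Sensei throughput estimate from a ubench single-CPU score,
# per the rule of thumb in this thread: ~1.5-2 Gbps per core at a score of
# ~500k, scaling close to linearly across cores. Purely illustrative.
def estimate_gbps(ubench_single: int, cores: int,
                  gbps_per_500k: float = 1.75) -> float:
    per_core = gbps_per_500k * (ubench_single / 500_000)
    return per_core * cores  # assumes near-linear scaling

# The DEC850 score quoted above:
print(round(estimate_gbps(507674, 1), 2))  # ~1.78 Gbps on one core
print(round(estimate_gbps(507674, 4), 2))  # ~7.11 Gbps on four cores
```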
-
@mb Very good news, awesome work both of you! :)
I'll have to stay tuned though, I'm on L3. Really looking forward to testing this; looks like I could max out my 10G test setup with a score of 524263 (0.41s) and up to 8 possible cores for my VM. If threads also count/do any good, I could go up to 16.
-
This is great to hear. I'm looking at the new DEC750, or possibly a DEC840 with a warranty-breaking RAM upgrade. It would mean I wouldn't need to spend the extra for the DEC850 for my needs.