I have been interested in intrusion prevention and Suricata for several years; I find Suricata fascinating.
However, it is hard to find information on 10Gbps IPS setups. I would be very interested to hear whether any of you run such a setup on OPNsense, and which hardware you use.
Currently, each interface's traffic stream runs through a single CPU core, so throughput is very limited.
Apart from the hardware question, I would also be interested in whether and when Suricata (IPS mode) on OPNsense uses all CPU cores.
The thing is, I find it a bit unfortunate that "commercial providers" offload IPS.
On their hardware systems, an FPGA or ASIC then executes IPS at wire speed. Of course OPNsense cannot implement this kind of solution, but I just wanted to mention it for context.
Don't compare apples with oranges.
ASICs and FPGAs look good on paper, but not in real scenarios.
Most of these "commercial providers" use an IPS engine similar to Snort or Suricata ;)
For example, for a high-performance setup we run OPNsense in a virtualized HA stack (Proxmox).
Search for CPUs with a high clock rate.
Some "standard" blades with modern Xeons or AMD Epycs should be enough for Suricata ;)
Example: 2 OPNsense in HA with Suricata (with a lot of rules!), on average 20 TB of mixed traffic per day; CPU load idles around 2-4%.
Edit: Suricata uses all cores!
Edit2: One OPNsense HA setup runs on 2x HP DL380 G10+ (Xeon Gold 5218), another on 2 Supermicros with AMD Epyc 7443P.
Quote: Don't compare apples with oranges.
ASICs and FPGAs look good on paper, but not in real scenarios.
I mentioned this for context.
Fortigate uses something like this, for example. Alternatively, there are the NICs from Napatech.
Please explain which "real" scenarios you refer to.
Quote: Search for CPUs with a high clock rate.
I know. Since Suricata in IPS mode (runmode: workers) pins each interface to an individual CPU core, single-core performance is crucial. A 10G interface would therefore need a CPU that can handle this on a single core. Alternatively, bundle 2x 10G as a lagg and distribute the traffic across it; Suricata then spreads the traffic of the individual lagg members over as many cores as there are lagg members.
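As an illustration of the workers runmode described above, the relevant suricata.yaml fragment looks roughly like this (a sketch only; the interface count and CPU numbers are assumptions, not anyone's actual config):

```yaml
# Sketch: workers runmode with explicit CPU affinity, assuming a
# 2-member lagg. Each capture thread is pinned to its own core, so
# single-core speed bounds the per-stream inspection rate.
runmode: workers

threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0 ]          # housekeeping threads
    - worker-cpu-set:
        cpu: [ 1, 2 ]       # one core per lagg member
        mode: exclusive
        prio:
          default: high
```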
Quote: Example: 2 OPNsense in HA with Suricata (with a lot of rules!), on average 20 TB of mixed traffic per day; CPU load idles around 2-4%.
Does this refer to IPS mode? Which NIC? Lagg? How many interfaces?
"Real scenario" -> master gateway protecting datacenters (cloud/application providers).
Before that we used Fortigates, with catastrophic experiences (buggy firmware, slow IPS, a lot of hardware defects).
Single-core performance is always important ;)
Maybe our situation differs from yours because we virtualize OPNsense with Proxmox, so these things are hardware-independent for us (virtio NICs with multiqueue).
Yes, IPS mode. Virtio NIC. No lagg. Underlying hardware NICs: Mellanox 100Gb dual-port or Intel X7** dual-port.
Um, yeah... I don't buy one bit of the "Fortinet sucks" junk...
Every appliance, including OPNsense, has bugs up the yingyang that need to be fixed.
Now, Fortinet does run various functions on ASICs, as opposed to OPNsense, which uses the CPU. It is one of the reasons why Fortinet can push terabits, and why they are ISP-class firewalls, including their itty-bitty boxes. :)
When I read "virtualization", all this high-performance stuff is out the door.
One thing is fact: dedicated hardware always performs better than any software.
Here's the Matrix for your reference...
https://www.fortinet.com/content/dam/fortinet/assets/data-sheets/Fortinet_Product_Matrix.pdf
A real guide showing which hardware achieves which speed does not really exist (apart from Deciso's own hardware).
I suspect that iflib together with the igb driver and Suricata is slowing things down somewhere. That would explain why guenti_r achieves good performance with this hardware:
Quote: Underlying hardware NICs: Mellanox 100Gb dual-port or Intel X7** dual-port.
It would be ideal, and I hope for it, if Deciso produced hardware with acceleration, e.g. via FPGA. Then interface speed would equal IPS speed.
Until then I can only buy the fastest CPU (single thread) possible to get some speed.
As per my understanding, all the performance depends on the hardware used. I do not believe Deciso is going to test everything; they will only test their own hardware and provide the data as they currently do. If you look at the Fortinet matrix, you can also see that all those numbers really depend on how many services are running on the system.
I did not write that Deciso should test all hardware, only that Deciso logically knows only the test results from its own hardware ;-)
What would be nice is if someone running 10Gbps IPS came forward and shared their experience and hardware selection, to give others an orientation for their own hardware choice.
In other words, bare metal.
Which CPU, RAM (size and speed), motherboard, NIC, storage, number of Suricata rules, network connectivity, benchmark results...
It really depends on the type of traffic when it comes to IPS. If you look at the Fortinet matrix, there's a * which mentions "Enterprise", but I'm not sure what that is... LOL
No shit, Sherlock.
Tools like Cisco TRex exist to measure those things.
Why not test it in a virtual appliance?
Use an "old" server, put OPNsense in a VM, and measure.
I have done this several times.
Fortigates look good on paper, but are horrible in the real world.
Here we have around 60 Fortis headed for the trash can...
Every Forti is being replaced with OPNsense appliances, from the DEC690 up to big HA clusters and VMs.
These are daily experiences; I have replaced these horrible Forti boxes for a lot of customers.
IPS is a little beast; Suricata is fast enough to compete with these "commercial" boxes.
Edit: To answer the question, on one HA OPNsense cluster 222,450 rules are enabled.
As I had already written at the beginning:
"I have been interested in intrusion prevention and Suricata for several years."
I have been using Suricata for 4 years now.
I have used OPNsense on various hardware appliances and have run benchmarks several times, in the most diverse configurations, achieving results from 500 Mbit/s to 2 Gbit/s on different hardware.
All this is known to me. I only mentioned Fortigate because they do IPS in hardware. I hope that is understandable; it was never about a direct comparison of OPNsense to Fortigate!
It is simply about the question of what hardware is required for 10Gbps IPS. Nothing else! I hope that is now clear to everyone.
Quote from: lilsense on December 14, 2022, 05:28:09 PM
It really depends on the type of traffic when it comes to IPS. If you look at the Fortinet matrix, there's a * which mentions "Enterprise", but I'm not sure what that is... LOL
Fully agree. For fun, try enabling more than 10,000 IPS rules on a Fortigate: it crashes instantly.
Good marketing, bad (stolen GPL code) product.
Only the highest-priced Fortis have enough power to play reasonably with IPS in real-world scenarios.
Those ASICs are slower than some smartphones out there :)
Take a look at the hardware specs, it will bring a smile to your face:
https://yurisk.info/2021/03/14/Fortigate-Firewalls-Hardware-CPU-model-and-number-Memory-size-datasheet-table/
Quote from: seed on December 14, 2022, 07:24:26 PM
It is simply about the question of what hardware is required for 10Gbps IPS. Nothing else! I hope that is now clear to everyone.
Did you read the answers? Re-read my first one.
Quote from: seed on December 14, 2022, 04:59:28 PM
Until then I can only buy the fastest CPU (single thread) possible to get some speed.
Suricata is multi-threaded, take a look:
https://suricata.readthedocs.io/en/suricata-5.0.3/configuration/suricata-yaml.html#threading
I have a mix of 10Gb, 25Gb, and 40Gb NICs. I use an Intel X710-DA2 for the LAN interface in the OPNsense firewall. Servers have XXV710, X710, and Chelsio T580 NICs. All work fine with IDS.
I've never heard of IPS with more than 4Gb throughput.
It's not that hard...
We run that every day and have for 3+ years. :)
It just takes serious hardware.
Quote from: Supermule on December 18, 2022, 03:29:22 PM
It's not that hard...
We run that every day and have for 3+ years. :)
It just takes serious hardware.
Which specs? Screenshot or it didn't happen ;)
Same thing here (guenti_r):
Quote: For example, for a high-performance setup we run OPNsense in a virtualized HA stack (Proxmox).
Search for CPUs with a high clock rate.
Some "standard" blades with modern Xeons or AMD Epycs should be enough for Suricata ;)
Example: 2 OPNsense in HA with Suricata (with a lot of rules!), on average 20 TB of mixed traffic per day; CPU load idles around 2-4%.
Screenshot or it didn't happen. Show some benchmark results with Suricata in IPS mode at 10Gbps throughput instead of talking around it.
It's really annoying that I can't post snips here with Ctrl+V... that would make it a lot easier.
16-core Xeon at 3.00 GHz running "the other sense".
Quote from: seed on December 18, 2022, 09:27:10 PM
Same thing here (guenti_r):
Quote: For example, for a high-performance setup we run OPNsense in a virtualized HA stack (Proxmox).
Search for CPUs with a high clock rate.
Some "standard" blades with modern Xeons or AMD Epycs should be enough for Suricata ;)
Example: 2 OPNsense in HA with Suricata (with a lot of rules!), on average 20 TB of mixed traffic per day; CPU load idles around 2-4%.
Screenshot or it didn't happen. Show some benchmark results with Suricata in IPS mode at 10Gbps throughput instead of talking around it.
Quote from: Supermule on December 18, 2022, 09:38:41 PM
It's really annoying that I can't post snips here with Ctrl+V... that would make it a lot easier.
16-core Xeon at 3.00 GHz running "the other sense".
Quote from: seed on December 18, 2022, 09:27:10 PM
Same thing here (guenti_r):
Quote: For example, for a high-performance setup we run OPNsense in a virtualized HA stack (Proxmox).
Search for CPUs with a high clock rate.
Some "standard" blades with modern Xeons or AMD Epycs should be enough for Suricata ;)
Example: 2 OPNsense in HA with Suricata (with a lot of rules!), on average 20 TB of mixed traffic per day; CPU load idles around 2-4%.
Screenshot or it didn't happen. Show some benchmark results with Suricata in IPS mode at 10Gbps throughput instead of talking around it.
Hm, in my tests I had a more powerful machine. Are you sure it's IPS and not IDS?
Quote from: mimugmail on December 18, 2022, 09:45:07 PM
Hm, in my tests I had a more powerful machine. Are you sure it's IPS and not IDS?
Bare metal or virtualized?
Edit: IPS
Quote from: seed on December 18, 2022, 09:27:10 PM
Screenshot or it didn't happen. Show some benchmark results with Suricata in IPS mode at 10Gbps throughput instead of talking around it.
That's not very nice language :(
Maybe this helps:
https://suricata.readthedocs.io/en/latest/performance/high-performance-config.html
Instead of simply showing benchmarks to prove that your setup can handle 10Gbps of throughput with Suricata, you're avoiding the questions. I wasn't sure at first, but now I am: you are just a troll. Prove your statements or don't participate in this discussion.
As it seems, no one on the forum can verifiably report running a setup with 10Gbps IPS throughput.
10Gbps IPS is probably still the domain of FPGA systems.
Hopefully there will be OPNsense hardware with IPS accelerators available for purchase in the future. That would be cool and would solve some scaling problems. Until then, I guess it will remain boring IDS operation in the datacenter.
The usual problem with a generic x86 OS and open source :)
With all the CVEs of the commercial providers, I prefer to stay with open source. In the years I have used OPNsense my experience has been mostly positive, so I see no reason to use another firewall.
I had thought of accelerator cards from Napatech. I have not tested them yet. Napatech advertises them with lossless wire speed, e.g. the NT100A01 SmartNIC.
However, there is not a single test report on these cards. The internet is generally quite empty on SmartNICs that can accelerate Suricata. As a private person you can't get the cards at all, so they are not a suitable toy for consumers who run a homelab or have some of their own hardware in a colo.
I will supply screenshots of 10Gb NIC throughput with IPS if you tell me what to use to generate the info you want.
Please test with iperf3 so that we get an approximate impression of the performance.
TCP and UDP.
The traffic must be routed through the OPNsense.
Please tell us which interface you are routing through (physical/VLAN), whether you are using NAT, and on which interface Suricata is running. Also the number of loaded rules.
Hardware details would also help: CPU, RAM (size and speed), motherboard, NICs...
Besides the iperf3 screenshots, please take a screenshot of top ("top -aSHIP") during the test so we can see the CPU load, plus screenshots of your Suricata settings.
I am very curious to see the results.
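To make the requested procedure concrete, the commands could look like this (a sketch only; addresses, durations, and stream counts are placeholders):

```shell
# On a host behind the firewall: start an iperf3 server.
iperf3 -s

# On a host on the far side, so the traffic is routed through OPNsense:
iperf3 -c 192.168.1.10 -t 30 -P 4          # TCP test, 30 s, 4 parallel streams
iperf3 -c 192.168.1.10 -u -b 10G -t 30     # UDP test at a 10 Gbit/s target rate

# Meanwhile, in a shell on the OPNsense box: watch per-thread CPU load.
top -aSHIP
```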
I had a few minutes to put this together.
Here is a sample iperf3 run going from the firewall to a Windows PC on the physical LAN. 192.168.100.2 is the PC and 192.168.100.1 is the firewall. Let me know if I should use other endpoints.
LAN uses an X710-DA4 with two 10Gb ports set up as LACP lagg0. WAN is an Intel i210.
CPU is an i5-7600, RAM 16GB.
So you have Suricata running on your gigabit interface, yet you claim to reach 10 gigabit throughput. Your screenshot even proves that your statement is not correct (2.8Gb). Also, the requirement was to route the traffic through the OPNsense. Sorry, but you missed the point.
Also... you are running iperf from the LAN interface to a LAN host while Suricata only runs on WAN. :)
First off, I never said I achieved 10Gb speeds; I just stated that it works. If I had better instructions on what you wanted to see, maybe you would have it. My goal was to start a conversation about how to improve IDS performance, not a condemnation. I just wasted my time with this thread. Thanks.
Here are some comparisons, using IDS on LAN only and 10Gb NICs on both LANs.
Even without IDS I can only achieve around 6 Gb/s, so IDS doesn't slow it down too much.
IDS is using 4 rulesets. Same computer specs and NICs on both sides.
Remember that your SATA bus doesn't push more than 6 Gbit/s no matter what.
So many of the systems sold cannot push more than that.
SAS pushes 12 Gbit/s and NVMe is practically limitless (depending on NICs and CPU).
Using NVMe, not SATA, on both systems.
Quote from: Supermule on December 31, 2022, 10:59:11 AM
Remember that your SATA bus doesn't push more than 6 Gbit/s no matter what.
So many of the systems sold cannot push more than that.
SAS pushes 12 Gbit/s and NVMe is practically limitless (depending on NICs and CPU).
This thread is getting spammed by people who completely miss the topic.
Can the moderators close this topic?
It may take a few CPU generations until 10Gbps IPS is within reach. Until then this discussion goes nowhere.
To answer your own question, get a Threadripper with a 10Gig card and see if you can make it sweat. :D
Quote from: seed on December 31, 2022, 11:43:47 PM
Quote from: Supermule on December 31, 2022, 10:59:11 AM
Remember that your SATA bus doesn't push more than 6 Gbit/s no matter what.
So many of the systems sold cannot push more than that.
SAS pushes 12 Gbit/s and NVMe is practically limitless (depending on NICs and CPU).
This thread is getting spammed by people who completely miss the topic.
Can the moderators close this topic?
It may take a few CPU generations until 10Gbps IPS is within reach. Until then this discussion goes nowhere.
So because you don't agree or don't like it, you ask for closure...
It can easily be done. Server-grade hardware (dual Xeons) and I710-T4 NICs. This is what we use. It just keeps chugging along at about 1.4 million PPS, hardly breaking a sweat.
What does disk bandwidth (factually correct though your numbers are) have to do with IPS performance?
Quote from: pmhausen on January 01, 2023, 01:27:58 PM
What does disk bandwidth (factually correct though your numbers are) have to do with IPS performance?
Primarily log writing to disk... we used this as a guide:
https://redpiranha.net/news/High-speed-IDP/S-suricata-hardware-tuning-for-60gpbs-throughput
https://www.google.dk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwjP_YvxsKb8AhUdRvEDHRqBCVIQFnoECD8QAQ&url=https%3A%2F%2Fuia.brage.unit.no%2Fuia-xmlui%2Fbitstream%2Fhandle%2F11250%2F2823637%2FF%25C3%25B8rde%2520Roar%2520%2528705%2529_78839715_2.pdf%3Fsequence%3D1&usg=AOvVaw2YPOejlIrJWikYDIc32L6E
But with a 10 Gbps network to scan, as the OP asked, and 9X% of all traffic being irrelevant, do you really think SATA could ever become a bottleneck?
You don't log unsuspicious/permitted connections, do you?
Quote from: pmhausen on January 01, 2023, 02:03:13 PM
But with a 10 Gbps network to scan, as the OP asked, and 9X% of all traffic being irrelevant, do you really think SATA could ever become a bottleneck?
You don't log unsuspicious/permitted connections, do you?
It becomes a bottleneck when Suricata writes to the logs, no matter the ruleset/traffic.
In "the other sense", as soon as it sees more than 200,000 PPS it becomes sluggish because of the disk subsystem and the logging...
I've looked into this a lot... and admittedly, it's hard to find up-to-date and reliable information. From everything I have investigated, it is even more challenging to get close to 10Gbps IPS using Suricata on FreeBSD because of netmap.
Although Suricata can utilize more than one CPU core, netmap's implementation on FreeBSD has historically been limited to a single CPU core when using Suricata in IPS mode. Apparently there is work underway to change this behavior, but I haven't been able to find the current state of progress.
This was previously brought up by a forum admin in the post I've quoted below. It has been almost two years since that post though... so I'm on the hunt for any updates.
Quote from: tuto2 on July 27, 2021, 11:09:23 AM
Hi,
Suricata on FreeBSD uses Netmap to achieve IPS functionality. Judging by your logs, you are indeed using netmap to bypass the host stack and enable Suricata to inspect packets straight off the wire.
Note the way ports are opened:
ix0/R (Receive thread) --> ix0^ (Host stack)
ix0^ (Host stack) --> ix0/T (Transmit thread)
This simply means that on initialization, netmap opens two "ports": one on which to capture packets, at which point Suricata can do its thing, and another representing the host stack (the '^' symbol), which Suricata uses to forward inspected packets back to the host stack. The same principle applies on the transmit side (but reversed), totalling a thread usage of 4 in a default setup.
The way netmap is currently implemented does not allow more than one thread to connect to the host stack on either the receive or transmit side. Manually increasing the number of threads will not ensure a gain in throughput, and any measured increase will be wrong, since packets on different threads might not even reach Suricata and could potentially skip past it entirely, due to a lack of synchronization.
In conclusion, Suricata on FreeBSD currently only supports one thread in IPS mode. However, Netmap has recently committed support for multiple threads towards the host stack in FreeBSD, and Suricata is in the process of integrating this into their software - so keep an eye on that.
Cheers,
Stephan
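For reference, the port pairing described in the quoted post corresponds to a netmap section in suricata.yaml roughly like the following (a sketch with an assumed interface ix0; on OPNsense this configuration is generated automatically):

```yaml
# Sketch: netmap IPS mode pairing a physical port with its host-stack
# port ("^"), as described above. Traffic captured on ix0 is inspected
# by Suricata and copied to the host stack, and vice versa.
netmap:
  - interface: ix0
    copy-mode: ips
    copy-iface: ix0^
  - interface: ix0^
    copy-mode: ips
    copy-iface: ix0
```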