OPNsense Forum

English Forums => Hardware and Performance => Topic started by: phocean on March 24, 2023, 05:34:50 PM

Title: Simple L3 traffic flood leads to CPU denial of service
Post by: phocean on March 24, 2023, 05:34:50 PM
Hello,

I installed OPNsense on an APU6 with an AMD GX-412TC SoC.

It takes only a single cloud instance and hping3 to DDoS the appliance completely, with no TCP port open, just by targeting the IP:

hping3 --rand-source -S --flood -p <any port> <ip address>

As a result, traffic climbs to 30 Mbps (nothing huge) and the CPU jumps to 100% and stays there for as long as hping3 keeps sending traffic.

I enabled all sorts of hardware optimisations that seemed to be available, as well as SYN cookies.

But, as the DDoS works against closed ports, the issue seems to be at L3, in how the kernel handles a massive number of IP packets from random sources (DDoS).

Apparently, there seems to be absolutely no hardware offloading of any sort, which is surprising. L3 traffic should not be compute-intensive and should just be dropped quickly.

Did I miss something in the configuration? Or does the APU just suck for this usage?

Thank you for any guidance.
Title: Re: Simple L3 traffic flood leads to CPU denial of service
Post by: meyergru on March 24, 2023, 06:49:28 PM
Difficult to say what you missed, and you did not specify what you did. For example: Did you use Firewall->Settings->Advanced->Enable syncookies?

There was no impact when I tried that from outside against my box, but I am unsure whether that traffic is filtered somewhere on the way.

P.S.: Attention viewers: do NOT try this at home:
When I did this from my LAN to the OPNsense interface without syncookie protection, my boxes hung completely with no automatic reboot.

On my DEC 750, I saw this endless loop:

Tracing command kernel pid 0 tid 100033 td 0xfffffe001e7b6ac0
sched_switch() at sched_switch+0x6f9/frame 0xfffffe001b7bae20
mi_switch() at mi_switch+0xc2/frame 0xfffffe001b7bae40
_sleep() at _sleep+0x1fc/frame 0xfffffe001b7baec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xb1/frame 0xfffffe001b7baef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe001b7baf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001b7baf30
--- trap 0x82c16fe0, rip = 0xffffffff80c30e8f, rsp = 0, rbp = 0x3000000020 ---
mi_startup() at mi_startup+0xdf/frame 0x3000000020

Tracing command kernel pid 0 tid 100034 td 0xfffffe001e7b63a0
sched_switch() at sched_switch+0x6f9/frame 0xfffffe001b7b5e20
mi_switch() at mi_switch+0xc2/frame 0xfffffe001b7b5e40
_sleep() at _sleep+0x1fc/frame 0xfffffe001b7b5ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xb1/frame 0xfffffe001b7b5ef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe001b7b5f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001b7b5f30
--- trap 0x82c16fe0, rip = 0xffffffff80c30e8f, rsp = 0, rbp = 0xffffffff82c16fc0 ---
mi_startup() at mi_startup+0xdf/frame 0xffffffff82c16fc0
kernload() at 0xb7b49003/frame 0xfffffe0000000100

Tracing command kernel pid 0 tid 100036 td 0xfffffe001e7b5560
sched_switch() at sched_switch+0x6f9/frame 0xfffffe001b7abe20
mi_switch() at mi_switch+0xc2/frame 0xfffffe001b7abe40
_sleep() at _sleep+0x1fc/frame 0xfffffe001b7abec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xb1/frame 0xfffffe001b7abef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe001b7abf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001b7abf30
--- trap 0x82c16fe0, rip = 0xffffffff80c30e8f, rsp = 0, rbp = 0xffffffff82c16fc0 ---
mi_startup() at mi_startup+0xdf/frame 0xffffffff82c16fc0
kernload() at 0xb7b49003/frame 0xfffffe0000000100
Title: Re: Simple L3 traffic flood leads to CPU denial of service
Post by: phocean on March 25, 2023, 10:22:24 AM
Hi,

Yes, SYN cookies are enabled, and they are not related anyway because it's a DDoS.

The attack consists of just a single SYN from each of many different IPs. The attacker does not care about the reply, so it never proceeds with the TCP conversation.
Moreover, the attack works against closed ports.

You can tell whether the attack reaches your gateway by looking for a traffic peak. Mine goes from 1 Mbps at most up to a constant 30 Mbps, though that will depend on the attacking server.

If you observe no traffic peak, it means that the DDoS has been blocked upstream.

If you observe a correlation between the traffic peak and the CPU usage going up to a constant 100%, it means your appliance is vulnerable.

If your CPU is not, or not much impacted, your appliance is not vulnerable.
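
The three cases above can be written down as a small decision helper. This is only a sketch of the reasoning in this post: the function name and both thresholds (`peak_factor`, `cpu_saturated`) are assumptions chosen for illustration, not values from OPNsense or from any measurement.

```python
def classify_flood_test(baseline_mbps, peak_mbps, cpu_percent,
                        peak_factor=5.0, cpu_saturated=95.0):
    """Classify the outcome of a flood test from two observations.

    peak_factor and cpu_saturated are hypothetical thresholds:
    a "traffic peak" is taken to mean at least peak_factor times
    the baseline, and "100% CPU" is taken to mean >= cpu_saturated.
    """
    # No visible traffic peak: the flood never reached the gateway.
    if peak_mbps < baseline_mbps * peak_factor:
        return "blocked_upstream"
    # Traffic peak together with a pegged CPU: the appliance suffers.
    if cpu_percent >= cpu_saturated:
        return "vulnerable"
    # Traffic peak, but the CPU copes with it.
    return "not_vulnerable"


if __name__ == "__main__":
    # Numbers from this thread: 1 Mbps baseline, 30 Mbps peak, 100% CPU.
    print(classify_flood_test(1.0, 30.0, 100.0))
```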

I get that a DDoS can be hard to block, if not impossible, when it uses more bandwidth than you have.
But in my case, I have 1 Gbps available and the firewall collapses at 30 Mbps, when it usually handles much more bandwidth with no issue.

It means there is really something wrong in the hardware or software filtering.

That's why I see only three explanations:

1) A misconfiguration on my side, which puts too much load on the CPU;

2) Weak hardware, either the CPU or the network card, which is unable to sustain a high number of connections;

3) Weak software, by design, where OPNsense is unable to accelerate low-level connection handling and puts all the load on the CPU.

It might be a combination of the three...

That's why, even if you don't have a definitive answer, I am interested in your feedback.

If you have a similar configuration and are unable to reproduce the issue, then I have to keep working on my configuration.

Even if you have different hardware, it's interesting, as I might consider replacing my appliance.

If you have the issue, it would confirm that there is something really wrong.

Thanks !
Title: Re: Simple L3 traffic flood leads to CPU denial of service
Post by: meyergru on March 25, 2023, 12:02:50 PM
Syncookies ARE related because you request a TCP SYN attack via -S. This essentially simulates a DOS attack, not from different IPs, but that does not matter.

The setting determines whether these packets are processed further or not. You can see this in the rising number of state table entries if syncookie protection is disabled. At least I can see relevant CPU load and a rising number of state table entries only if I use the default setting of "disable".

This is to be expected if a large number of packets must be processed. In this case, each packet has a length of 64 bytes, so even at only 30 Mbit/s there will be about 60,000 packets per second in your case. I tested it over a 10 Gbps local connection, which gives up to 19 million packets per second.

So, essentially, high CPU load and state table exhaustion are exactly what syncookie protection should prevent. I do not know if your hardware is so weak that it suffers from high load even if syncookie protection is on.
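
The packet rates quoted above follow from dividing the bit rate by the frame size. A minimal sketch of that arithmetic is below; the function names are made up for illustration, and the second function adds an assumption that is not in the post: 20 bytes of per-frame Ethernet overhead on the wire (preamble plus inter-frame gap), which lowers the achievable rate.

```python
def packets_per_second(bits_per_second, frame_bytes=64):
    """Packet rate when only the frame itself is counted (as in the post)."""
    return bits_per_second / (frame_bytes * 8)


def wire_rate_pps(bits_per_second, frame_bytes=64, overhead_bytes=20):
    """Packet rate when per-frame wire overhead is counted as well.

    overhead_bytes=20 is an assumption: 8 bytes of preamble/SFD plus
    a 12-byte inter-frame gap on Ethernet.
    """
    return bits_per_second / ((frame_bytes + overhead_bytes) * 8)


if __name__ == "__main__":
    for label, rate in (("30 Mbit/s", 30_000_000), ("10 Gbit/s", 10_000_000_000)):
        print(label,
              round(packets_per_second(rate)), "pps (frame only),",
              round(wire_rate_pps(rate)), "pps (with wire overhead)")
```

At 30 Mbit/s this gives roughly 58,600 packets per second, and at 10 Gbit/s roughly 19.5 million, matching the rounded figures above. With wire overhead included, the 10 Gbit/s figure drops to about 14.9 million.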



However, what I did not expect is a kernel fault, and even with that, the box should reboot, which it does not. I call that a catastrophic failure, which is way worse than what you initially asked about.

I have original Deciso hardware, so this should not happen. I have checked against a VM under Proxmox, and that one does not show this behaviour. Thus, I am unsure if this is a driver issue for the ax driver or related to other recent reports of kernel faults. I will fire up another hardware-based instance with different NICs to check if this happens only on ax hardware.
Title: Re: Simple L3 traffic flood leads to CPU denial of service
Post by: phocean on March 25, 2023, 01:07:22 PM
I am going to give more thought to your comment, but so far I disagree.

To me, SYN cookies are designed to prevent the SYN queue from filling up, not necessarily to reduce CPU load.
Without SYN cookies, a simple DoS is possible: you can fill up the SYN queue from a single IP address.
SYN cookies spare the firewall or server from maintaining state, and they force the attacker to respond, which uses more resources on his side and prevents him from spoofing IP addresses (because he needs to keep track of the connection).

It would not prevent attacks coming from an attacker using a virtually infinite number of IP addresses, when he can spoof them and does not need to care about any reply. He just sends SYN one after another, so SYN cookies are quite irrelevant.
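
To make the mechanism being debated concrete, here is a deliberately simplified sketch of the SYN cookie idea. It is NOT the algorithm pf or FreeBSD uses: the secret, the counter and the HMAC construction are assumptions for illustration, and real implementations also encode things like the MSS. The point it shows is that the responder stores nothing per SYN, yet still has to do some work for every packet it receives.

```python
import hashlib
import hmac
import struct

# Hypothetical secret; a real implementation generates and rotates its own.
SECRET = b"hypothetical-secret"


def make_cookie(src_ip, src_port, dst_ip, dst_port, counter, secret=SECRET):
    """Derive a 32-bit value to use as the SYN-ACK sequence number.

    The value depends only on the connection 4-tuple, a coarse
    counter and a secret, so no per-connection state is stored.
    """
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}|{counter}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def ack_is_valid(src_ip, src_port, dst_ip, dst_port, counter, ack_number,
                 secret=SECRET):
    """Check a returning ACK: it must acknowledge cookie + 1 (mod 2**32)."""
    expected = (make_cookie(src_ip, src_port, dst_ip, dst_port,
                            counter, secret) + 1) & 0xFFFFFFFF
    return ack_number == expected


if __name__ == "__main__":
    state_table = {}  # stays empty: a bare SYN never creates an entry
    cookie = make_cookie("198.51.100.7", 40000, "203.0.113.1", 443, counter=1)
    # A spoofed source never sees the SYN-ACK, so it cannot produce this ACK.
    print(ack_is_valid("198.51.100.7", 40000, "203.0.113.1", 443, 1,
                       (cookie + 1) & 0xFFFFFFFF))
    print(len(state_table))
```

Both views in this thread are visible here: no state is created for a bare SYN, but each SYN still costs a lookup and a hash, so a high enough packet rate can load the CPU even with cookies enabled.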

Precisely, hping does not keep track of the connection, it just sends SYN.

Also, I can confirm that SYN cookies are enabled: when I target an open port on my appliance with hping but without the random source option, the connection gets immediately terminated with "operation not permitted", because hping is unable to deal with the cookie.
Without SYN cookies, it proceeds with sending repeated SYN.

Title: Re: Simple L3 traffic flood leads to CPU denial of service
Post by: phocean on March 25, 2023, 01:16:10 PM
I am pretty convinced that it's a hardware issue, but as each appliance costs at least $500, I would like to be sure to get one that works properly before purchasing anything...

I am afraid of spending another $1000 on another OPNsense box with more CPU power and getting the same issue...
Title: Re: Simple L3 traffic flood leads to CPU denial of service
Post by: meyergru on March 25, 2023, 02:45:02 PM
I can only say:

1. I think that any remote ISP from which I could send traffic to my OPNsense box prevents source spoofing, so I cannot really test this remotely.

2. When I test this locally from a Linux client on my LAN connected over 10 Gbps, I can see a clear difference between syncookie protection on and off. With syncookie protection on, the state table does not fill up and CPU load rises to about 50%. With syncookie protection off, state table usage ramps up, as does CPU load (100%). I have seen kernel traps in this situation. Currently, I cannot reproduce them, probably because I limit the target to the local LAN interface. This would probably change if I targeted some outside address, but I am afraid of taking down my outside line again.
Title: Re: Simple L3 traffic flood leads to CPU denial of service
Post by: phocean on March 25, 2023, 07:49:23 PM
I found the main culprit and greatly improved performance:

1. Syslog was the process taking most of the CPU: a system setting enabled logging for default rules, NAT, etc. I was not aware of it because I had not enabled logging in my own rules.

2. This performance setting was also a game changer: https://docs.opnsense.org/troubleshooting/performance.html#receive-side-scaling

CPU load is much better controlled now.

I am still experiencing some dropped sessions, so I still have some improvements to make, but it's much, much better!
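
For readers who want an idea of what the receive-side scaling setup involves, the sketch below prints a set of tunables for a given core count. The tunable names and the rule for `net.inet.rss.bits` (2 for 4 cores, 3 for 8, 4 for 16) are written from memory of the linked page and should be treated as assumptions: check that page before applying anything. The post does not say which values were actually used. The helper name `rss_tunables` is hypothetical.

```python
def rss_tunables(cpu_cores):
    """Return suggested RSS-related tunables for a given core count.

    Assumption: rss.bits is the number of bits needed to index the
    cores (4 cores -> 2, 8 -> 3, 16 -> 4), as recalled from the
    OPNsense performance page. Verify against the linked docs.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    rss_bits = max(1, (cpu_cores - 1).bit_length())
    return {
        "net.isr.bindthreads": "1",   # pin netisr threads to cores
        "net.isr.maxthreads": "-1",   # one netisr thread per core
        "net.inet.rss.enabled": "1",  # turn receive-side scaling on
        "net.inet.rss.bits": str(rss_bits),
    }


if __name__ == "__main__":
    # The GX-412TC in the APU is a quad-core part.
    for name, value in rss_tunables(4).items():
        print(f"{name} = {value}")
```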
Title: Re: Simple L3 traffic flood leads to CPU denial of service
Post by: mimugmail on March 26, 2023, 08:34:20 AM
Don't forget that this device is quite old, but yes, disabling logging would save CPU here.