[SOLVED] Suricata causes out-of-memory error

Started by adn77, July 02, 2022, 10:33:17 PM

July 02, 2022, 10:33:17 PM Last Edit: July 15, 2022, 08:52:38 AM by adn77
We're on 22.4.1 now and tested 22.1.x before.
At approx. 22.1.4 a strange behavior was introduced: our rock-solid OPNsense started to hang. Some traffic might still pass, but new VPN connections wouldn't come up and the web UI became unreachable.

Unsure whether this might be a general memory leak, we kept monitoring to find out what was causing the excessive memory consumption.
It turned out the no. 1 memory eater was Suricata. If it is disabled, everything is fine.

We have 16GB of RAM, about 2.7GB are generally used when Suricata doesn't run - about 3.7GB when it does.
Remote logging to Fluentd/Opensearch is set up via syslog. There are a few hundred events per day.
We run the ET Telemetry rules - most of them are enabled in IDS mode.

There is a Suricata restart Cron (I don't remember having set this up) at 01:2x in the morning.
About 20 hours later (at about 21:00) Monit starts sending "Memory Limit exceeded" warnings. Soon thereafter the box stops responding to anything. There is no Cron entry at or around that time.

Has anybody experienced something similar?
What else can we do to further debug the issue?
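
One way to narrow this down would be to log Suricata's resident memory at regular intervals and line the growth up against cron jobs and traffic events. A minimal sketch, assuming `ps -axo rss,comm` prints the RSS in KiB followed by the command name, and that the process name contains "suricata"; the function names and the 5-minute interval are illustrative only:

```python
import subprocess
import time


def suricata_rss_kib(ps_output: str, name: str = "suricata") -> int:
    """Sum the RSS (KiB) of all processes whose command contains `name`."""
    total = 0
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        rss, command = fields
        # The header line ("RSS COMMAND") is skipped because "RSS" is not numeric.
        # Substring match is an assumption; adjust if the process is named differently.
        if rss.isdigit() and name in command.lower():
            total += int(rss)
    return total


def sample_forever(interval_s: int = 300) -> None:
    """Print a timestamped RSS sample every `interval_s` seconds until interrupted."""
    while True:
        out = subprocess.run(
            ["ps", "-axo", "rss,comm"],
            capture_output=True, text=True, check=True,
        ).stdout
        print(time.strftime("%Y-%m-%d %H:%M:%S"), suricata_rss_kib(out), "KiB", flush=True)
        time.sleep(interval_s)
```

Calling `sample_forever()` and redirecting the output to a file should give a timeline that shows whether memory grows steadily (leak-like) or jumps when a particular transfer starts.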

It seems that your network just has enough traffic to cause logs to fill up RAM quickly.

I would check how long it works on average and schedule a reboot based on that.

You might also want to check the settings and see how much RAM is allowed to be assigned to IDS and whether it keeps data too long. Other than that, adding more RAM or sending logs to a dedicated server is all I can think of.

If you have automatic rule updates, newer rulesets could cause this as well. Sometimes even faulty rules are introduced that eat too much memory.


Cheers,
Franco

We are already sending logs to a remote syslog (Fluentd/Opensearch).
Besides, there were only 20+ messages that day.

The rule update happens 16 hours before the system becomes unresponsive. We only use ET pro telemetry - somebody else would be facing similar problems if the rules were broken.

In the 22.1.9 changelog I saw that a memory leak in BSD13 was fixed. Will there be a new release for the business edition as well?

Thanks,
Alex

Hi Alex,

22.1.10 and 22.4.2 (based on 22.1.9) are both scheduled for tomorrow.

As for memory leaks it does depend on the network traffic mix and the selected rules as well, but in general you are right that there haven't been any other reports of it.


Cheers,
Franco

I upgraded right away, thanks for the new release.

Turns out the culprit is a backup (Urbackup) which pulls approx. 60GB across the firewall (router between management and company network).
In some cases Suricata recovers before reaching 100% memory usage and then gets back to normal. So it's not a memory leak.

The question is: why does Suricata build up such a large buffer?

If it tries to scan the connection and make sense of it (decoding), this could always happen. But that's for the Suricata devs to assess.


Cheers,
Franco

I believe I got to the bottom of the issue.

(We are routing SMB over OPNsense which might not be a common use case.)

By default Suricata watches an SMB stream from beginning to end. In the case of long-lived SMB connections this might be a period of many days.
We assessed that our current signatures are more likely to trigger on connection setup. Therefore we limited the stream depth to 32mb, and Suricata now runs stably. (https://forum.suricata.io/t/suricata-memory-allocation/573)

    smb:
      enabled: yes
      detection-ports:
        dp: 139, 445

      # Stream reassembly size for SMB streams. By default track it completely.
      # limited to avoid memory exhaustion
      stream-depth: 32mb


Any chance for this to be incorporated into Suricata's settings?

It's a bit too specific, but to make it persistent in your install you can add it to /usr/local/opnsense/service/templates/OPNsense/IDS/custom.yaml
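
Once the setting is in place, it may be worth confirming that Suricata actually sees it. A minimal sketch, assuming the output of `suricata --dump-config` consists of `key = value` lines and that the SMB block is nested under `app-layer.protocols` (both are assumptions to verify on your install; `dumped_setting` is a hypothetical helper name):

```python
def dumped_setting(dump_text: str, key: str):
    """Return the value for `key` from 'key = value' lines, or None if absent."""
    for line in dump_text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep and name.strip() == key:
            return value.strip()
    return None


# Example with hand-written sample text (not real output from a firewall):
sample = (
    "app-layer.protocols.smb.enabled = yes\n"
    "app-layer.protocols.smb.stream-depth = 32mb\n"
)
print(dumped_setting(sample, "app-layer.protocols.smb.stream-depth"))
```

In practice the text would come from running `suricata --dump-config` against the generated configuration file; a result of `None` would suggest the override was not picked up.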


Cheers,
Franco