OPNsense Forum

Archive => 21.7 Legacy Series => Topic started by: sanxiago on August 11, 2021, 08:44:26 pm

Title: 21.7.1 Maltrail OOM / Possible fix
Post by: sanxiago on August 11, 2021, 08:44:26 pm
Thank you all for the work put into this project. I wanted to share the following.

I was having constant issues with the Maltrail sensor on the last few releases: it was starving my box, causing swapping and hangs.
I have a 16 GB box with 4 cores (so 4 sensors get started).

I saw a Reddit post a few days ago in which someone mentioned that the upgrade would fix this, since we would get Maltrail 0.35.

However the upgrade did not fix it for me.

I did the following, and it has been running stably for the past couple of hours:
1. Set a max memory value of 900 MB for the sensor; by default it is supposed to use up to 10% of free memory.
2. Used git pull to update Maltrail in place to 0.36. (I only needed to update the lists, but it seems there were code changes too, so I pulled in all the changes and kept my config.)

I think the first change is likely what fixed it, but if you continue having issues you can try cloning 0.36 and keeping your conf file.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: sanxiago on August 13, 2021, 08:25:44 am
Looks like the memory leak is not fixed. Today the sensors were again using several gigs of RAM each. I am going to try to reproduce the issue in a different environment. If anyone else is seeing the memory leak, please chime in.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: ad1m on August 16, 2021, 12:36:25 pm
Hello!

I seem to be experiencing the same issue with similar hardware (4 cores, 16 GB RAM), a 1 Gbit PPPoE WAN, and the latest version of everything from the official Community repo.
I don't know what triggers it. Sometimes it works fine for a couple of days and then it starts eating up all the memory. I tried different values for Maltrail - Sensor - Capture Buffer Size, but that doesn't seem to affect the problem. Luckily, OPNsense still works even when Maltrail consumes all RAM and swap, so I can connect (web GUI or ssh) and kill the sensor processes. I think I will disable Maltrail and wait for an update, since for my setup it's more of a nice-to-have than a must.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: opn_nwo on August 16, 2021, 05:13:13 pm
I did not check the memory usage, but I noticed that the sensor sometimes stopped reporting to the server. My workaround was to create a cron job that restarts the service once a day. I have had no problems since.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: RZR on August 16, 2021, 09:01:16 pm
Same problem for me; it doesn't seem to matter what values I set. 4 cores, 8 GB RAM, and no performance problems until Maltrail is enabled. It runs okay for a day or two, then suddenly starts consuming massive amounts of RAM and causing Suricata to crash.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: sanxiago on August 18, 2021, 09:07:09 pm
Thank you for chiming in; at least I know it is not my doing. I see the same pattern, and I think it may be traffic-driven. I also see the same thing with Suricata eventually becoming unresponsive, which is when the problem becomes noticeable.
I am going to check the Maltrail issue tracker and ask the devs what could be driving up the memory.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: ad1m on August 19, 2021, 12:34:08 am
I'm also following the GitHub issue for Maltrail to see if there are any solutions, but I'm already used to this OOM behavior. It isn't the first time this has happened to me with Maltrail and OPNsense. Even on older versions of OPNsense (unfortunately I don't remember whether it was running on python2, as asked on GitHub) I would find all of the box's RAM consumed. That's why I'm used to disabling it and only re-testing from time to time to see whether the issue has been corrected. For now the two proposed workarounds seem to be:
1. cron job - to restart the Maltrail sensor process
2. monit - to check the RAM consumption and then do the same as the cron job

Let's hope for a quick fix  ;D
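
A minimal sketch of the monit-style check in plain sh. Everything in it is an assumption rather than something posted in this thread: the function name sensor_over_limit is made up, the 921600 KB threshold simply mirrors the 900 MB cap mentioned in the first post, and the restart command in the comment assumes the rc script can be driven through service(8). The function reads ps output on stdin and succeeds if any sensor.py process has a resident set size above the limit.

```shell
#!/bin/sh
# sensor_over_limit LIMIT_KB
# Reads "RSS_KB COMMAND..." lines on stdin (one per process) and returns 0
# if any maltrail sensor.py process has an RSS above LIMIT_KB, 1 otherwise.
# The function name and the threshold are illustrative assumptions.
sensor_over_limit() {
    awk -v limit="$1" '
        /maltrail\/sensor\.py/ && $1 + 0 > limit + 0 { found = 1 }
        END { exit !found }
    '
}

# Possible wiring from cron (assumption: the rc script name matches the
# path quoted elsewhere in this thread and accepts "restart"):
# ps -ax -o rss= -o command= | sensor_over_limit 921600 \
#     && service opnsense-maltrailsensor restart
```

The check only triggers a restart when memory is actually high, instead of restarting on a fixed schedule. It is still worth looking for leftover sensor processes after each restart.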
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: Leviathan on August 22, 2021, 12:37:33 pm
Hi guys! I’ve got the same! High swap and ram usage after the last upgrade
https://forum.opnsense.org/index.php?topic=24452.0
I hope it will be fixed soon  :'( !
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: ad1m on August 25, 2021, 11:02:35 am
I thought I should try Maltrail with the workaround of restarting the sensor daily from cron, but I have noticed two things.

1. In high traffic periods restarting only once per day wasn't enough. I had to modify the cron to run hourly.
2. After a couple of days I found multiple maltrail sensor orphaned processes which I had to kill manually from cli.
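
A rough sketch for spotting such leftovers, under the assumption that an orphaned sensor shows up re-parented to PID 1. The function name list_orphaned_sensors is hypothetical, and depending on how the sensor is daemonized a healthy process could match too, so it only prints PIDs and kills nothing.

```shell
#!/bin/sh
# list_orphaned_sensors
# Reads "PID PPID COMMAND..." lines on stdin and prints the PID of every
# maltrail sensor.py process whose parent is PID 1.
# Assumption: orphaned sensors are re-parented to PID 1; verify on your
# own box before acting on the output.
list_orphaned_sensors() {
    awk '$2 == 1 && /maltrail\/sensor\.py/ { print $1 }'
}

# Possible use (review the list before killing anything):
# ps -ax -o pid= -o ppid= -o command= | list_orphaned_sensors
```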


Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: sanxiago on August 27, 2021, 06:13:22 pm
After discussing this on the Maltrail issue page, it appears the bug is in the python3 module used for capturing traffic, which is why the leak scales with traffic.

Reverting to python2.7 avoids the issue. I have confirmed this and documented some of the steps below. I may have left some out, but this is what I did:

# Install the ports tools. Not recommended for production environments; I am just a home user who likes to cause trouble for himself.
opnsense-code ports tools
# Install python 2.7 (build and install the port)
cd /usr/ports/lang/python27
make install
python2.7 -m ensurepip
# do not upgrade pip to version 21 or later, it will break
pip2.7 install --upgrade "pip < 21.0"
# build sqlite3; you need to set the flavor to py27
cd /usr/ports/databases/py-sqlite3
make FLAVOR=py27 clean install
pip2.7 install sqlite3

# test from the CLI whether you have everything you need
python2.7 /usr/local/share/maltrail/sensor.py

If everything works, you can update the rc file /usr/local/etc/rc.d/opnsense-maltrailsensor to change the command to use python 2.7:
command_args="-f -P /var/run/maltrailsensor.pid python2.7 /usr/local/share/maltrail/sensor.py"
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: Leviathan on August 31, 2021, 05:49:40 pm
I hope that this issue will be fixed by an official update really soon  :'( .
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: Leviathan on September 11, 2021, 03:46:51 pm
With the update to version 21.7.2_1 the extent of the problem has decreased significantly. Now swap usage grows by about 1 GB every 3 days... so the problem remains and has not been solved...
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: mimugmail on September 11, 2021, 06:16:31 pm
It won't get fixed anytime soon, as the pcapy lib is the root cause of it.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: bposter on September 14, 2021, 06:12:17 am
Steps I had to follow:

- Install ports
opnsense-code ports tools

- Install python 2.7
cd /usr/ports/lang/python27
make
make install
(log out and back in)
python2.7 -m ensurepip

- Do not upgrade pip to version 21 or later, it will break
pip2.7 install --upgrade "pip < 21.0"

- Build sqlite3; you need to set the flavor to py27
cd /usr/ports/databases/py-sqlite3
make FLAVOR=py27 clean install
pip2.7 install sqlite3

- Test if you have everything you need from cli
python2.7 /usr/local/share/maltrail/sensor.py

- Didn't work, had to install pcapy
pip2.7 install pcapy-ng

- If everything works, you can update the rc file /usr/local/etc/rc.d/opnsense-maltrailsensor to change the command to use python 2.7
command_args="-f -P /var/run/maltrailsensor.pid python2.7 /usr/local/share/maltrail/sensor.py"

Testing now, thanks sanxiago!
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: bposter on September 14, 2021, 06:16:38 am
Looks right so far. I'll watch memory for the next few days (with my process restarts disabled) and report back.

root     17187  86.1  7.6  997224 624956  -  R    22:13       0:05.59 python2.7 /usr/local/share/maltrail/sensor.py
root     13946   1.8  0.3   35580  23892  -  S    22:13       0:00.21 python3 /usr/local/share/maltrail/server.py (python3.8)
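
For watching memory over several days, a small sketch that reduces ps aux style lines like the two above to "PID RSS" pairs in MiB. It assumes the sixth column is RSS in kilobytes, as in the listing above; the function name rss_mib is made up for illustration.

```shell
#!/bin/sh
# rss_mib
# Reads "ps aux"-style lines on stdin and prints "PID <RSS> MiB" for each
# maltrail sensor.py or server.py process.
# Assumption: column 2 is the PID and column 6 is RSS in KB.
rss_mib() {
    awk '/maltrail\/(sensor|server)\.py/ { printf "%s %d MiB\n", $2, $6 / 1024 }'
}

# Possible use, e.g. appended to a log file from cron:
# ps aux | rss_mib
```

Fed the two lines above, this prints "17187 610 MiB" and "13946 23 MiB".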
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: Leviathan on October 12, 2021, 11:09:57 pm
Still no official solution to this problem?
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: mimugmail on October 13, 2021, 06:16:08 am
No
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: Leviathan on November 06, 2021, 12:29:56 am
After 3 months still no solution... :-\
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: mimugmail on November 06, 2021, 09:21:34 am
You can try to build a port for pcapy-ng, then we can work on it.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: Northguy on November 06, 2021, 02:07:32 pm
Quote from: mimugmail on September 11, 2021, 06:16:31 pm
"It won't get fixed anytime soon, as the pcapy lib is the root cause of it."

If it is pcapy that is at fault, you are right. It seems unmaintained (last release 02 Jul 2019). However, stamparm himself has forked pcapy and is maintaining pcapy-ng. I would assume this was done to enable improvements in Maltrail?
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: opn_nwo on November 18, 2021, 09:43:52 pm
That's too bad. I had to disable the service on both my servers as it was getting too cumbersome to manually kill the runaway processes every day. Hopefully it will get fixed at some point.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: mimugmail on November 19, 2021, 06:56:29 am
Quote from: mimugmail on September 11, 2021, 06:16:31 pm
"It won't get fixed anytime soon, as the pcapy lib is the root cause of it."

Quote from: Northguy on November 06, 2021, 02:07:32 pm
"If it is pcapy that is at fault, you are right. It seems unmaintained (last release 02 Jul 2019). However, stamparm himself has forked pcapy and is maintaining pcapy-ng. I would assume this was done to enable improvements in Maltrail?"

Yep, but this needs a FreeBSD port to integrate it. Sadly, I don't have the time.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: javis on March 12, 2022, 08:14:03 am
Quote from: mimugmail on September 11, 2021, 06:16:31 pm
"It won't get fixed anytime soon, as the pcapy lib is the root cause of it."

Quote from: Northguy on November 06, 2021, 02:07:32 pm
"If it is pcapy that is at fault, you are right. It seems unmaintained (last release 02 Jul 2019). However, stamparm himself has forked pcapy and is maintaining pcapy-ng. I would assume this was done to enable improvements in Maltrail?"

Quote from: mimugmail on November 19, 2021, 06:56:29 am
"Yep, but this needs a FreeBSD port to integrate it. Sadly, I don't have the time."

Maltrail was almost completely unusable because of the memory leak. Unfortunately I have some company devices that generate a crazy amount of telemetry requests, which greatly worsened the situation: 16 GB of RAM and swap filled up completely every half a day, and then my server box froze! Creating a cron job to restart the sensors actually accelerated the memory leak, because the old processes were simply orphaned rather than killed. This plugin in its current state is bound to crash your network sooner or later.

Thanks to the info in this thread, the fix is actually very simple (one doesn't have to install Python 2 to get it done) if people don't mind installing pip. Just follow the Maltrail author's advice to use pcapy-ng (https://github.com/stamparm/maltrail/issues/16710#issuecomment-901440119), which simply added (https://github.com/stamparm/pcapy-ng/commits/master) the PY_SSIZE_T_CLEAN macro definition (https://docs.python.org/3/c-api/intro.html#include-files) to the abandoned pcapy codebase.

Code:
python3 -m ensurepip
pip3 install pcapy-ng

It's been a few days now, and my server box's memory usage has been very stable (around 30% ~ 60% without any swapping).

@mimugmail thank you for porting Maltrail to FreeBSD and creating the OPNsense plugin! If you don't have time to bundle pcapy-ng with Maltrail, do you think you could revise the plugin description to include a short pcapy-ng installation guide? (I'm not familiar with OPNsense development, but it looks to me like the plugin simply offers a web interface, and the actual Maltrail and its dependencies are pre-bundled into the system? How do folks go about submitting updates to them?) It shouldn't take a couple of days of crashing and researching to find the culprit and the way to fix it :D
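
A small sanity check that could follow the pip-based fix above, sketched under two assumptions: that pcapy-ng installs under the import name pcapy, and that python3 is the interpreter the sensor runs with. The function name missing_modules is made up. It prints every module the given interpreter fails to import and prints nothing when all of them load.

```shell
#!/bin/sh
# missing_modules INTERPRETER MODULE...
# Prints the name of each module that INTERPRETER cannot import.
# Assumption: pcapy-ng is imported as "pcapy", like the original library.
missing_modules() {
    py="$1"
    shift
    for mod in "$@"; do
        "$py" -c "import $mod" >/dev/null 2>&1 || echo "$mod"
    done
}

# Possible use after "pip3 install pcapy-ng":
# missing_modules python3 pcapy
```

The same function works for the python2.7 route described earlier in the thread, for example with the arguments python2.7 pcapy sqlite3.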
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: mimugmail on March 12, 2022, 08:04:34 pm
I already added a PR for pcapy-ng in December, but no one with a commit bit has merged it.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: javis on March 13, 2022, 09:51:57 pm
Quote from: mimugmail on March 12, 2022, 08:04:34 pm
"I already added a PR for pcapy-ng in December, but no one with a commit bit has merged it."

I was trying to find your PR, so I did some digging (couldn't find it though). Is this where OPNsense builds the Maltrail plugin: https://github.com/opnsense/ports/blob/91da3754f16f546456479e3a8790bccff33cf429/security/maltrail/Makefile#L11? Can we simply update this to depend on pcapy-ng, or is there something else we need to do (like updating FreeBSD ports upstream (https://github.com/freebsd/freebsd-ports/tree/main/security/maltrail) etc.)?
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: mimugmail on March 13, 2022, 09:59:27 pm
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260732
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: javis on March 13, 2022, 10:09:23 pm
Ah right, the GitHub repo is just a mirror, so one would need to submit a ticket on FreeBSD instead. Thanks for bumping that bug!
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: mimugmail on March 14, 2022, 08:14:17 am
Can you remind me here again in April? If it's still not merged over there, we can add it to the opn mirror as an OPNsense port.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: javis on March 15, 2022, 03:09:57 am
Sure, I'll set a reminder for myself on the 4th.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: javis on April 05, 2022, 03:40:47 am
Quote from: mimugmail on March 14, 2022, 08:14:17 am
"Can you remind me here again in April? If it's still not merged over there, we can add it to the opn mirror as an OPNsense port."

Here's your friendly human alarm clock reporting that the FreeBSD ticket regarding pcapy-ng has been closed / fixed :) However, I started seeing the memory leak again recently so am not entirely sure if it's simply because pcapy-ng still can't work with Python 3 in its current state (although I see some fixes have been pushed to pcapy-ng since last conversation). I have disabled Maltrail for now and will check if the latest port, once available, fixes the memory issues.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: mimugmail on April 05, 2022, 08:37:14 am
22.1.5 will include it :)
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: k_mikhail on April 10, 2022, 09:49:23 pm
https://forum.opnsense.org/index.php?topic=27832.0 <-- I'm not clear on this. Is the relevant info just missing from the release notes?
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: bposter on April 10, 2022, 09:51:02 pm
Thanks mimugmail, on 22.1.5 now. I'll report if my memory leak returns.
Title: Re: 21.7.1 Maltrail OOM / Possible fix
Post by: Hackintosys on June 10, 2022, 08:20:51 am
Hi, I still have pretty high swap usage.
It's not as bad as before, but it's still there. Can you confirm this on your builds?