21.7.1 Maltrail OOM / Possible fix

Started by sanxiago, August 11, 2021, 08:44:26 PM

Thank you all for the work into this project, I wanted to share the following.

I was having constant issues with the Maltrail sensor on the last few releases: it was starving my box of memory, causing swapping and hangs.
I have a 16 GB box with 4 cores (4 sensors get started).

I saw a Reddit post a few days ago where someone mentioned the upgrade would fix this, since we would get Maltrail 0.35.

However the upgrade did not fix it for me.

I did the following, and it has been running stable for the past couple of hours:
1. Set a max memory value of 900 MB for the sensor. By default it is supposed to use up to 10% of free memory.
2. Used git pull to update Maltrail in place to 0.36. (I only needed to update the lists, but there seemed to be code changes too, so I pulled in all the changes and kept my config.)

I think the first change is likely what fixed it, but if you continue having issues you can try cloning 0.36 and keeping your conf file.

Looks like the memory leak is not fixed. Today the sensors were again using several gigs of RAM each. I am going to try to reproduce the issue in a different environment. If anyone else is seeing the mem leak, please chime in.

Hello!

I seem to be experiencing the same issue with similar hardware (4 cores, 16 GB RAM), a 1 Gbit PPPoE WAN, and the latest version of everything from the official Community repo.
I don't know what triggers it. Sometimes it works fine for a couple of days and then it starts eating up all the memory. I tried different values for Maltrail - Sensor - Capture Buffer Size, but that doesn't seem to affect the problem. Luckily, OPNsense still works even when Maltrail consumes all RAM and swap, so I can connect (web GUI or ssh) and kill the sensor processes. I think I will disable Maltrail and wait for an update, since for my setup it's more of a nice-to-have than a must.

I did not check the memory usage, but I noticed that the sensor sometimes stopped reporting to the server. My workaround was to create a cron job that restarts the service once a day. I have had no problems since.

Same problem for me; it doesn't seem to matter what values I set. 4 cores, 8 GB RAM, no performance problems until Maltrail is enabled. It runs okay for a day or two, then suddenly starts consuming massive amounts of RAM and causes Suricata to crash.

Thank you for chiming in; at least I know it is not my doing. I see the same pattern, and I think it may be traffic driven. I also see Suricata eventually becoming unresponsive, and that is when the problem becomes noticeable.
I am going to check the Maltrail issues and ask the devs what could be driving up the memory.

I'm also following the GitHub issue for Maltrail to see if there are any solutions, but I'm already used to this OOM behavior. It isn't the first time this has happened to me with Maltrail and OPNsense. Even in older versions of OPNsense (unfortunately I don't remember if it was running on python2, as asked on GitHub) I would find the box with all its RAM consumed. That's why I'm used to disabling it and only re-testing from time to time to see if the issue has been corrected. For now the two proposed workarounds seem to be:
1. cronjob - to restart the maltrail sensor process
2. monit - to check the RAM consumption and do something like the cronjob
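For anyone who wants to combine the two workarounds, here is a rough sketch of a watchdog that restarts the sensor only when its memory use crosses a limit. It is not from the Maltrail or OPNsense docs. The 900 MB limit, the `--run` switch, and using the rc script for the restart are my assumptions, so adjust them to your box.

```sh
#!/bin/sh
# Hypothetical watchdog sketch: restart the Maltrail sensor when the summed
# RSS of its processes passes a limit. LIMIT_KB and RESTART_CMD are
# assumptions chosen for illustration, not values confirmed upstream.
LIMIT_KB="${LIMIT_KB:-921600}"   # 900 MB expressed in KiB (900 * 1024)
RESTART_CMD="${RESTART_CMD:-/usr/local/etc/rc.d/opnsense-maltrailsensor restart}"

# Sum the first column (RSS in KiB) of every stdin line that mentions
# maltrail/sensor.py. Expects input shaped like `ps -axo rss,command`.
total_sensor_rss_kb() {
    awk '/maltrail\/sensor\.py/ { sum += $1 } END { print sum + 0 }'
}

# Succeeds (exit status 0) when usage $1 is strictly above limit $2.
over_limit() {
    [ "$1" -gt "$2" ]
}

# Only act when called with --run, so the functions can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    used=$(ps -axo rss,command | total_sensor_rss_kb)
    if over_limit "$used" "$LIMIT_KB"; then
        logger -t maltrail-watchdog "sensor RSS ${used} KiB over ${LIMIT_KB} KiB, restarting"
        # Left unquoted on purpose so the command and its argument are split.
        $RESTART_CMD
    fi
fi
```

Saved somewhere on the box and called with `--run` from a cron job every few minutes, this only restarts the sensor when it is actually bloated instead of on a fixed schedule.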

Let's hope for a quick fix  ;D

Hi guys! I've got the same! High swap and ram usage after the last upgrade
https://forum.opnsense.org/index.php?topic=24452.0
I hope it will be fixed soon  :'( !
A rainy day...

I thought I would try Maltrail with the workaround of restarting the sensor daily from cron, but I have noticed two things.

1. In high traffic periods restarting only once per day wasn't enough. I had to modify the cron to run hourly.
2. After a couple of days I found multiple maltrail sensor orphaned processes which I had to kill manually from cli.
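A possible way to spot those leftover processes, as a sketch only: it assumes that an orphaned sensor process is one that has been re-parented to init (PPID 1), which should hold when the sensor runs under a daemon supervisor but may not match every setup. The function name and the `--kill` switch are made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: print the PIDs of sensor.py processes whose parent
# is init (PPID 1). ASSUMPTION: a healthy sensor has a supervising parent,
# so PPID 1 marks a leftover. Check the list by hand before killing anything.
# Expects stdin shaped like `ps -axo pid,ppid,command`.
orphaned_sensor_pids() {
    awk '$2 == 1 && /maltrail\/sensor\.py/ { print $1 }'
}

# List by default; only send signals when called with --kill.
if [ "${1:-}" = "--kill" ]; then
    for pid in $(ps -axo pid,ppid,command | orphaned_sensor_pids); do
        kill "$pid"
    done
elif [ "${1:-}" = "--list" ]; then
    ps -axo pid,ppid,command | orphaned_sensor_pids
fi
```

Running it with `--list` first shows what would be killed, which is safer than wiring `--kill` straight into cron.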



After discussing this on the Maltrail page, it appears the bug is in the python3 module used for capturing traffic, which is why the leak grows with traffic.

Reverting to python2.7 avoids the issue; I have confirmed this. I documented some of the steps below. I may have left some out, but this is what I did:

# Install the ports tree. Not recommended for prod environments; I am just a home user who likes to cause trouble for himself.
opnsense-code ports tools
# Install python 2.7 (build and install the port before using it)
cd /usr/ports/lang/python27
make install
python2.7 -m ensurepip
# Keep pip below 21.0; pip 21.0 dropped Python 2.7 support, so a newer pip will break
pip2.7 install --upgrade "pip < 21.0"
# Build the sqlite3 module; the port flavor must be set to py27
cd /usr/ports/databases/py-sqlite3
make FLAVOR=py27 clean install
pip2.7 install sqlite3

# test if you have everything you need from cli
python2.7 /usr/local/share/maltrail/sensor.py

If everything works, you can update the rc file /usr/local/etc/rc.d/opnsense-maltrailsensor to change the command to use python 2.7:
command_args="-f -P /var/run/maltrailsensor.pid python2.7 /usr/local/share/maltrail/sensor.py"

I hope that this issue will be fixed by an official update really soon  :'( .

With the update to version 21.7.2_1 the extent of the problem has decreased significantly. Now swap usage grows by about 1 GB every 3 days. The problem remains, though, and has not been solved.

It won't get fixed anytime soon, as the pcapy lib is the root cause.

September 14, 2021, 06:12:17 AM #13 Last Edit: September 14, 2021, 08:52:22 PM by bposter
Steps I had to follow:

- Install ports
opnsense-code ports tools

- Install python 2.7
cd /usr/ports/lang/python27
make
make install
(log out and back in)
python2.7 -m ensurepip

- Keep pip below version 21.0, a newer pip will break on python 2.7
pip2.7 install --upgrade "pip < 21.0"

- Build sqlite3; you need to set the flavor to py27
cd /usr/ports/databases/py-sqlite3
make FLAVOR=py27 clean install
pip2.7 install sqlite3

- Test if you have everything you need from cli
python2.7 /usr/local/share/maltrail/sensor.py

- Didn't work at first, had to install pcapy
pip2.7 install pcapy-ng

- If everything works you can update the rc file /usr/local/etc/rc.d/opnsense-maltrailsensor to change the command to use python 2.7
command_args="-f -P /var/run/maltrailsensor.pid python2.7 /usr/local/share/maltrail/sensor.py"
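Before editing the rc file, it may help to check that python2.7 can import the modules installed in the steps above. This is only a sketch: the helper name and the `--check` switch are made up, and the module list (sqlite3, plus pcapy as provided by pcapy-ng) comes from the steps in this thread, not from an official requirements list.

```sh
#!/bin/sh
# Hypothetical helper: print every module that the given interpreter
# fails to import, one per line. Prints nothing when all imports succeed.
# Usage: missing_modules INTERPRETER MODULE...
missing_modules() {
    interp="$1"
    shift
    for mod in "$@"; do
        "$interp" -c "import $mod" 2>/dev/null || printf '%s\n' "$mod"
    done
}

# Example run; the module list is an assumption based on the steps above.
if [ "${1:-}" = "--check" ]; then
    missing=$(missing_modules python2.7 sqlite3 pcapy)
    if [ -n "$missing" ]; then
        echo "python2.7 cannot import: $missing"
        exit 1
    fi
    echo "python2.7 has the modules checked here"
fi
```

If it reports a missing module, installing it first avoids switching the service over to an interpreter that cannot start the sensor.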

Testing now, thanks sanxiago!

Looks right so far. I'll watch memory for the next few days (with my process restarts disabled) and report back.

root     17187  86.1  7.6  997224 624956  -  R    22:13       0:05.59 python2.7 /usr/local/share/maltrail/sensor.py
root     13946   1.8  0.3   35580  23892  -  S    22:13       0:00.21 python3 /usr/local/share/maltrail/server.py (python3.8)