I randomly stop having internet due to OPNsense getting stuck - flowd_aggregate

Started by ikkeT, February 06, 2025, 04:27:55 PM

Hi,

I have been experiencing this for quite a long time and would now like to get to the root of it. I installed Telegraf, InfluxDB and Grafana to see when and what starts going wrong. At the very least, the flowd_aggregate.py script keeps using a lot of CPU. But I can't find in the logs what causes the sudden memory usage and the rise in CPU usage. See Grafana:

I didn't know where to put the image, as I can't upload it here, so see it on Mastodon: https://mementomori.social/@ikkeT/113957621410576425


  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
99283 root          1 120    0    51M    38M CPU0     0  46.7H  99.05% python3.11


root@OPNsense:~ # ps awfux|grep 99283
root     99283  83.5  0.9   52676  39024  -  Rs   24Jan25  2804:23.00 /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.11)

Any ideas what could cause this, or how to find the problem from the logs?

flowd_aggregate is the NetFlow service. Since it provides statistics about every single connection, it's a memory- and CPU-intensive process, depending on your bandwidth and number of users of course. Do you actually use that data? If yes, you might consider not aggregating the raw data on the firewall but sending it to an NMS like Elastiflow, for example.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Sorry, I only just noticed your reply, and thanks. I have tried disabling the collection, and as far as I recall it still hung. I will disable it again after the next memory leak to verify.
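
To tell whether flowd_aggregate.py is the process that actually grows, something like the following could be left running on the firewall. This is only a rough sketch, not anything shipped with OPNsense: the function names (`parse_ps_aux_line`, `find_process`, `watch`) are made up for illustration, and it assumes the `ps aux` column order shown in the output above (USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND).

```python
import subprocess
import time


def parse_ps_aux_line(line):
    """Parse one data line of `ps aux` output into a small dict.

    Assumed column order: USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
    """
    fields = line.split(None, 10)  # keep the full command line as the last field
    return {
        "pid": int(fields[1]),
        "cpu_pct": float(fields[2]),
        "vsz_kb": int(fields[4]),
        "rss_kb": int(fields[5]),
        "command": fields[10],
    }


def find_process(needle="flowd_aggregate.py"):
    """Return parsed info for the first process whose line contains `needle`."""
    out = subprocess.run(["ps", "auxww"], capture_output=True, text=True).stdout
    for line in out.splitlines()[1:]:  # skip the header row
        if needle in line:
            return parse_ps_aux_line(line)
    return None


def watch(interval_s=60, samples=5):
    """Print a timestamped CPU/RSS sample every `interval_s` seconds."""
    for _ in range(samples):
        info = find_process()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        if info is None:
            print(stamp, "flowd_aggregate.py not running")
        else:
            print(stamp, "pid", info["pid"], "cpu%", info["cpu_pct"], "rss_kb", info["rss_kb"])
        time.sleep(interval_s)
```

Calling `watch()` with a larger `samples` value and redirecting the output to a file would give a timeline to compare against the Grafana graphs. If RSS stays flat while total memory use climbs, the leak is probably somewhere else.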