Crowdsec Daemon is stopping at 1am (sometimes)

Started by TimmiORG, March 08, 2024, 08:08:10 AM

Add me to the list, too. Preemptive thanks to whomever figures out a fix!

Mine isn't having this problem.
p.s. The message level=warning msg="sqlite is not using WAL mode, LAPI might become unresponsive when inserting the community blocklist" is only a warning. I used to get it too; I switched to WAL mode and the warning noise went away. That is what a warning is for: it tells you the service will continue working but suggests an improvement.
This last snippet has no trace of a problem. So the question is why you think it is not running. Or more importantly, please keep looking in that log for other clues.
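To make "keep looking in that log" a bit more systematic, here is a minimal sketch that counts lines per level and groups the error messages. It assumes the logrus-style layout quoted in this thread (time="..." level=... msg="..."); the function name summarize is made up for illustration and is not part of crowdsec.

```python
import re
from collections import Counter

# Matches the level and msg fields of lines such as:
#   time="..." level=error msg="Unable to enrich ip '...'" id=...
LEVEL_RE = re.compile(r'\blevel=(\w+)')
MSG_RE = re.compile(r'\bmsg="((?:[^"\\]|\\.)*)"')


def summarize(lines):
    """Return (Counter of levels, Counter of distinct error messages)."""
    levels = Counter()
    errors = Counter()
    for line in lines:
        level = LEVEL_RE.search(line)
        if level is None:
            continue  # not a structured log line, skip it
        levels[level.group(1)] += 1
        if level.group(1) in ("error", "fatal", "panic"):
            msg = MSG_RE.search(line)
            errors[msg.group(1) if msg else "<no msg>"] += 1
    return levels, errors
```

Feed it an open file, for example summarize(open("/var/log/crowdsec/crowdsec.log", errors="replace")), and print errors.most_common(10) to see which messages dominate. It only tells you what was logged, so a process that dies without logging anything will not show up here.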

March 08, 2024, 07:23:32 PM #17 Last Edit: March 08, 2024, 07:35:46 PM by fuzelet
Every day I see the service status on the dashboard turn into a red play button. Clicking it starts the service back up, and it runs fine for the rest of that day.

I'm not sure which other logs I can check, but like others have said, I can't find it crashing in any logs. It just turns off until I start it again.


/var/log/crowdsec/crowdsec.log
time="2024-03-08T01:19:14-05:00" level=error msg="Failed to fetch network for 194.26.135.250 : the MaxMind DB file's data section contains bad data (float 64 size of 19)" id=morning-snow method=IpToRange name=crowdsecurity/geoip-enrich stage=s02-enrich
time="2024-03-08T01:19:34-05:00" level=error msg="Unable to enrich ip '167.94.145.90'" id=morning-snow method=GeoIpASN name=crowdsecurity/geoip-enrich stage=s02-enrich
time="2024-03-08T01:19:34-05:00" level=error msg="Failed to fetch network for 167.94.145.90 : unexpected type when decoding string: 79" id=morning-snow method=IpToRange name=crowdsecurity/geoip-enrich stage=s02-enrich
time="2024-03-08T01:19:51-05:00" level=error msg="Unable to enrich ip '109.205.213.22'" id=morning-snow method=GeoIpASN name=crowdsecurity/geoip-enrich stage=s02-enrich
time="2024-03-08T01:19:51-05:00" level=error msg="Failed to fetch network for 109.205.213.22 : the MaxMind DB file's data section contains bad data (float 64 size of 20)" id=morning-snow method=IpToRange name=crowdsecurity/geoip-enrich stage=s02-enrich
time="2024-03-08T01:20:06-05:00" level=error msg="Unable to enrich ip '109.205.213.22'" id=morning-snow method=GeoIpASN name=crowdsecurity/geoip-enrich stage=s02-enrich
time="2024-03-08T01:20:06-05:00" level=error msg="Failed to fetch network for 109.205.213.22 : the MaxMind DB file's data section contains bad data (float 64 size of 20)" id=morning-snow method=IpToRange name=crowdsecurity/geoip-enrich stage=s02-enrich
time="2024-03-08T12:14:30-05:00" level=warning msg="You are using sqlite without WAL, this can have a performance impact. If you do not store the database in a network share, set db_config.use_wal to true. Set explicitly to false to disable this warning."
time="2024-03-08T12:14:30-05:00" level=info msg="Enabled feature flags: <none>"
time="2024-03-08T12:14:30-05:00" level=info msg="Crowdsec v1.6.0-freebsd-4b8e6cd7"
time="2024-03-08T12:14:30-05:00" level=info msg="Loading prometheus collectors"
time="2024-03-08T12:14:30-05:00" level=info msg="Loading CAPI manager"
time="2024-03-08T12:14:30-05:00" level=info msg="flushed 6/33 alerts because they were created 7d ago or more"
time="2024-03-08T12:14:31-05:00" level=info msg="CAPI manager configured successfully"
time="2024-03-08T12:14:31-05:00" level=error msg="Machine is not enrolled in the console, can't synchronize with the console"
time="2024-03-08T12:14:31-05:00" level=info msg="Start push to CrowdSec Central API (interval: 11s once, then 10s)"
time="2024-03-08T12:14:31-05:00" level=info msg="CrowdSec Local API listening on 127.0.0.1:8080"
time="2024-03-08T12:14:31-05:00" level=info msg="Start sending metrics to CrowdSec Central API (interval: 17m52s once, then 30m0s)"
time="2024-03-08T12:14:31-05:00" level=info msg="capi metrics: sending"
time="2024-03-08T12:14:31-05:00" level=info msg="Loading grok library /usr/local/etc/crowdsec/patterns"
time="2024-03-08T12:14:31-05:00" level=info msg="Starting community-blocklist update"



/var/log/crowdsec/crowdsec_api.log

time="2024-03-08T01:19:21-05:00" level=info msg="127.0.0.1 - [Fri, 08 Mar 2024 01:19:21 EST] \"GET /v1/decisions/stream HTTP/1.1 200 19.186703ms \"crowdsec-firewall-bouncer/v0.0.28-freebsd-af6e7e2\" \""
time="2024-03-08T01:19:31-05:00" level=info msg="127.0.0.1 - [Fri, 08 Mar 2024 01:19:31 EST] \"GET /v1/decisions/stream HTTP/1.1 200 20.377403ms \"crowdsec-firewall-bouncer/v0.0.28-freebsd-af6e7e2\" \""
time="2024-03-08T01:19:41-05:00" level=info msg="127.0.0.1 - [Fri, 08 Mar 2024 01:19:41 EST] \"GET /v1/decisions/stream HTTP/1.1 200 19.258695ms \"crowdsec-firewall-bouncer/v0.0.28-freebsd-af6e7e2\" \""
time="2024-03-08T01:19:51-05:00" level=info msg="127.0.0.1 - [Fri, 08 Mar 2024 01:19:51 EST] \"GET /v1/decisions/stream HTTP/1.1 200 39.013967ms \"crowdsec-firewall-bouncer/v0.0.28-freebsd-af6e7e2\" \""
time="2024-03-08T01:20:01-05:00" level=info msg="127.0.0.1 - [Fri, 08 Mar 2024 01:20:01 EST] \"GET /v1/decisions/stream HTTP/1.1 200 25.659197ms \"crowdsec-firewall-bouncer/v0.0.28-freebsd-af6e7e2\" \""
time="2024-03-08T12:14:31-05:00" level=info msg="127.0.0.1 - [Fri, 08 Mar 2024 12:14:31 EST] \"POST /v1/watchers/login HTTP/1.1 200 54.670453ms \"crowdsec/v1.6.0-freebsd-4b8e6cd7\" \""
time="2024-03-08T12:14:45-05:00" level=info msg="127.0.0.1 - [Fri, 08 Mar 2024 12:14:45 EST] \"GET /v1/decisions/stream HTTP/1.1 200 224.060551ms \"crowdsec-firewall-bouncer/v0.0.28-freebsd-af6e7e2\" \""
time="2024-03-08T12:14:45-05:00" level=info msg="127.0.0.1 - [Fri, 08 Mar 2024 12:14:45 EST] \"GET /v1/decisions/stream HTTP/1.1 200 15.971222ms \"crowdsec-firewall-bouncer/v0.0.28-freebsd-af6e7e2\" \""
time="2024-03-08T12:14:50-05:00" level=info msg="127.0.0.1 - [Fri, 08 Mar 2024 12:14:50 EST] \"GET /v1/decisions/stream HTTP/1.1 200 14.849763ms \"crowdsec-firewall-bouncer/v0.0.28-freebsd-af6e7e2\" \""
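Since nothing in these logs says "crashed", one way to pin down when the daemon went quiet is to look for silences between consecutive timestamps. In the excerpt above the bouncer polls /v1/decisions/stream about every 10 seconds until 01:20:01 and the next entry is at 12:14:31, so, if the excerpt is complete, the API stopped answering shortly after 01:20. A rough sketch that finds such gaps automatically follows; find_gaps and the 5-minute threshold are illustrative choices, not anything shipped with crowdsec.

```python
import re
from datetime import datetime, timedelta

# Lines are expected to start with time="2024-03-08T01:19:21-05:00"
TIME_RE = re.compile(r'^time="([^"]+)"')


def find_gaps(lines, min_gap=timedelta(minutes=5)):
    """Return a list of (last_seen, next_seen, gap) for silences longer than min_gap."""
    gaps = []
    previous = None
    for line in lines:
        match = TIME_RE.match(line)
        if match is None:
            continue  # line has no leading timestamp
        try:
            current = datetime.fromisoformat(match.group(1))
        except ValueError:
            continue  # timestamp in a format this Python version cannot parse
        if previous is not None and current - previous > min_gap:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps
```

Running it over /var/log/crowdsec/crowdsec_api.log should work best, because the bouncer requests give that file a steady heartbeat. The first element of each tuple is the last sign of life, which is the time to compare against cron jobs or log rotation.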

I've been seeing this as well. I thought it corresponded with my cron job that runs "Update and reload firewall aliases" every night at 1:07am, but maybe it has nothing to do with that?
Topton 4 x i225-v (Core i5-1135G7 * 32GB * 512SSD)
Xfinity Gigabit (1.2G Down * 200M Up)

Me too. Before 24.2 and crowdsec 1.6 it was rock stable.

Now the dashboard shows red every day and the log shows nothing. After a restart it runs smoothly.


I've had to spend most of the weekend fixing my network for other reasons.
Those error messages seem pretty serious, and it seems MaxMind's database is in a different format from the one expected. What changed would be a guess; it could be either MaxMind or crowdsec.
You could try disabling the enrichment part whilst the problem is investigated. It also looks like it needs reporting on their side, in case this forum isn't monitored much.

Quote from: cookiemonster on March 11, 2024, 02:43:41 PM
I've had to spend most of the weekend fixing my network for other reasons.
Those error messages seem pretty serious, and it seems MaxMind's database is in a different format from the one expected. What changed would be a guess; it could be either MaxMind or crowdsec.

Hi, I'm the author of the opnsense plugin. A new version of the geoip database had issues with the current crowdsec and we reverted to the older version. Hub upgrade (manually or from cron) fixes it, and I don't think it could crash the service. I am looking into the issue. Thanks!

March 12, 2024, 10:28:35 PM #22 Last Edit: March 12, 2024, 10:30:36 PM by whezzel
I'm also having this issue. I received an email from Maxmind yesterday stating they would be switching to R2 presigned URLs for all DBs, as of May 1st, and that it is a potential breaking change. Not sure if this is related to the issue we are facing but I figured I would mention it.

I tried running "cscli hub upgrade --force" on both of my routers and they fail on the "crowdsecurity/geoip-enrich" list.

Quote from: whezzel on March 12, 2024, 10:28:35 PM
I'm also having this issue. I received an email from Maxmind yesterday stating they would be switching to R2 presigned URLs for all DBs, as of May 1st, and that it is a potential breaking change. Not sure if this is related to the issue we are facing but I figured I would mention it.

I tried running "cscli hub upgrade --force" on both of my routers and they fail on the "crowdsecurity/geoip-enrich" list.

Did you run "cscli hub update" first?

I could not replicate the issue, but it would help if you ran "cscli support dump" and sent the resulting file to support@crowdsec.net

Thanks!


March 15, 2024, 12:42:38 PM #25 Last Edit: March 15, 2024, 02:59:49 PM by meyergru
Subscribed, since crowdsec regularly stops on my system, for whatever reason, and I cannot tell at what time. I have sent the crowdsec support file, but I doubt that it reveals much (e.g. there is no crash dump in there).
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

I never noticed this, just to add a data point, but I do see crowdsec occasionally stop in a different way:

https://github.com/crowdsecurity/crowdsec/issues/2902
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

I meant: crowdsec sometimes stops - I cannot even tell at what time, much less what is the cause.

Log rotation and a resulting crash might well be it; however, I have just reset my log files and that did not cause crowdsec to stop.


Yes, crowdsec would inappropriately raise an error if a watched file disappears immediately after the initial directory scan.
This will be corrected for 1.6.1, but I'm not sure how often it occurs.
More generally, a process exit by crowdsec could be due to CAPI being unavailable for a long time or other issues.

On the Linux package any transient exit/crash is not a problem, except for the possible underlying bug, since the process is restarted immediately by systemd (or docker). For FreeBSD there is, afaik, no general consensus on how to restart crashed processes.

Monit is a good solution but it's not available on FreeBSD by default or in pfSense. I tried simply adding a restart option to the sbin/daemon wrapper; it's not working as expected, but I'd prefer the solution to be the same for the three platforms.

If someone is using monit to restart crowdsec, can you share that part of configuration?

Thanks
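Until a common restart mechanism exists, a cron-driven watchdog is one possible stopgap. This is not a monit configuration, just a stand-in sketch that asks the rc system for the service status and starts the service when the status command exits non-zero. The service name "crowdsec" and the assumption that "service <name> status" returns a non-zero exit code when the daemon is down are mine; check both on your box (the plugin's rc script may be named differently). The function name ensure_running is hypothetical.

```python
import subprocess


def ensure_running(service="crowdsec", run=subprocess.run):
    """Start `service` if its status check fails.

    Returns True when a start was attempted, False when the service was
    already reported as running. `run` is injectable so the logic can be
    exercised without touching the real system.
    """
    # Assumption: a non-zero exit code from `status` means "not running".
    status = run(["service", service, "status"], capture_output=True)
    if status.returncode == 0:
        return False
    run(["service", service, "start"], capture_output=True)
    return True
```

Called from a small script on a cron schedule of a few minutes, this would bring the daemon back after a silent exit. It hides the symptom rather than fixing the cause, so it is worth logging each restart with a timestamp to keep the evidence for the bug report.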


