Crowdsec Daemon is stopping at 1am (sometimes)

Started by TimmiORG, March 08, 2024, 08:08:10 AM

Hi All,

For a few weeks now I have noticed that the Crowdsec daemon stops or crashes at 1am (which should be midnight UTC).
I don't see anything in the crowdsec logs.

I'm not sure whether this started with OPNsense 24 or whether my IPv6 changes put additional load on the server. I suspect it is the LAPI server that goes down, as I can see that the bouncer is still trying to communicate with it.

Could it be that the local LAPI server is at the capacity limit?
The service looks normal after I start it again.

Thanks for your help
Timmi

I have the same problem.
I couldn't find anything in the logs. Where do you see that the connection is broken? @TimmiORG
Protectli FW4B
Intel J6412 4 cores
4x Intel I225-V 2,5 Gbit/s
16 GB memory
480 GB m.2 SATA SSD storage
Coreboot

Hi,

I see in /var/log/crowdsec/crowdsec-firewall-bouncer.log

time="08-03-2024 01:00:52" level=error msg="auth-api: auth with api key failed return nil response, error: dial tcp 172.28.52.65:8080: i/o timeout"
time="08-03-2024 01:00:52" level=error msg="Get \"http://192.168.1.1:8080/v1/decisions/stream?\": dial tcp 192.168.1.1:8080: i/o timeout"



OK, on my side it's not the same, but similar:
time="08-03-2024 07:20:46" level=error msg="auth-api: auth with api key failed return nil response, error: dial tcp 127.0.0.1:8080: i/o timeout"
time="08-03-2024 07:20:46" level=error msg="Get \"http://127.0.0.1:8080/v1/decisions/stream?\": dial tcp 127.0.0.1:8080: i/o timeout"

@TimmiORG what is the actual output from
tail /var/log/crowdsec/crowdsec-firewall-bouncer.log ?


like this

time="08-03-2024 12:45:32" level=info msg="1 decision added"


This is what is shown if the system is running.

Hi,

it's not the bouncer logs that you should read, but crowdsec.log

Is there anything that points to a service failure?

I know.
As I wrote, I don't see anything specific in crowdsec.log around that time. The log entries simply stop at some point.
Only the bouncer log shows that the LAPI is not available.
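
To compare the two logs, it may help to pull out the first bouncer error and the last crowdsec.log entry. Below is a minimal sketch, assuming both files use the `time="..." level=... msg="..."` layout shown in this thread; the function names are made up for illustration and are not part of CrowdSec.

```python
import re
from datetime import datetime

# key=value pairs; the value is either quoted (with \" escapes) or a bare word.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Split one log line into a dict of its fields."""
    fields = {}
    for key, raw in PAIR_RE.findall(line):
        if raw.startswith('"'):
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def parse_time(value):
    """Parse either timestamp style seen in this thread."""
    try:
        # Bouncer log style: 08-03-2024 01:00:52 (day-month-year, no timezone)
        return datetime.strptime(value, "%d-%m-%Y %H:%M:%S")
    except ValueError:
        # crowdsec.log style: 2024-03-08T11:36:42+01:00
        return datetime.fromisoformat(value)


def first_error(lines):
    """Return (timestamp, message) of the first level=error line, or None."""
    for line in lines:
        fields = parse_line(line)
        if fields.get("level") == "error" and "time" in fields:
            return parse_time(fields["time"]), fields.get("msg", "")
    return None


def last_timestamp(lines):
    """Return the timestamp of the last line that carries a time field, or None."""
    last = None
    for line in lines:
        fields = parse_line(line)
        if "time" in fields:
            last = parse_time(fields["time"])
    return last
```

Usage would be something like `first_error(open("/var/log/crowdsec/crowdsec-firewall-bouncer.log"))` and `last_timestamp(open("/var/log/crowdsec/crowdsec.log"))`. Note that the bouncer timestamps carry no timezone while the crowdsec.log ones do, so the two results should be compared by eye rather than subtracted directly.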

Quote from: mmetc on March 08, 2024, 12:47:14 PM
Hi,

it's not the bouncer logs that you should read, but crowdsec.log

Is there anything that points to a service failure?

ok got it.

Could this be the culprit?
time="2024-03-08T11:36:42+01:00" level=warning msg="sqlite is not using WAL mode, LAPI might become unresponsive when inserting the community blocklist"

Not for me as WAL mode is enabled.
I also don't receive the warning.
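
For anyone who wants to confirm the journal mode directly rather than rely on the warning, here is a minimal sketch using Python's standard sqlite3 module. The database path in the usage note is an assumption about a default install and may differ on your system.

```python
import os
import sqlite3


def journal_mode(db_path):
    """Return the journal mode SQLite reports for the file, e.g. 'wal' or 'delete'."""
    # Refuse missing paths: sqlite3.connect() would otherwise create an empty file.
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        # Querying the pragma without a value only reads the current mode.
        return conn.execute("PRAGMA journal_mode").fetchone()[0].lower()
    finally:
        conn.close()
```

Example: `journal_mode("/var/db/crowdsec/data/crowdsec.db")` (assumed path). A result of `wal` matches what the warning asks for; anything else suggests checking the `use_wal` option under `db_config` in the CrowdSec configuration.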

The funny thing is that I already have that enabled ;)

I have created a Monit test to restart the service if it is not running.

So the service should be back within two minutes.

Quote from: TimmiORG on March 08, 2024, 04:10:11 PM
I have created a Monit test to restart the service if it is not running.

So the service should be back within two minutes.

Interesting, could you explain how you're doing that?

Sure, I assume Monit is running already.

Service Test Settings:
Name: Crowdsec_Service
Condition: failed host 127.0.0.1 port 8080 type tcp
Action: Restart

Service Settings:
Enable service checks: yes
Name: Crowdsec
Type: Process
PID File: /var/run/crowdsec.pid
Start: /usr/sbin/service crowdsec start
Stop: /usr/sbin/service crowdsec stop
Tests: Crowdsec_Service
Depends: Nothing selected
Description: Check that Crowdsec is running
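
For reference, the check these settings describe amounts to a TCP connect to the LAPI port followed by a restart when the connect fails. Below is a rough Python sketch of that logic, useful for testing by hand or from cron if Monit is not available. It is not how Monit itself is implemented, and the function names are invented for illustration; the host, port and service commands are the ones from the settings above.

```python
import socket
import subprocess


def lapi_reachable(host="127.0.0.1", port=8080, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections as well as timeouts.
        return False


def restart_crowdsec():
    """Stop and start the service with the same commands as the Monit settings."""
    # A failing stop is tolerated because the daemon may already be gone.
    subprocess.run(["/usr/sbin/service", "crowdsec", "stop"], check=False)
    subprocess.run(["/usr/sbin/service", "crowdsec", "start"], check=True)


def check_once(host="127.0.0.1", port=8080, restart=restart_crowdsec, timeout=5.0):
    """Run one check; call restart() and return True if the port was unreachable."""
    if lapi_reachable(host, port, timeout):
        return False
    restart()
    return True
```

Calling `check_once()` as root performs a single check and restart. Passing a different `restart` callable makes it possible to try the check without touching the service.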


Just chiming in that I am seeing this on my end as well. Crowdsec seems to go down every night now. I'm going to look into the Monit advice from the prior posts in the meantime.


v24.1.3_1


tail /var/log/crowdsec/crowdsec.log


time="2024-03-08T12:14:31-05:00" level=info msg="Adding file /var/log/audit/latest.log to datasources" type=file
time="2024-03-08T12:14:31-05:00" level=info msg="Force add watch on /var/log/lighttpd" type=file
time="2024-03-08T12:14:31-05:00" level=info msg="Adding file /var/log/lighttpd/latest.log to datasources" type=file
time="2024-03-08T12:14:31-05:00" level=info msg="Force add watch on /var/log/filter" type=file
time="2024-03-08T12:14:31-05:00" level=info msg="Adding file /var/log/filter/latest.log to datasources" type=file
time="2024-03-08T12:14:31-05:00" level=info msg="Starting processing data"
time="2024-03-08T12:14:34-05:00" level=info msg="capi/community-blocklist : 0 explicit deletions"
time="2024-03-08T12:14:34-05:00" level=warning msg="sqlite is not using WAL mode, LAPI might become unresponsive when inserting the community blocklist"
time="2024-03-08T12:14:34-05:00" level=info msg="crowdsecurity/community-blocklist : added 15000 entries, deleted 14449 entries (alert:453)"
time="2024-03-08T12:14:34-05:00" level=info msg="Start pull from CrowdSec Central API (interval: 1h56m16s once, then 2h0m0s)"