I've been using a remote Elasticsearch server for a while now, and it has been pretty much problem free. Since the 21.1 update, though, Elasticsearch ends up pegged at 100% usage on all cores and unresponsive every night, requiring a force kill of the service to bring it back to life. See the screenshot; it seems to happen at close to the same time each night. Any ideas what might be running, or what changed in the 21.1 update, that would cause this to hang?
@FullyBorked, is the ELK instance running on OPNsense?
Quote from: mb on February 02, 2021, 05:32:30 PM
@FullyBorked, is the ELK instance running on OPNsense?
No, it's remote, running on Ubuntu Server.
Anyone know how to troubleshoot what might be happening? Maybe enable some logging or something? Starting to get old fixing this server every day. :(
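One way to see what is eating the CPU is to capture the output of Elasticsearch's nodes hot threads API (`GET /_nodes/hot_threads`) shortly before the time it usually hangs. Below is a minimal sketch, assuming the node answers on `http://localhost:9200` without authentication; the URL and the helper names `hot_threads_url` and `fetch_hot_threads` are made up for illustration. If the node is already fully unresponsive, the request will probably just time out, so it is likely more useful run on a schedule leading up to the spike.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: Elasticsearch is reachable here with no authentication.
ES_URL = "http://localhost:9200"


def hot_threads_url(base_url, threads=5, interval="500ms"):
    """Build the URL for the nodes hot threads API."""
    query = urlencode({"threads": threads, "interval": interval})
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{query}"


def fetch_hot_threads(base_url=ES_URL, timeout=10):
    """Return the plain-text hot threads report, or None if the node does not answer."""
    try:
        with urllib.request.urlopen(hot_threads_url(base_url), timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"request failed: {exc}")
        return None


if __name__ == "__main__":
    report = fetch_hot_threads()
    if report:
        print(report)
```

Redirecting the output to a timestamped file every few minutes around the usual hang time should show which threads (search, write, merge, and so on) are busy when the CPU climbs.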
I set up a cron job to restart the Elasticsearch service every morning at 3 am as (hopefully) a stopgap.
Hi FullyBorked,
How was it last night? Did the service restart work?
Quote from: sy on February 04, 2021, 04:45:44 PM
Hi FullyBorked,
How was it last night? Did the service restart work?
It does appear to have kept it from fully hanging up. I'll monitor it for a few more nights. I'd still like to know the root cause. I looked through the logs on the Elasticsearch server but saw nothing out of the ordinary.
Restarting the service appears to keep it online, but in a weird state. I noticed this morning that the service, or the connection to it, seems to be flapping. Each refresh of the dashboard in OPNsense gives different results: sometimes it says the service isn't running, then on the next refresh it is. Sometimes reports load and sometimes they throw errors. I don't know what happened after the 21.1 update, but it's frustrating. I might have to rebuild it.
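To tell whether the flapping is on the Elasticsearch side or in the dashboard, one option is to poll the cluster health API (`GET /_cluster/health`) directly and count how often it answers. This is only a rough sketch, again assuming an unauthenticated node at `http://localhost:9200`; the helper names `check_health`, `summarize`, and `poll` are hypothetical.

```python
import json
import time
import urllib.request
from collections import Counter

# Assumption: Elasticsearch is reachable here with no authentication.
ES_URL = "http://localhost:9200"


def check_health(base_url=ES_URL, timeout=5):
    """Return the cluster status ('green', 'yellow', 'red') or 'unreachable'."""
    try:
        with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
            return json.load(resp).get("status", "unknown")
    except (OSError, ValueError):  # connection errors, timeouts, or invalid JSON
        return "unreachable"


def summarize(statuses):
    """Count how many times each status was observed."""
    return dict(Counter(statuses))


def poll(count=20, delay=3.0, base_url=ES_URL):
    """Check health `count` times, waiting `delay` seconds after each check."""
    statuses = []
    for _ in range(count):
        statuses.append(check_health(base_url))
        time.sleep(delay)
    return statuses


if __name__ == "__main__":
    print(summarize(poll()))
```

If the counts show a mix of a normal status and `unreachable` when run from the firewall's network, that would point at the node or the connection rather than at the dashboard itself.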
I ended up pulling this back to the firewall itself; I couldn't get it stable remotely. It's definitely something with the update, as leaving the server online with no connection to the firewall didn't produce the hangs and CPU spike. Not sure what happened. Maybe I'll rebuild it on the remote server at a later date. Other than a lot of RAM usage, it seems OK running locally.