Sensei on OPNsense - Application based filtering

Started by mb, August 25, 2018, 03:38:14 AM

Not in our CARP HA cluster. We have 12 Chelsio ports, so Sensei needs to run on them.
Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz (24 cores)
256 GB RAM, 300 GB RAID1, 3 x 4-port 10G Chelsio T540-CO-SR

May 29, 2019, 05:33:35 PM #331 Last Edit: May 29, 2019, 05:44:47 PM by donatom3
Quote from: mb on May 29, 2019, 02:10:25 AM
Dear Sensei users,

Sensei 0.8.0 Release Candidate 1 is out. This marks the first step toward releasing 0.8, and on toward 1.0. There will be no 0.9 :)

Change log is as follows:

  • Per-process health monitoring. The Sensei engine now checks heartbeats from its packet processors and takes corrective action in case of trouble.

We're running 0.7 to 0.8 upgrade tests. As soon as they show that we're good to go, 0.7 users will be notified of the new 0.8 update.

Enjoy :)

Sensei team

@mb Just checking whether this is the fix we talked about for the issue I was seeing, where Sensei/netmap crashed and all traffic stopped until I rebooted the whole firewall.

The last few times it happened, restarting Sensei from the GUI did not let traffic resume. I had to restart the whole firewall with auto-start of the packet engine turned off.

I upgraded to RC1 yesterday, so I'll let you know if I still see the issue.

Hello @mb.

Yes, I can confirm the fix in rc1 did resolve the error I saw with the Sensei CLI API and OPNsense Crash Reporter.

Thank you!

Great to hear that @JohnDoe17, thanks for letting us know.

@donatom3 hi,

Yes, it's also netmap related, but a different issue. After many trials, I was able to reproduce your situation. Doing an ifconfig down/up seems to resolve the problem.
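A minimal sketch of that down/up workaround as a small shell helper. The function names `run` and `bounce_if`, the `DRY_RUN` switch, and the interface name `cxl0` are illustrative assumptions, not part of Sensei; use the interface Sensei is actually protecting, and expect a brief traffic interruption on it.

```sh
#!/bin/sh
# Hypothetical helper: bounce an interface (down, then up), the workaround
# described above. Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"        # dry run: only show what would be executed
    else
        "$@"             # real run: needs root on the firewall
    fi
}

# bounce_if IFACE
bounce_if() {
    run ifconfig "$1" down || return 1
    run ifconfig "$1" up
}
```

Running `DRY_RUN=1 bounce_if cxl0` first shows exactly which commands would be issued; dropping `DRY_RUN` performs the bounce.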

After Sensei 1.0, we'll take another deep dive into netmap. It's a great tool, but it certainly needs some industry help to reach a really stable state.

Quote from: patcsy88 on May 29, 2019, 01:42:31 PM
Just reinstalled OPNsense and RC1 on an APU2C4 with 2 GB swap. So far so good!

So Sensei detected high swap usage over the last 10+ hours and shut itself down. When prompted, I restarted ES. I have now also disabled the Health Check, and on the Configuration page I started the Sensei Packet Engine; the overlay on the page says it is waiting for the service to start up. After 10 or so minutes, nothing happens on the page, but vmstat in a shell suggests it is back up. Refreshing the OPNsense page and then going to the Configuration page again shows Sensei is up and running. Not sure whether it is the OPNsense framework or the Sensei page that is not polling to refresh its content/data...
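Rather than waiting on the page overlay, the service can be checked from a shell with a bounded poll. This is a minimal sketch; `wait_until` is a hypothetical helper name, and polling once per second is an arbitrary choice.

```sh
#!/bin/sh
# wait_until TIMEOUT CMD [ARGS...]
# Re-runs CMD once per second until it succeeds or TIMEOUT seconds pass.
# Returns 0 as soon as CMD succeeds, 1 if the deadline is reached.
wait_until() {
    _wu_timeout=$1
    shift
    _wu_elapsed=0
    while [ "$_wu_elapsed" -lt "$_wu_timeout" ]; do
        if "$@"; then
            return 0
        fi
        sleep 1
        _wu_elapsed=$((_wu_elapsed + 1))
    done
    return 1
}
```

For example, `wait_until 600 sh -c 'pgrep eastpect >/dev/null'` waits up to 10 minutes for a process named `eastpect` (the name shown in the `ps` output later in this thread) to appear.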


May 30, 2019, 03:40:04 AM #335 Last Edit: May 30, 2019, 04:06:04 AM by donatom3
Quote from: mb on May 29, 2019, 07:31:30 PM
Great to hear that @JohnDoe17, thanks for letting us know.

@donatom3 hi,

Yes, it's also netmap related, but a different issue. After many trials, I was able to reproduce your situation. Doing an ifconfig down/up seems to resolve the problem.

After Sensei 1.0, we'll take another deep dive into netmap. It's a great tool, but it certainly needs some industry help to reach a really stable state.

@MB

I believe I just had one of the crashes again, but it looks like it recovered on its own. I noticed it while browsing on my Apple TV: streaming stopped working and my Harmony showed it was offline, then it was back online a few seconds later. This was in the main log file:

2019-05-29T18:28:37 ERROR: Watchdog: Worker [0] failed to send heartbeat for 6 seconds
2019-05-29T18:28:37 ERROR: Watchdog: Killing Worker [0]
2019-05-29T18:28:37 CRITICAL: Sending TERM signal to worker pid 98083
2019-05-29T18:28:38 CRITICAL: WaitWorkers: processing dead child: pid: 98083
2019-05-29T18:28:38 CRITICAL: WaitWorkers: Child worker0, [pid: 98083] terminated with signal: 11
2019-05-29T18:28:38 CRITICAL: WaitWorkers: Child worker0, [new pid: 60913] re-spawned


And here are the matching entries from the worker log:

2019-05-29T18:28:38 INFO: Packet Processor [60913] started working
2019-05-29T18:28:38 INFO: Packet Processor [60913] sleeping a while since we're respawned
2019-05-29T18:28:50 INFO: Worker [pid:60913] Pinning to CPU #1
2019-05-29T18:28:50 INFO: Worker [60913] started working



If this was your fix, it did its job very fast. I wouldn't have noticed it unless I was doing something with real-time traffic.
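Because these restarts are nearly invisible, a quick way to see whether the watchdog has fired is to search the main log for the messages shown above. A minimal sketch; the function names are hypothetical, and since the thread does not give the log's path, pass your own.

```sh
#!/bin/sh
# Message formats are taken from the main-log excerpt above.

# show_watchdog_events LOGFILE
# Print every watchdog-related line (missed heartbeats, kills, respawns).
show_watchdog_events() {
    grep -E 'Watchdog:|WaitWorkers:' "$1"
}

# count_respawns LOGFILE
# Print how many times a worker was re-spawned (prints 0 if none).
count_respawns() {
    grep -c 'WaitWorkers: Child .* re-spawned' "$1" || true
}
```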

Quote from: patcsy88 on May 29, 2019, 01:42:31 PM
...overlay on the page says it is waiting for the service to start up. After 10 or so minutes, nothing happens on the page, but vmstat in a shell suggests it is back up. Refreshing the OPNsense page and then going to the Configuration page again shows Sensei is up and running. Not sure whether it is the OPNsense framework or the Sensei page that is not polling to refresh its content/data...

@patcsy88, a similar case has been reported to us. It looks like, when the system is under load and not responsive enough, the Sensei UI may wait a long time for a response.

Thanks for your input; this will help in diagnosing the root cause.

One question: I guess you have about 4 GB of memory. How many devices are you running Sensei for?

Quote from: donatom3 on May 30, 2019, 03:40:04 AM
If this was your fix, it did its job very fast. I wouldn't have noticed it unless I was doing something with real-time traffic.

Hi @donatom3, yes, chances are high that this is what fixed yours.

We implemented the heartbeat mechanism for cases where the packet engine might hang for more than 5 seconds.

If the main process detects that a packet processor is unhealthy (it has missed its heartbeat), it simply restarts that process.

This is to keep network availability high in case anything goes wrong.
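To make the mechanism concrete, here is a minimal shell sketch of a heartbeat watchdog. It is not Sensei's implementation; it assumes each worker writes the current epoch time to a heartbeat file, and the function names and file layout are hypothetical. The 5-second threshold matches the post and the "failed to send heartbeat for 6 seconds" log line above.

```sh
#!/bin/sh
# Sketch of a heartbeat watchdog, NOT Sensei's actual implementation.
# Assumption: each worker writes the current epoch time (date +%s) to its
# own heartbeat file at least once every HB_TIMEOUT seconds.

HB_TIMEOUT=${HB_TIMEOUT:-5}   # seconds of silence tolerated (5 s, per the post)

# heartbeat_stale HB_FILE NOW
# Returns 0 (stale) if the file is missing or empty, or if the last beat is
# more than HB_TIMEOUT seconds older than NOW.
heartbeat_stale() {
    [ -r "$1" ] || return 0
    _last=$(cat "$1")
    [ -n "$_last" ] || return 0
    [ $(( $2 - _last )) -gt "$HB_TIMEOUT" ]
}

# check_worker PID HB_FILE
# Sends TERM to a worker with a stale heartbeat and returns 1 so the caller
# knows to re-spawn it; returns 0 if the worker looks healthy.
check_worker() {
    if heartbeat_stale "$2" "$(date +%s)"; then
        echo "Watchdog: worker $1 missed its heartbeat, sending TERM" >&2
        kill -TERM "$1" 2>/dev/null
        return 1
    fi
    return 0
}
```

A supervisor loop would call `check_worker` about once per second for each worker and start a replacement whenever it returns 1, which mirrors the kill/re-spawn sequence in donatom3's log.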

May 30, 2019, 04:32:17 PM #338 Last Edit: May 30, 2019, 04:34:19 PM by patcsy88
Quote from: mb on May 30, 2019, 04:22:00 PM
One question: I guess you have about 4 GB of memory. How many devices are you running Sensei for?

@MB only 4 devices with normal web browsing

Quote from: patcsy88 on May 30, 2019, 04:32:17 PM
@MB only 4 devices with normal web browsing

@patcsy88, what do the following commands show?

cat /usr/local/libexec/elasticsearch/config/jvm.options  | grep "^\-Xm"
ps awxu | grep elastic | grep -v grep
ps awxu | grep eastpect | grep -v grep


Quote from: mb on May 30, 2019, 04:38:27 PM

@patcsy88, what do the following commands show?

cat /usr/local/libexec/elasticsearch/config/jvm.options  | grep "^\-Xm"
-Xms2g
-Xmx2g

ps awxu | grep elastic | grep -v grep
elasticsearch  4875   2.2 46.6 3878304 1927928  -  I    08:22     74:00.13 /usr/local/openjdk8/bin/java -Xms2g -Xmx2g -XX:+UseConcM

ps awxu | grep eastpect | grep -v grep
root           7417   0.5  4.5 3094852  185100  -  S<   08:35      8:29.81 eastpect: Eastpect Instance 0 (eastpect)
root          66470   0.0  0.0 1270428       0  -  IW<  -          0:00.00 eastpect: Eastpect Streamer Instance (eastpect)
root          80093   0.0  2.2 1270428   92760  -  S<   08:35      0:04.70 /usr/local/sensei//bin/eastpect -D


Quote from: patcsy88 on May 30, 2019, 04:44:24 PM
cat /usr/local/libexec/elasticsearch/config/jvm.options  | grep "^\-Xm"
-Xms2g
-Xmx2g


There it is. Edit this file and change these lines to read:

-Xms512m
-Xmx512m


and then stop and start the Elasticsearch service. You should be good to go.
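The same edit can be scripted. A minimal sketch; `set_es_heap` is a hypothetical helper that rewrites both heap lines and keeps a `.bak` copy of the original file.

```sh
#!/bin/sh
# set_es_heap JVM_OPTIONS SIZE
# Rewrites the -Xms and -Xmx lines of an Elasticsearch jvm.options file so
# both use SIZE (e.g. 512m), keeping the original as JVM_OPTIONS.bak.
set_es_heap() {
    _f=$1
    _size=$2
    cp "$_f" "$_f.bak" || return 1
    # -E and the \1 back-reference work with both BSD and GNU sed
    sed -E "s/^-Xm([sx]).*/-Xm\\1$_size/" "$_f.bak" > "$_f"
}
```

For example, `set_es_heap /usr/local/libexec/elasticsearch/config/jvm.options 512m`, then restart Elasticsearch. `service elasticsearch restart` is an assumption about the rc script name; use whatever stop/start method you normally use.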

We adjust this setting on fresh installs. Any chance you had a prior Sensei installation on this device?


Quote from: mb on May 30, 2019, 04:50:50 PM


We adjust this setting on fresh installs. Any chance you had a prior Sensei installation on this device?

No, it was a fresh install!

@patcsy88, got it. We'll add a check for that whenever Sensei is updated or installed.

How is the system doing after you adjusted Elastic memory?
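A check like the one mentioned above could look roughly like this sketch. It is not Sensei's actual check: the function names are hypothetical, and the "heap must not exceed a quarter of RAM" rule is our assumption, chosen because the 2 GB heap on a roughly 4 GB box in this thread caused trouble while 512 MB was recommended.

```sh
#!/bin/sh
# heap_to_mb VALUE
# Prints a JVM size such as 2g, 512m, 1024k or a plain byte count in MB.
heap_to_mb() {
    case $1 in
        *[gG]) echo $(( ${1%?} * 1024 )) ;;
        *[mM]) echo "${1%?}" ;;
        *[kK]) echo $(( ${1%?} / 1024 )) ;;
        *)     echo $(( $1 / 1048576 )) ;;   # plain bytes
    esac
}

# heap_too_big JVM_OPTIONS RAM_MB
# Succeeds when -Xmx exceeds a quarter of RAM (hypothetical threshold).
# Returns 1 if the heap fits or no -Xmx line is found.
heap_too_big() {
    _xmx=$(sed -n 's/^-Xmx//p' "$1" | tail -n 1)
    [ -n "$_xmx" ] || return 1
    [ "$(heap_to_mb "$_xmx")" -gt $(( $2 / 4 )) ]
}
```

On FreeBSD/OPNsense, physical memory in MB can be passed as `$(( $(sysctl -n hw.physmem) / 1048576 ))`.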

May 31, 2019, 11:01:39 AM #344 Last Edit: May 31, 2019, 11:15:41 AM by alelnr
Hi All,
In our environment (OPNsense 19.1.8 + Sensei 0.7), Sensei's cloud reputation service completely blocks OPNsense's Unbound DNS service. To let Unbound answer queries on Sensei-protected interfaces, I had to disable the cloud reputation service.
Thank you