Sensei on OPNsense - Application based filtering

Started by mb, August 25, 2018, 03:38:14 AM

September 09, 2020, 03:55:33 PM #1065 Last Edit: September 09, 2020, 03:58:33 PM by r4nd0m
Well, I have just tried installing it, which resulted in a crash, so I replaced the kernel with the experimental kernel. That one boots and the installer comes up, but it only lets me select vmx0.

But here are my interfaces:

LAN (vmx1)      -> v4: 192.168.x.x
WAN (pppoe0)   -> v4/PPPoE: x.x.x.x/32 (which is the vmx0 hardware interface)

So I'm not sure why the interface mapping is incorrect. Any ideas?

Hi @r4nd0m,

Yes, we are currently filtering out vmx/vtnet interfaces because they cause the OS to crash in netmap mode.

Stay tuned for 1.6, which is planned for release this week or early next week. We are re-enabling these interfaces: instead of filtering them out, Sensei will show a warning with a pointer to a netmap status page if you try to use a problematic driver.

All these crash problems have been fixed in the test kernel, and OPNsense will shortly ship an official netmap kernel.

See here for the latest status: https://www.sunnyvalley.io/post/opnsense-kernel-netmap-status/
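The difference between the current behaviour (hide vmx/vtnet) and the planned 1.6 behaviour (show them with a warning) can be pictured as two interface-selection policies. This is a hypothetical Python illustration, not Sensei's code: the function names, the `Candidate` type and the name-based driver detection are assumptions. Only the vmx/vtnet driver list and the status page URL come from the reply above.

```python
# Hypothetical sketch: NOT Sensei's implementation. The driver names and the
# status URL come from the forum reply; everything else is illustrative.
import re
from dataclasses import dataclass
from typing import List, Optional

# Drivers reported above as crashing the OS in netmap mode (pre-fix kernels).
PROBLEMATIC_DRIVERS = {"vmx", "vtnet"}
STATUS_PAGE = "https://www.sunnyvalley.io/post/opnsense-kernel-netmap-status/"


@dataclass
class Candidate:
    ifname: str
    warning: Optional[str] = None


def driver_of(ifname: str) -> str:
    """Assumes FreeBSD-style names (driver + unit number), e.g. 'vmx1' -> 'vmx'."""
    match = re.fullmatch(r"([a-z]+)\d+", ifname)
    return match.group(1) if match else ifname


def filter_policy(ifnames: List[str]) -> List[Candidate]:
    """Current behaviour as described: problematic drivers are hidden."""
    return [Candidate(n) for n in ifnames if driver_of(n) not in PROBLEMATIC_DRIVERS]


def warn_policy(ifnames: List[str]) -> List[Candidate]:
    """Planned 1.6 behaviour as described: list everything, warn on risky drivers."""
    candidates = []
    for name in ifnames:
        drv = driver_of(name)
        warning = None
        if drv in PROBLEMATIC_DRIVERS:
            warning = f"{name}: '{drv}' driver may be unstable with netmap, see {STATUS_PAGE}"
        candidates.append(Candidate(name, warning))
    return candidates


if __name__ == "__main__":
    nics = ["vmx1", "igb0", "vtnet0"]
    print([c.ifname for c in filter_policy(nics)])  # ['igb0']
    for c in warn_policy(nics):
        print(c.ifname, "->", c.warning or "ok")
```

With the filtering policy a VM that only has vmx NICs ends up with nothing to select; the warning policy keeps the choice with the user.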




Quote from: mb on September 05, 2020, 06:07:11 AM
Hi GreenMatter,  sensei heartbeat is unrelated to this.

Netmap error messages make me think this is related to netmap.

We have seen a lot of progress on the netmap side over the past month. I expect vmx support will also perform better than in 20.1.x.

And it happened again, the same way (on 20.1.9; I'm waiting for the final netmap version), within a few days of the first occurrence: I lost access to all internal VLAN networks.
2020-09-09T04:31:07   kernel: 667.875025 [1180] netmap_grab_packets bad pkt at 390 len 0
2020-09-09T04:31:07   kernel: 667.875016 [1180] netmap_grab_packets bad pkt at 389 len 0
2020-09-09T04:31:07   kernel: 667.875008 [1180] netmap_grab_packets bad pkt at 388 len 0
2020-09-09T04:31:07   kernel: 667.875001 [1180] netmap_grab_packets bad pkt at 387 len 0
2020-09-09T04:31:07   kernel: 667.874992 [1180] netmap_grab_packets bad pkt at 386 len 0
2020-09-09T04:31:07   kernel: 667.874306 [ 277] vmxnet3_netmap_rxsync 130 skipped! idx 46
2020-09-09T04:31:07   kernel: vmx1: watchdog timeout on queue 0
2020-09-09T04:31:02   eastpect[8308]: nm1::vmx1^: permanently promiscuous mode enabled
2020-09-09T04:31:02   eastpect[8308]: nm0::vmx1: permanently promiscuous mode enabled
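When a recurrence like this is reported, a count of how often each netmap/driver symptom appears can be easier to compare across incidents than a long excerpt. A minimal Python sketch, assuming the relevant kernel log lines have been saved to a text file passed as the first argument; the three patterns are copied from the excerpt above and any other message is ignored.

```python
import re
import sys
from collections import Counter

# Patterns copied from the log excerpt above; extend as needed.
PATTERNS = {
    "bad_pkt": re.compile(r"netmap_grab_packets bad pkt at \d+ len 0"),
    "rxsync_skipped": re.compile(r"vmxnet3_netmap_rxsync \d+ skipped!"),
    "watchdog_timeout": re.compile(r"\b\w+: watchdog timeout on queue \d+"),
}


def summarize(lines):
    """Count how many lines match each known symptom."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for name, count in sorted(summarize(fh).items()):
            print(f"{name}: {count}")
```

A burst of `bad_pkt` lines followed by a watchdog timeout on the same NIC, as in this excerpt, points at the driver/netmap layer rather than at the inspection engine.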
What surprises me is that everything had been working fine for months: I made no changes to the setup, installed no new packages, and all of a sudden this problem appears. I know it's netmap, but could it somehow be triggered by Sensei, which inspects the parent interface vmx1?
Shall I reinstall Sensei, would it help?
OPNsense on:
Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (4 cores)
8 GB RAM
50 GB HDD
and plenty of vlans ;-)

September 10, 2020, 01:13:22 AM #1068 Last Edit: September 10, 2020, 01:16:26 AM by r4nd0m
Quote from: mb on September 09, 2020, 05:49:00 PM
Yes, we are currently filtering out vmx/vtnet interfaces, because they cause OS to crash in netmap mode.
[...]

Thanks for the heads-up. So this is currently not applicable to 1.5.2_1 (https://help.sunnyvalley.io/hc/en-us/articles/360053347013-Deployment-Modes)? I only see two modes, Routed and Bridged. Passive would be perfectly sufficient for testing at the moment.

Quote from: GreenMatter on September 09, 2020, 09:35:40 PM
Shall I reinstall Sensei, would it help?

Hi GreenMatter, I do not think this will be of help, since the problem is related to the kernel.

Are you able to start a new (test?) guest and see how the new test kernel is behaving?

Quote from: r4nd0m on September 10, 2020, 01:13:22 AM
thanks for the heads-up so this is currently not applicable then for 1.5.2_1? https://help.sunnyvalley.io/hc/en-us/articles/360053347013-Deployment-Modes - I only see 2 modes Routed / Bridged ... Passive would be perfectly sufficient to test it out at the moment

You're welcome. Yes, 1.6 will re-enable them. Passive mode is also being introduced with 1.6. Stay tuned, almost there :)

Quote from: mb on September 10, 2020, 03:50:05 AM

Hi GreenMatter, I do not think this will be of help, since the problem is related to the kernel.

Are you able to start a new (test?) guest and see how the new test kernel is behaving?

No, I'm off-premises and connect to OPNsense over VPN. I can't afford to demolish it :D remotely...

Quote from: GreenMatter on September 10, 2020, 04:31:36 AM
No, I'm off premise and connect to Opnsense over VPN. I can't afford to demolish it  :D  remotely...

Got it :) Unfortunately, all our test systems are now running 20.7 and testing new kernels, which makes it a bit harder to test 20.1.x code. We'll give it another look whenever we have a bit of time/resources.

September 15, 2020, 09:16:48 PM #1073 Last Edit: September 17, 2020, 03:40:37 PM by DenverTech
A few updates from my end. It still strongly looks like a netmap issue. I'm cataloging results as I go in case it helps others with their testing. The fact that bypass mode suffers the same fate as having Sensei enabled makes me think it's netmap, not Sensei, but that's my uneducated opinion. I'm continuing to work with support.

Edit: On all of these, we've tested with both ix and igb drivers/nics.

Edit #2: Watching top -PCH, the load shows as eastpect and python3.7. eastpect bounces between 50% and 99.97% utilization on any given core; python3.7 bounces between 15% and 99.74%. It definitely feels like something is running away with CPU time.

[20.1, Sensei 1.52, physical box, Xeon D-1528 CPU, 64gb memory] - ubench score ~160,000
- Sensei OFF: 900mbit
- Sensei BYPASS: 900mbit
- Sensei ON: 800mbit
- CPU never seems to show much load (<5% on average)

[20.1, Sensei 1.52, physical box, Xeon D-2123 CPU, 32gb memory] - ubench score ~180,000
- Sensei OFF: 900mbit
- Sensei BYPASS: 900mbit
- Sensei ON: 800mbit
- CPU never seems to show much load (<5% on average)

[20.7, Sensei 1.52, physical box, Xeon D-1528 CPU, 64gb memory] - ubench score ~160,000
- Sensei OFF: 900mbit
- Sensei BYPASS: 100mbit
- Sensei ON: 100mbit
- CPU load is VERY high at all times with Sensei (70%+)

[20.7, Sensei 1.6beta3, netmap test kernel, physical box, Xeon D-1528 CPU, 64gb memory] - ubench score ~160,000
- Sensei OFF: 850mbit
- Sensei BYPASS: 100mbit
- Sensei ON: 100mbit
- CPU load is VERY high at all times with Sensei (70%+)

*At this point, I was told that the D-1528 cannot handle inspection. It handled it fine on 20.1 and is one of the most recommended CPUs on all the vendor pages (D-1541 seems to beat it out slightly).

[20.7, Sensei 1.6beta3, netmap test kernel, virtual box, Xeon E5-2620 CPU (4 cores granted to the VM and 100% reserved), 64gb memory] - ubench score ~220,000
- Sensei OFF: 800mbit
- Sensei BYPASS: 100mbit
- Sensei ON: 100mbit
- CPU load is high at all times with Sensei (30-40%)

[20.7, Sensei 1.6beta3, netmap test kernel, virtual box, Xeon E5-2620 CPU (8 cores granted to the VM and 100% reserved), 64gb memory] - ubench score ~220,000
- Sensei OFF: 800mbit
- Sensei BYPASS: 200mbit (drops to 125mbit with 300-500 users)
- Sensei ON: 200mbit (drops to 60mbit with 300-500 users)
- CPU load is moderate at all times with Sensei (10-25%)

[20.7, Sensei 1.6b3, netmap test kernel, physical box, Xeon D-2123 CPU, 32gb memory] - ubench score ~180,000
- Sensei OFF: 900mbit
- Sensei BYPASS: 250mbit
- Sensei ON: 225mbit
- CPU load is high at all times with Sensei (40-50%)

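To compare these runs at a glance, the sketch below expresses each BYPASS and ON figure as a percentage of the Sensei OFF baseline on the same box. The numbers are copied from the list above; the `Run` structure and the short labels are only a convenient illustration.

```python
from typing import NamedTuple


class Run(NamedTuple):
    label: str
    off: int     # Mbit/s with Sensei OFF (baseline)
    bypass: int  # Mbit/s with Sensei in BYPASS
    on: int      # Mbit/s with Sensei ON


# Figures copied from the benchmark list above.
RUNS = [
    Run("20.1, 1.52, D-1528", 900, 900, 800),
    Run("20.1, 1.52, D-2123", 900, 900, 800),
    Run("20.7, 1.52, D-1528", 900, 100, 100),
    Run("20.7, 1.6b3, test kernel, D-1528", 850, 100, 100),
    Run("20.7, 1.6b3, test kernel, VM 4 cores", 800, 100, 100),
    Run("20.7, 1.6b3, test kernel, VM 8 cores", 800, 200, 200),
    Run("20.7, 1.6b3, test kernel, D-2123", 900, 250, 225),
]


def retained(value: int, baseline: int) -> float:
    """Throughput as a percentage of the OFF baseline, one decimal place."""
    return round(100 * value / baseline, 1)


if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.label:<38} bypass {retained(run.bypass, run.off):5.1f}%"
              f"  on {retained(run.on, run.off):5.1f}%")
```

On 20.1, BYPASS keeps 100% and ON about 89% of the baseline; on 20.7, BYPASS keeps roughly the same small share as ON (about 11 to 28%). That pattern is consistent with the point above that bypass suffers as much as full inspection, i.e. the bottleneck sits in the packet path rather than in the inspection itself.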

*** EDIT 9/17/20 ***

Wow, the new kernel has made WORLDS of difference. The majority of our users aren't in yet, so I'm testing on a low number of people, but am already seeing a change.

[20.7.2, Sensei 1.6, netmap final kernel, virtual box, Xeon E5-2620 CPU (8 cores granted to the VM and 100% reserved), 64gb memory] - ubench score ~220,000
- Sensei OFF: 800mbit
- Sensei BYPASS: 750mbit (will update based on when people are using it heavily)
- Sensei ON: 750mbit
- CPU load is moderate at all times with Sensei (5-20%)
- This is a drop of 5% CPU utilization and an increase of 550mbit on our speedtests! WOW! I'll be curious to see what happens under load, but this is an amazing improvement regardless. Good job SunnyValley
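The headline figures in the last bullet can be checked against the earlier 8-core VM run on the 1.6beta3 test kernel (same VM size and reservation). A tiny Python check of that arithmetic; the "before" throughput values are the ones measured before the 300-500 user drop-off, and the CPU change is in percentage points of the reported ranges.

```python
# Figures copied from the two 8-core VM entries above (test kernel vs final kernel).
before = {"bypass_mbit": 200, "on_mbit": 200, "cpu_low": 10, "cpu_high": 25}
after = {"bypass_mbit": 750, "on_mbit": 750, "cpu_low": 5, "cpu_high": 20}

# Per-metric change: positive = more throughput, negative = less CPU load.
delta = {key: after[key] - before[key] for key in before}
speedup = after["on_mbit"] / before["on_mbit"]

print(delta)    # {'bypass_mbit': 550, 'on_mbit': 550, 'cpu_low': -5, 'cpu_high': -5}
print(speedup)  # 3.75
```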

Dear Sensei users,

As promised, here is the long-awaited official netmap kernel, which fixes these issues and brings support for VPN and lagg interfaces:

https://forum.opnsense.org/index.php?topic=19175.0

PS: Sensei 1.6 will follow shortly.


September 17, 2020, 08:54:32 PM #1076 Last Edit: September 17, 2020, 11:06:24 PM by DenverTech
Rather than keep my already lengthy previous post going, I wanted to do a fresh one with the latest netmap kernel and the results we're seeing. Note that this is a virtual system and not ideal for this kind of load. We'll be swapping it out soon, but it's a good early benchmark.

*** <10 users ***
[20.7.2, Sensei 1.6, netmap final kernel, virtual box, Xeon E5-2620 CPU (8 cores granted to the VM and 100% reserved), 32gb memory] - ubench score ~220,000
- Sensei OFF: 800mbit
- Sensei BYPASS: 750mbit
- Sensei ON: 750mbit
- CPU load is moderate at all times with Sensei (5-20% with only 1-2 cores at the upper end at one time)
- This is a drop of 5% CPU utilization and an increase of 550mbit on our speedtests from previous kernel

*** ~800 users ***
[20.7.2, Sensei 1.6, netmap final kernel, virtual box, Xeon E5-2620 CPU (8 cores granted to the VM and 100% reserved), 32gb memory] - ubench score ~220,000
- Sensei OFF: 710mbit
- Sensei BYPASS: 625mbit
- Sensei ON: 590mbit
- CPU load is moderate with Sensei (20-30% on all cores)
- This is a drop of about 30% CPU utilization and an increase of 350mbit on our speedtests from previous kernel

I'm definitely liking what I'm seeing. It still won't run well on a D-series Xeon, but it looks to be a lot more usable on most other hardware. This was a huge improvement.

Thanks for Sensei 1.6, but email reports are not working anymore:

[f26d747d-5635-45b5-8b14-103a9c50cc69] Script action failed with Command '/usr/local/opnsense/scripts/OPNsense/Sensei/report-gen/send.py --pdf 'false' --server 'xxxx' --port '587' --secured 'TLS' --username 'xxx' --password 'xxx' --sender 'xxx' --to 'xxx' returned non-zero exit status 1. at Traceback (most recent call last):
  File "/usr/local/opnsense/service/modules/processhandler.py", line 479, in execute
    stdout=output_stream, stderr=error_stream)
  File "/usr/local/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/usr/local/opnsense/scripts/OPNsense/Sensei/report-gen/send.py --pdf 'false' [...xxx...] returned non-zero exit status 1.
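Because the wrapper only reports exit status 1, the underlying error from send.py is not visible in this message. One way to separate an SMTP problem from a report-generation problem is to send a test message with the same server settings from a standalone script. A minimal Python sketch using the standard `smtplib` module; it assumes that `--secured 'TLS'` on port 587 means STARTTLS, which may not be how send.py interprets it, and the function names, host, addresses and credentials are hypothetical placeholders.

```python
import smtplib
import ssl
from email.message import EmailMessage
from typing import Optional


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Plain-text probe message; no report content involved."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SMTP test (report troubleshooting)"
    msg.set_content("If this arrives, the SMTP settings work outside the report script.")
    return msg


def send_test(server: str, port: int, username: Optional[str], password: Optional[str],
              msg: EmailMessage, use_starttls: bool = True) -> None:
    """Send msg, printing the SMTP dialogue so the failing step is visible."""
    with smtplib.SMTP(server, port, timeout=15) as smtp:
        smtp.set_debuglevel(1)  # SMTP dialogue is printed to stderr
        smtp.ehlo()
        if use_starttls:
            # Assumption: "--secured 'TLS'" on port 587 means STARTTLS.
            smtp.starttls(context=ssl.create_default_context())
            smtp.ehlo()
        if username:  # skip AUTH for an unauthenticated local relay
            smtp.login(username, password or "")
        smtp.send_message(msg)
```

Usage, with placeholder values: `send_test("mail.example.com", 587, "user", "secret", build_test_message("fw@example.com", "admin@example.com"))`. For a local relay without authentication, pass `username=None` and, if the relay does not offer TLS, `use_starttls=False`. A failure at `starttls()` or `login()` points at the server settings; a successful send points back at the report script.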



September 18, 2020, 01:15:26 PM #1078 Last Edit: September 18, 2020, 01:55:17 PM by mr.yx
Quote from: marcri on September 18, 2020, 12:25:01 PM
Thanks for Sensei 1.6., but reports via E-Mail are not working anymore:
[...]

Same for me, with a local mail server (without auth).

Good catch. I'm using external mailer and it doesn't work either. Since mine's weekly, I hadn't noticed, but just did a test send and it failed much the same.

Quote from: marcri on September 18, 2020, 12:25:01 PM
Thanks for Sensei 1.6., but reports via E-Mail are not working anymore:
[...]