OPNsense Forum

Archive => 17.1 Legacy Series => Topic started by: phoenix on December 29, 2016, 01:47:36 pm

Title: 17.1.b & Suricata fails on ESXi
Post by: phoenix on December 29, 2016, 01:47:36 pm
I've been trying for a few days to get the 17.1 beta working with Suricata. As soon as I enable the service, OPNsense collapses and the console drops to a "db>" prompt. Unfortunately, at this point I don't really know what to do other than reboot; when I do that, the console shows errors with the HD and repairs them. I've tried downloading the rules a couple of times, first without enabling any of them and then with some of them enabled. Activating the service fails in both cases.

This is a VM running on ESXi v6.0. I should point out that the same configuration on VMware Workstation 12.5.2 works as it should: Suricata can be enabled and rules downloaded without problems. If you want/need any further information or logs, point me in the right direction and I'll provide what I can. :)

P.S. I did try this on ESXi with the EFI BIOS setting and it still failed.
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: weust on December 29, 2016, 02:47:02 pm
What type of NIC are you using for that VM?
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on December 29, 2016, 02:51:30 pm
I use Intel NICs in all my machines. I'm not sure which model is in this particular server, but it's a server NIC, and the VM uses the VMXNET3 VMware NIC. As I mentioned, this worked fine in VMware Workstation so I was assuming it would be OK on ESXi. I also forgot to mention that all the NIC offload settings are disabled as well.
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: lattera on December 29, 2016, 02:59:35 pm
Can you give us a screenshot? Also, type in "bt" (without the quotes), then hit enter. And then take another screenshot.
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on December 29, 2016, 03:01:26 pm
I'll give that a go a bit later today if that's OK?
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: lattera on December 29, 2016, 03:03:09 pm
Sure. Whenever's most convenient for you. Thanks!
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on December 29, 2016, 03:25:06 pm
Actually, I just tried something else: it's activating IPS mode that causes the problem. The original configuration already had that activated, but the service was not enabled. Here are the screenshots, sooner than I thought:

Title: Re: 17.1.b & Suricata fails on ESXi
Post by: lattera on December 29, 2016, 03:42:41 pm
This seems to be related to a problem I had a while ago with netmap. While at EuroBSDcon, I talked with the original developer behind netmap and the problem is now fixed in his out-of-tree project. It has been merged into 12-CURRENT. It hasn't been backported to 11-STABLE (and thus is not in 11.0-RELEASE).

I'll email the original developer just to make sure this is the same issue that I saw. If you want me to include you on the email, could you shoot me an email at shawn.webb@hardenedbsd.org?
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on December 29, 2016, 03:44:58 pm
The technical details of this problem will be way above my head so no need to include me in the email, could you just give me a follow-up when you get an answer? Many thanks for your time and help. :)
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: franco on December 29, 2016, 10:25:25 pm
Looks like a problem with the netmap "generic" emulation layer because vmx does not have native netmap support. Does this also happen with e1000 emulated drivers?
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: franco on December 29, 2016, 10:30:56 pm
This should be the commit Shawn talked about, but it's not on stable/11 yet.

https://github.com/freebsd/freebsd/commit/cdb805690

This won't make it into 17.1 images for sure.


Cheers,
Franco
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: lattera on December 30, 2016, 06:28:20 am
That's exactly it. Netmap had a major overhaul in 11.0, but that overhaul caused issues due to lack of testing with various "non-standard" hardware. The commit you linked to contains a whole lot of work, including more stable and robust code.
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: franco on December 30, 2016, 07:50:46 am
If it applies cleanly we can talk about adding it in a 17.1.x :)
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on December 30, 2016, 08:14:52 am
So no IDS until this is incorporated, I guess? It's not a great problem for me so I'm going to leave the 17.1 version up on my VM, if there's any testing you need for this fix I can give it a go. :)
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: franco on December 30, 2016, 04:51:59 pm
Hi Bill,

The e1000 emulation should work.

I don't feel good about taking the commit (and the fixes for the commit that went in afterwards) without an official MFC to the FreeBSD 11 stable branch, so I cannot even provide a test kernel at the moment.


Cheers,
Franco
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on December 30, 2016, 07:12:16 pm
Hi Franco

Thanks for that prod, I'd forgotten about testing the E1000 NIC - the obvious sometimes escapes me. I did try the VMXNET2 NIC as well and that also failed to allow IDS enabling but I guess that's to be expected.

For anyone else who tries this: you can't leave the VMXNETx NIC in the system. It has to be removed and replaced with the E1000 NIC, followed by a clean install of 17.1.b; then it works a treat, with IDS up and running smoothly.

Thanks for your help and I wish you and the OPNsense team (and the other forum members) a happy and prosperous New Year, have a great week-end. :)
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: lattera on December 30, 2016, 07:14:31 pm
Great to hear you've gotten it working with the emulated Intel driver. That confirms that it's the same issue that I saw and should be fixed with the patch Franco linked to.
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on December 31, 2016, 03:20:37 pm
There's an unfortunate side effect of this, the CPU usage goes up to 100% and the Load is 1.3%. Using the VMXNET3 driver on 16.7 the Load was about the same with CPU usage around the 12% mark. This is a VM on a lightly loaded server so I'll leave it as it is for now and keep an eye on it.

Would it be worth mentioning this problem in the Release Notes for 17.1 (and the RCs), just in case anyone else hits it?
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: franco on December 31, 2016, 03:57:06 pm
Will do. :)
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: franco on January 03, 2017, 05:38:17 pm
I ran into this with the intel-em-kmod driver we maintain. It surprisingly (but not unjustly) uses the netmap(4) emulation mode as opposed to its native support, which made it easy to run into the same panic. A first test with the new netmap(4) changes in 12-CURRENT had no conclusive results. We're definitely not going to solve this for the initial 17.1 release, but I will work with the authors to see if we can resolve this ASAP and port it over.


Cheers,
Franco
--

775.468651 [ 268] generic_find_num_desc     called, in tx 1024 rx 1024
775.476185 [ 276] generic_find_num_queues   called, in txq 0 rxq 0
775.483286 [ 801] generic_netmap_dtor       Restored native NA 0
775.496255 [ 268] generic_find_num_desc     called, in tx 1024 rx 1024
775.503779 [ 276] generic_find_num_queues   called, in txq 0 rxq 0
775.511347 [ 801] generic_netmap_dtor       Restored native NA 0
775.527056 [ 276] generic_find_num_queues   called, in txq 0 rxq 0
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x1
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80732c2a
stack pointer           = 0x28:0xfffffe00a17cb300
frame pointer           = 0x28:0xfffffe00a17cb350
code segment            = base 0x0, limit 0xfffff, type 0x1b
                       = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 80820 (W#01-em1+)
[ thread pid 80820 tid 100213 ]
Stopped at      generic_xmit_frame+0x2a:        movl    (%rax),%eax
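
For anyone else collecting these panics, here is a minimal Python sketch for pulling the trap number, fault address and faulting symbol out of pasted console text so reports can be compared quickly. It is not part of OPNsense or FreeBSD, the function name summarize_panic is made up for illustration, and the sample text is just a few lines copied from the output above.

```python
import re

# Sample lines copied from the console output above.
PANIC_TEXT = """\
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x1
current process         = 80820 (W#01-em1+)
Stopped at      generic_xmit_frame+0x2a:        movl    (%rax),%eax
"""


def summarize_panic(text):
    """Return a dict with the trap number, reason, fault address and faulting symbol.

    Hypothetical helper: it only understands the line formats shown above.
    Fields that are not found are simply left out of the result.
    """
    summary = {}

    trap = re.search(r"^Fatal trap (\d+): (.+)$", text, re.MULTILINE)
    if trap:
        summary["trap"] = int(trap.group(1))
        summary["reason"] = trap.group(2).strip()

    addr = re.search(
        r"^fault virtual address\s*=\s*(0x[0-9a-fA-F]+)", text, re.MULTILINE
    )
    if addr:
        summary["fault_address"] = int(addr.group(1), 16)

    stop = re.search(
        r"^Stopped at\s+(\w+)\+(0x[0-9a-fA-F]+):", text, re.MULTILINE
    )
    if stop:
        summary["symbol"] = stop.group(1)
        summary["offset"] = int(stop.group(2), 16)

    return summary


if __name__ == "__main__":
    print(summarize_panic(PANIC_TEXT))
```

A fault address as low as 0x1 combined with a stop in generic_xmit_frame would be consistent with a near-NULL pointer dereference in the netmap emulation path, but that is a reading of the output, not something the log states outright.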
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: franco on January 07, 2017, 05:12:18 pm
Bill,

I looked into this all the way up to involving FreeBSD/netmap people.

The good news is: the panic is gone in code in 12-CURRENT and we have a working backport.

The bad news for now: neither 12-CURRENT nor the backport for 17.1 works in our inline IPS setup with Suricata.

I'll drop by again when we have more info.


Cheers,
Franco
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on January 07, 2017, 05:48:56 pm
Hi Franco

Thanks for both of those updates, I seem to have missed the post on Jan 3rd. It's not an urgent problem for me so I reverted to using the VMXNET3 NICs so I could drop the CPU usage and stay on the 17.1 beta. I'm quite happy to leave Suricata disabled for now and I'll wait for any updates you get on this, I'll also be willing to be a guinea pig if you need it tested. :)

Thanks for all your hard work on this and a Happy New Year to you and all the team.
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: franco on January 07, 2017, 06:17:45 pm
Hi Bill,

A happy new year to you too! :)

The issue is a bit problematic as it is largely present in FreeBSD 11.0 but was working just fine in 10.3. It unfortunately points to "us" being a major provider/user of this functionality, which is actually a small subset or niche feature that others are *not* directly using, not even the developers themselves. This comes with the mixed implication that we have to make sure the features we use are not deleted as unused or silently broken months before a release.

I don't know how we can pull this off, but hopefully with the current discussions we will find a way in the next weeks.


Cheers,
Franco
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: franco on January 15, 2017, 05:43:24 pm
How about this kernel then? Make sure to snapshot. :)

# opnsense-update -kr 17.1.b-netmap-fix


Cheers,
Franco
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on January 15, 2017, 06:12:33 pm
Gosh, that was quick. :)

I (almost) always take a snapshot and I did today. Just done the update and after enabling IPS/IDS and updating the rules all seems to be quite calm with a normal relatively low CPU usage - I also have this on a VM with the VMXNET3 NICs installed. If there's anything that breaks or looks out of place I'll post here.
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: franco on January 15, 2017, 06:16:28 pm
Quick? Took me a couple of days to dig through 2 years of netmap commit history to find it. :D

That's a good sign. If the guys at Deciso and the netmap peeps are ok with it I shall add the fix just in time for 17.1-RC1.


Cheers,
Franco
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on January 15, 2017, 06:24:47 pm
Sounds good to me, I'll keep a close eye on it for the moment and see what happens. Without IDS enabled it's been running at about 2-3% cpu usage and with it it seems to be hovering around 7-8% and obviously there was a larger spike to 10-12% as the rules were downloaded but that dropped after a few minutes.

Thanks for all your hard work on this and enjoy the rest of the evening. :)
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: franco on January 15, 2017, 06:40:32 pm
Thank you Bill, you too!


Cheers,
Franco
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on January 16, 2017, 08:03:09 am
Good morning Franco

Bad news I'm afraid. A short while after updating the install yesterday the CPU usage went up to 100%. I didn't notice this yesterday evening as internet access was still OK, but this morning I saw the CPU usage was up and internet access was almost impossible.

A reboot also had problems with various timeouts and I had to reset the VM to get it to boot correctly. That worked, but CPU usage went straight up to 100%. Disabling IDS/IPS and resetting the VM doesn't resolve the 100% CPU problem; it runs like that all the time.

I've taken a snapshot of this current system so if you need me to do anything on that to get you some logs then let me know.
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on January 16, 2017, 09:24:33 pm
I've just been doing some testing with this and the high CPU use may not be a problem with IPS/IDS. I've enabled IPS/IDS again with the updated kernel/drivers and I'll leave it for tonight and do some more tests in the morning, I'll post the results later tomorrow.
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on January 17, 2017, 08:42:48 am
I've left this running overnight and there's no sign of any high CPU use with IDS/IPS enabled and your patch also installed.

What I did was disable the reporting and leave SNMP enabled but with no SNMP modules activated. I then activated the SNMP modules one by one, and the one that causes the high CPU is the Host Resources module: as soon as I enabled it the CPU usage went up to 100%, and it dropped as soon as it was disabled. If it makes any difference, the SNMP service is only bound to the LAN interface.

As an additional note, there seems to be a problem doing a reboot. It restarts but seems to have problems checking devices (I think that's where it hung), and I have to reset the VM before it comes up OK. What information would you need about this problem?
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: franco on January 18, 2017, 02:37:59 pm
Let's start with a console screenshot when it's supposedly stuck?


Thanks,
Franco
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on January 18, 2017, 03:18:57 pm
Hi Franco

I'll do the reboot shortly and take a screenshot. The screenshot will probably be too large to post here, should I send it via PM?

Meanwhile, I'm also seeing some SCSI write errors (in the attached screenshot), are they anything to be concerned about?
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on January 18, 2017, 03:55:21 pm
Here's the last image I took of the server hanging, this was after (about) eight minutes of it producing those types of messages - I have earlier shots if you need to see them:

https://i.imgsafe.org/f80f85d772.png
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: phoenix on January 20, 2017, 02:02:16 pm
Hi Franco

I think we can forget this 'reboot problem' - I've installed the rc1 version today (clean install and settings import) and this problem has gone away, it reboots fine from the GUI menu.

I'm also assuming that you added the test kernel that I tried to rc1, as the CPU load when Suricata is enabled is still low and IPS/IDS works fine for me.
Title: Re: 17.1.b & Suricata fails on ESXi
Post by: franco on January 20, 2017, 03:20:51 pm
Hi Bill,

The CAM error can happen because VMware emulation isn't 100% bug free, but I don't think this is data corruption, just a "hardware" error that can be recovered from.

Nice to hear about RC1 though it's weird that it would hang right before the kernel yields the system to init (bright white vs. grey). This shouldn't have happened and there is no reason the problem disappeared, because no code that would be responsible for the transition changed. Let's see if this holds up....


Cheers,
Franco