OPNsense 20.7 on KVM with Virtio adapters - CARP and Suricata IPS (netmap)

Started by Styx13, December 15, 2020, 02:35:16 AM

Hello!

I am new to OPNsense, which I discovered this month.

Thank you very much for this nice firewall, I love it so much that I already set it up in HA !

However, I did notice some weird behavior with CARP, and after reading in different places on this forum, on the pfSense forum and on other BSD forums, it seems there is something going on with CARP + Suricata IPS (and maybe with Virtio ?).

I wanted to start a conversation here about issues with an HA setup using CARP + IPS (and + Virtio, if Virtio has anything to do with this).
The versions on which I observed this issue are:
OPNsense 20.7.5 and 20.7.6 (production)

A bit of information about my setup, which led me to discover those issues with CARP + IPS (netmap) + Virtio:

I have 2 KVM hosts (host OS is Debian buster) and I am running an OPNsense VM on each, using only Virtio adapters attached to Linux bridges defined on the host.
I have 6 different networks, connected to my hosts via 2 physical interfaces: 1 for WAN, and 1 for all my LAN networks, which are trunked on a single port (VLAN tagging). On each host I have one VLAN interface per tagged network, and on top of each VLAN interface I defined a Linux bridge. Each VM has 6 Virtio network adapters: 1 per Linux bridge.
One of the bridges is my WAN (directly on top of the physical interface), 4 are different internal networks (on top of VLAN interfaces) and the 6th one is dedicated to PFSYNC (on top of a VLAN interface too).
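
For readers who want to reproduce this host layout, here is a minimal sketch of the VLAN-plus-bridge part using iproute2. The trunk name `eth1`, the VLAN IDs and the `br<ID>` bridge names are hypothetical placeholders, not values from the setup described above, and the commands are not persistent across reboots. By default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch: one VLAN sub-interface plus one Linux bridge per tagged network.
# TRUNK, the VLAN IDs and the br<ID> names are hypothetical placeholders.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 and run
# as root to actually apply them.

TRUNK="${TRUNK:-eth1}"
DRY_RUN="${DRY_RUN:-1}"

# run: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# make_vlan_bridge <vlan-id>
# Creates TRUNK.<id>, creates br<id>, attaches the VLAN interface to the
# bridge and brings both up.
make_vlan_bridge() {
    vid="$1"
    run ip link add link "$TRUNK" name "$TRUNK.$vid" type vlan id "$vid"
    run ip link add name "br$vid" type bridge
    run ip link set "$TRUNK.$vid" master "br$vid"
    run ip link set "$TRUNK.$vid" up
    run ip link set "br$vid" up
}
```

Calling `make_vlan_bridge 10` once per tagged network (four internal networks plus PFSYNC in the layout above) would give five VLAN-backed bridges; the WAN bridge sits directly on its physical interface and needs no VLAN step.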

Everything I have set up so far works great: HA works, conf synchronization works, IDS works !

However, when I switch on IPS ... I start seeing some weird behavior with CARP.

Usually the "backup" firewall starts by complaining about "CARP has detected a problem" and starts demoting.
Sometimes (I think when the issue occurs for the first time after a reboot, for example), the VIP status page takes forever to load, and when it finally does, about half of my networks show as "MASTER" on the backup firewall and the other half as "BACKUP", while on the master they all show "MASTER".
Eventually, both sides will show "CARP has detected a problem" and the only way to fix it is to reboot the VMs. If I leave IPS on, the issue will occur shortly after reboot.
If I disable IPS (but keep IDS), the issue never occurs.
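
When the VIP status page is slow to load, the same information can be read from the console. The sketch below assumes the usual FreeBSD `ifconfig` layout, where each interface block starts with `<name>: flags=...` and each VIP appears as a `carp: <STATE> vhid <n> ...` line. The helper name `carp_summary` is made up for this example.

```shell
#!/bin/sh
# carp_summary: read `ifconfig` output on stdin and print one line per
# CARP vhid in the form "<interface> vhid <n> <STATE>".
# Assumes the FreeBSD ifconfig layout: an interface block begins with
# "<name>: flags=..." and CARP lines look like "carp: MASTER vhid 1 ...".
carp_summary() {
    awk '
        # Remember the current interface name (strip the trailing colon).
        /^[^ \t]+: flags=/ { sub(/:$/, "", $1); ifc = $1 }
        # For each CARP line, print interface, vhid number and state.
        $1 == "carp:"      { print ifc, "vhid", $4, $2 }
    '
}
```

Running `ifconfig | carp_summary` on both firewalls at the same moment makes a split state (the same vhid reported as MASTER on both nodes) easy to spot, and `sysctl net.inet.carp.demotion` shows the current demotion counter on each side.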

So I started looking online for other people having a similar issue.
First I found some people mentioning issues with Virtio, but later on I found some discussions indicating that recent work had been done and that OPNsense currently has pretty good support for Virtio, including for CARP.

Then I read some other posts where people mentioned seeing issues with OPNsense HA + IPS, and pointed specifically at CARP + netmap.

What I am not sure about is Virtio: is Virtio still part of the issue ? Or is it purely a CARP + netmap issue ?
I saw other people using KVM still having the same problem with e1000 adapters.

If you are aware of this problem and understand it, please comment on my observations and let me know what causes the issue and whether any fixes currently exist.

And if I can help the devs work on a fix by providing logs, dumps or anything else, please let me know.

Thank you !

I'm not using CARP at the moment, so it must be purely CARP + netmap.

Virtio + netmap has worked with no issues for me on KVM since 20.7.x; before that it crashed the VM.


But there are still improvements ongoing.

Do you run suricata on the carp interface by any chance ? Make sure it doesn't block traffic

You could try running the development kernel via
# opnsense-update -kr 20.7.6-next
# opnsense-shell reboot

Create a snapshot beforehand and revert if it doesn't fix the issue.
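
On a KVM host, the snapshot step could look like the sketch below. It assumes the VMs are managed through libvirt (`virsh`), which this thread does not confirm. The domain name `opnsense-fw1` and the snapshot name `pre-kernel-test` are hypothetical, and by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: snapshot a libvirt guest before a test and revert afterwards.
# Assumes libvirt/virsh manages the VMs; the domain and snapshot names
# used in the usage note are hypothetical.
# DRY_RUN=1 (the default) only prints the virsh commands.

DRY_RUN="${DRY_RUN:-1}"

# run: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_take <domain> <snapshot-name>
snapshot_take() {
    run virsh snapshot-create-as "$1" "$2"
}

# snapshot_revert <domain> <snapshot-name>
snapshot_revert() {
    run virsh snapshot-revert "$1" "$2"
}
```

For example, `snapshot_take opnsense-fw1 pre-kernel-test` before installing the test kernel, then `snapshot_revert opnsense-fw1 pre-kernel-test` if the issue persists.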

Follow https://forum.opnsense.org/index.php?topic=17363.195

I haven't tested this yet, but does this also happen after a reboot, when IPS just starts with the bootup process?

Quote from: Voodoo on December 20, 2020, 04:15:40 AM
Do you run suricata on the carp interface by any chance ? Make sure it doesn't block traffic
Yes, I do run Suricata on the CARP interfaces, and I do not think it blocks CARP. Even without any rules loaded, the issue happens.

Quote from: mimugmail on December 20, 2020, 07:13:23 AM
I haven't tested this yet, but does this also happen after a reboot, when IPS just starts with the bootup process?
As a matter of fact, things seem better after a reboot, but only for a few minutes.
After a few minutes, CARP goes crazy again and starts displaying the "CARP has detected a problem" message. Then for some time I see half of my interfaces on master and half on backup. Eventually CARP may demote one side more than the other, and all interfaces end up on master (or sometimes backup). But the issue is still there and my VIPs are not working properly (if I reboot the master, the backup won't take over right away, connections are lost, etc.).

I could try the dev kernel, but I was hoping a developer could hint at whether the next kernel contains any work related to CARP/netmap that might help. Otherwise, it feels like testing blindly.

I may still try it, in case it helps.

Maybe it's related to the NIC and/or driver. IPS triggers an interface event, and CARP seems to detect this as a problem.
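
One way to test this hypothesis is to look at the kernel messages and check whether an interface event shows up right before each CARP state change. The sketch below is a simple filter; the helper name `carp_events` is made up for this example, and the match is kept loose on purpose because the exact message wording is not taken from this thread.

```shell
#!/bin/sh
# carp_events: keep only log lines that mention CARP or a link state
# change, so the two can be read side by side in their original order.
# The match is deliberately loose (case-insensitive substrings); the
# exact kernel message wording is an assumption, not quoted from logs.
carp_events() {
    grep -i -E 'carp|link state'
}
```

Running `dmesg | carp_events` shortly after Suricata starts in IPS mode should show whether link state messages for the monitored interfaces appear just ahead of the CARP transitions or demotions.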