Topics - Styx13

25.1, 25.4 Legacy Series / ddclient config file generated by GUI contains invalid line endings

April 27, 2025, 11:56:23 PM

Hello,

I found out today, after editing the ddclient configuration via the GUI, that some characters were being added at the end of some lines:

syslog=yes                  # log update msgs to syslog
pid=/var/run/ddclient.pid   # record PID in file.
verbose=yes

use=cmd, cmd="/usr/local/opnsense/scripts/ddclient/checkip -t 0 -s freedns --timeout 10", \
protocol=porkbun, \
apikey=my_secret_api_key, \
secretapikey=my_secret_secret_key, \
login=my_secret_api_key, \
password=my_secret_secret_key \
my.fully.qualified.hostname

Those trailing ", \" sequences make the configuration invalid (or at least confusing) for ddclient and cause errors like:

WARNING: Could not determine an IP for my.fully.qualified.hostname	
WARNING: my.fully.qualified.hostname: unable to determine IP address with strategy use=cmd	
WARNING: found neither IPv4 nor IPv6 address

After manually removing the extra ", \" and " \" from the configuration file and restarting ddclient, everything works fine again!
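The manual fix described above can be scripted. This is a minimal sketch in Python, assuming the generated file is plain text with one setting per line. `strip_continuations` is a hypothetical helper that is not part of OPNsense or ddclient, and it only reproduces the hand edit:

```python
import re

# A trailing ", \" or " \" at the end of a line (optional comma, spaces, backslash).
TRAILING_CONTINUATION = re.compile(r",?\s*\\\s*$")


def strip_continuations(config_text: str) -> str:
    """Hypothetical helper: drop the trailing ', \\' and ' \\' markers from each line."""
    cleaned = (TRAILING_CONTINUATION.sub("", line) for line in config_text.splitlines())
    return "\n".join(cleaned)


generated = "\n".join([
    'use=cmd, cmd="/usr/local/opnsense/scripts/ddclient/checkip -t 0 -s freedns --timeout 10", \\',
    "protocol=porkbun, \\",
    "password=my_secret_secret_key \\",
    "my.fully.qualified.hostname",
])
print(strip_continuations(generated))
```

Lines without a trailing backslash pass through unchanged, so the helper can be run over the whole file.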

OPNSense version: OPNsense 25.1.5_5-amd64

PS: I know that the documentation recommends moving to the "native" client, however, I use porkbun for resolution of some of my domains and the native client does not support it.

25.1, 25.4 Legacy Series / KeaDHCP with HA - HA_LEASE_UPDATE_CONFLICT / LEASE_CMDS_UPDATE4_CONFLICT

April 16, 2025, 03:16:45 AM

Hello,

Similar to a few other posts I found here and on the Kea DHCP mailing list, I sometimes get the HA_LEASE_UPDATE_CONFLICT message in the Kea DHCP logs.

Eventually, this leads to Kea DHCP terminating HA (based on max-rejected-lease-updates, default 10).

I noticed that it usually happens after the primary node is rebooted (after an update, for example) or when I "Enter Persistent CARP Maintenance Mode" on the primary node and later leave it.

While looking at the Kea DHCP configuration files for a clue as to why this may happen, I noticed that in "kea-dhcp4.conf" every slash ('/') is escaped as '\/'.
I wonder what the reason for that is. From what I read, I thought the only thing that needed to be escaped in the Kea configuration files was the comma (',').
Also, looking around at Kea configuration file examples, I did not notice anybody else escaping slashes in their configuration files.
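For what it's worth, `\/` is a legal escape for `/` in JSON (RFC 8259), so a conforming parser reads both spellings as the same string. PHP's `json_encode()` produces exactly this escaping by default unless `JSON_UNESCAPED_SLASHES` is passed; that OPNsense generates `kea-dhcp4.conf` this way, and that Kea's own parser treats the escape like any RFC 8259 parser, are assumptions. A quick check with Python's standard `json` module (the `url` key is hypothetical; the URL is the HA peer URL that appears in the logs in this post):

```python
import json

# The same (hypothetical) JSON fragment, once with escaped slashes, once without.
escaped = '{"url": "http:\\/\\/10.99.0.252:8001\\/"}'
plain = '{"url": "http://10.99.0.252:8001/"}'

# A conforming JSON parser decodes "\/" to "/", so both parse to the same value.
print(json.loads(escaped) == json.loads(plain))  # True
print(json.loads(escaped)["url"])                # http://10.99.0.252:8001/
```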

Other than that, I did not see anything in particular that could explain the issue I am facing.

Below are the logs on the primary (hot) when the issue happens:

2025-04-15T20:51:47-04:00	Warning	kea-dhcp4	WARN [kea-dhcp4.ha-hooks.0x395ec216600] HA_LEASE_UPDATE_CONFLICT OPNsense-primary: lease update [hwtype=1 xx:xx:xx:xx:2b:25], cid=[no info], tid=0x5418305d sent to OPNsense-backup (http://10.99.0.252:8001/) returned conflict status code: ResourceBusy: IP address:10.90.0.54 could not be updated. (error code 4)

and the corresponding log on the backup (standby):

2025-04-15T20:51:47-04:00	Warning	kea-dhcp4	WARN [kea-dhcp4.lease-cmds-hooks.0x38dd92616d00] LEASE_CMDS_UPDATE4_CONFLICT lease4-update command failed due to conflict (parameters: { "expire": 1744766507, "force-create": true, "fqdn-fwd": false, "fqdn-rev": false, "hostname": "REDACTED", "hw-address": "xx:xx:xx:xx:2b:25", "ip-address": "10.90.0.54", "origin": "ha-partner", "state": 0, "subnet-id": 6, "valid-lft": 1800 }, reason: ResourceBusy: IP address:10.90.0.54 could not be updated.)

I redacted part of the MAC address and the hostname.

After enough of those warnings, it eventually leads to termination:
On the primary (hot):

2025-04-15T20:51:47-04:00	Error	kea-dhcp4	ERROR [kea-dhcp4.ha-hooks.0x395ec216600] HA_TERMINATED HA OPNsense-primary: service terminated due to an unrecoverable condition. Check previous error message(s), address the problem and restart!	
2025-04-15T20:51:47-04:00	Error	kea-dhcp4	ERROR [kea-dhcp4.ha-hooks.0x395ec216600] HA_LEASE_UPDATE_REJECTS_CAUSED_TERMINATION OPNsense-primary: too many rejected lease updates cause the HA service to terminate

and on the backup (standby):

2025-04-15T20:51:51-04:00	Error	kea-dhcp4	ERROR [kea-dhcp4.ha-hooks.0x38dd92615f00] HA_TERMINATED HA OPNsense-backup: service terminated due to an unrecoverable condition. Check previous error message(s), address the problem and restart!
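To see how close a log excerpt is to the termination threshold, the conflict warnings can be tallied per client. This is a minimal sketch in Python, assuming the message format shown in the logs above. Grouping by (MAC, IP) is a simplification for reading logs, not a claim about exactly how Kea counts toward `max-rejected-lease-updates` (default 10, as noted above):

```python
import re
from collections import Counter

# Matches the HA_LEASE_UPDATE_CONFLICT warning format shown above.
CONFLICT_RE = re.compile(
    r"HA_LEASE_UPDATE_CONFLICT .*?\[hwtype=\d+ (?P<mac>[^\]]+)\]"
    r".*?IP address:(?P<ip>\S+) could not be updated"
)
MAX_REJECTED = 10  # default of max-rejected-lease-updates, as mentioned above


def tally_conflicts(lines):
    """Count conflict warnings per (MAC, IP) pair."""
    counts = Counter()
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            counts[(match["mac"], match["ip"])] += 1
    return counts


sample = [
    "WARN [kea-dhcp4.ha-hooks.0x395ec216600] HA_LEASE_UPDATE_CONFLICT OPNsense-primary: "
    "lease update [hwtype=1 xx:xx:xx:xx:2b:25], cid=[no info], tid=0x5418305d sent to "
    "OPNsense-backup (http://10.99.0.252:8001/) returned conflict status code: "
    "ResourceBusy: IP address:10.90.0.54 could not be updated. (error code 4)",
]
counts = tally_conflicts(sample)
total = sum(counts.values())
print(f"{total} conflict(s) across {len(counts)} client(s); threshold {MAX_REJECTED}")
```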

Running OPNsense 25.1.5_5-amd64 at the time of writing

24.7, 24.10 Legacy Series / No logs for Kea DHCP when using /var/log RAM disk

September 23, 2024, 03:43:53 AM

Hello,

OPNsense 24.7.3_1

When I enable /var/log RAM disk (Use memory file system for /var/log) and reboot, I cannot see the Kea DHCP logs anymore.
I noticed the following error message in the backend log:

2024-09-22T21:40:28-04:00	Error	configd.py	[c6d73318-a7ab-448f-b5dd-926982c4c82d] Script action failed with Command '/usr/local/opnsense/scripts/syslog/queryLog.py --limit '500' --offset '0' --filter '' --module 'core' --filename 'kea' --severity 'Emergency,Alert,Critical,Error,Warning,Notice,Informational' --valid_from '1726969229.675'' returned non-zero exit status 1. at Traceback (most recent call last):
  File "/usr/local/opnsense/service/modules/actions/script_output.py", line 76, in execute
    subprocess.check_call(script_command, env=self.config_environment, shell=True,
  File "/usr/local/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/usr/local/opnsense/scripts/syslog/queryLog.py --limit '500' --offset '0' --filter '' --module 'core' --filename 'kea' --severity 'Emergency,Alert,Critical,Error,Warning,Notice,Informational' --valid_from '1726969229.675'' returned non-zero exit status 1.

High availability / [SOLVED] 24.7 PFSync State synchronization not working

August 11, 2024, 09:31:29 PM

Hello,

I have OPNsense configured with HA, CARP works fine, no issues with it.
However, pfsync does not seem to work properly: when I switch to the backup (or back to the primary), all my established connections die. (I tested with an SSH connection to a host behind both firewalls: it hangs and then resets when the CARP switch PRIMARY=>BACKUP happens.)

This is not my first HA setup: I have been running OPNsense with HA for 4 years and it has been working very well (both CARP and pfsync, with seamless transition to the backup without losing any connections).

So maybe I did something wrong in this new setup, and I may need another pair of eyes to look at it and figure out what is wrong.

For 24.7, I did a fresh install.

Both the primary and the backup are VMs, just like in my previous setup with 24.1 (which worked fine for years, starting on 20.7 and upgraded all the way to 24.1).

(Important note: my previous 24.1 setup is no longer running; I shut down and have since deleted those VMs, so only the new setup exists.)

One difference in my new system is that the primary VM uses PCI passthrough for the 10Gb port (LAN - ix0) and the 1Gb port (WAN - igb0).

On the backup VM, Virtio adapters are used for both (vtnet0 & vtnet1).

So on both sides, I created failover LAGG interfaces (with a single port in each) and configured lagg0 for LAN and lagg1 for WAN, so that the interface names match on both sides, which is important for state syncing as indicated in the documentation.

Then, on top of the LAN LAGG interface (lagg0), I created a bunch of VLANs, as this port is a trunk port with several tagged VLANs.
That part of the setup (VLANs) is identical to my previous one (with 24.1), where all my networks are connected to the firewall via a single port and tagged VLANs.

So I end up with multiple assigned lagg0_vlanXX VLAN interfaces, and I made sure that the optXX assignments match on both sides (primary and backup). For example, on both sides lagg0_vlan10 is opt1, lagg0_vlan20 is opt2, etc.

I have a dedicated VLAN for PFSYNC (VLAN99 - assigned to opt7 on both sides), which is also used by Kea DHCP for peer traffic.
On the primary that interface is configured with IP 10.90.0.251/24
On the backup that interface is configured with IP 10.90.0.252/24

The firewall rules for the PFSYNC interface are:

     Protocol     Source                     Port  Destination    Port         Gateway  Schedule   Description  
pass IPv4 PFSYNC  VLAN99_PFSYNC net          *     This Firewall  *            *        *          Allow pfSync traffic  
pass IPv4 TCP     VLAN99_PFSYNC net          *     This Firewall  443 (HTTPS)  *        *          Allow HTTPS traffic for config synchronization  
pass IPv4 TCP     VLAN99_PFSYNC net          *     This Firewall  8001         *        *          Allow Kea DHCP HA Peer traffic

System: High Availability: Settings - On the primary node:

Synchronize States:	checked
Synchronize Interface:	VLAN99_PFSYNC
Sync Compatibility:	OPNsense 24.7 or above
Synchronize Peer IP:	10.99.0.252
Synchronize Config:	10.99.0.252
Remote System Username:	<the username of my backup node>
Remote System Password:	<the password of my backup node>
Services to synchronize (XMLRPC Sync):	Aliases, Certificates, Dashboard, Firewall Categories, Firewall Groups, Firewall Log Templates, Firewall Rules, Firewall Schedules, IPsec, Kea DHCP, NAT, Network Time, Unbound DNS, Virtual IPs

System: High Availability: Settings - On the secondary node:

Synchronize States:	checked
Synchronize Interface:	VLAN99_PFSYNC
Sync Compatibility:	OPNsense 24.7 or above
Synchronize Peer IP:	10.99.0.251

(fields that are not indicated are either empty or default value)
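One quick consistency check on the addressing quoted above: in a typical setup, each node's Synchronize Peer IP sits on the subnet of its Synchronize Interface. This is a minimal sketch using Python's standard `ipaddress` module, with the values copied from the settings listed in this post (they may differ from the live configuration if anything was mistyped here):

```python
import ipaddress

# Sync-interface addresses and "Synchronize Peer IP" values as listed above.
SYNC_INTERFACE = {
    "primary": ipaddress.ip_interface("10.90.0.251/24"),
    "backup": ipaddress.ip_interface("10.90.0.252/24"),
}
PEER_IP = {
    "primary": ipaddress.ip_address("10.99.0.252"),
    "backup": ipaddress.ip_address("10.99.0.251"),
}


def peer_on_sync_subnet(node: str) -> bool:
    """True if the node's configured sync peer lies inside its sync interface's subnet."""
    return PEER_IP[node] in SYNC_INTERFACE[node].network


for node in ("primary", "backup"):
    print(f"{node}: peer {PEER_IP[node]} in {SYNC_INTERFACE[node].network}? "
          f"{peer_on_sync_subnet(node)}")
```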

System: High Availability: Status - On the primary node:
<showing the backup firewall version and services, all green, and synchronization of configuration works fine>

System: High Availability: Status - On the backup node:
The backup firewall is not accessible or not configured.

When I look at the Firewall: Diagnostics: States on both nodes, I can see a "similar" number of states: ~1700 on primary and ~1500 on backup.

But if I switch from the primary to the backup (by enabling Persistent CARP Maintenance Mode on the primary), any established connections (like SSH) hang and die. Also, when I then compare the states in Firewall: Diagnostics: States on both nodes, the primary node shows ~500 states and the backup shows ~2200 states.

So something must be wrong somewhere, but I cannot figure out what. Is there a log or some other place where I can see more details about pfsync activity, to make sure it is working as expected?
Let me know if you need more information.

Thank you

Virtual private networks / IPSec roadwarrior multiple connection with different certificate each

August 06, 2024, 03:21:22 AM

Hello,

I was able to successfully configure IPSec roadwarrior using EAP-MSCHAPv2 + Certificate (using the new connections (swanctl.conf)).
I just followed the instructions from the wiki for EAP-MSCHAPv2, then added another round (round 0) of remote authentication using Public Key before the EAP-MSCHAPv2 one (round 1), and that was it.

But then I wanted to let multiple users connect, so I created certificates for all my users and added them in the Public Key authentication round (as it allows selecting more than one certificate - see screenshot attached).

However, I noticed that only one of the clients could connect; the others could not.
The other clients get a "no matching peer config found" error:

2024-08-05T21:16:17-04:00	Informational	charon	10[CFG] <19> no matching peer config found

It turns out that the client that can connect corresponds to the certificate selected first in the list.

I tried selecting them in a different order, and then another client could connect, but none of the others.

So I am not sure how this Certificates field really works, but it seems that only the first certificate in the list is used.

I read the swanctl.conf documentation, and the description is:

"certs: Comma separated list of certificates to accept for authentication. The certificates may use a relative path from the swanctl/x509 directory or an absolute path."

I looked at my generated swanctl.conf, and that section looks as follows:

        remote-8ccbba89-c628-4ea0-a7ee-15fa7e0d71c2 {
            round = 0
            auth = pubkey
            certs = 66ad6e885fe21.crt,66b16e44c13bc.crt,66aff2593ebc7.crt,66ae72bb9bd73.crt
        }

So all 4 certificates are in the list, but only the first one seems to work.
And indeed, if I select them in a different order, the first one changes and another client can connect, but not the others.
So somehow the list does not seem to work, and only the first certificate seems to be checked.
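To confirm from the generated file which certificates ended up in the remote round, the `certs` value can be split as the documentation describes it (a comma separated list). This is a minimal Python sketch; `listed_certs` is a hypothetical helper that only reads the file and says nothing about how charon matches peers against the list:

```python
def listed_certs(section: str) -> list[str]:
    """Return the entries of a 'certs = a,b,c' line from a swanctl.conf section."""
    for line in section.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == "certs":
            return [name.strip() for name in value.split(",") if name.strip()]
    return []


remote_round0 = """
remote-8ccbba89-c628-4ea0-a7ee-15fa7e0d71c2 {
    round = 0
    auth = pubkey
    certs = 66ad6e885fe21.crt,66b16e44c13bc.crt,66aff2593ebc7.crt,66ae72bb9bd73.crt
}
"""
for position, name in enumerate(listed_certs(remote_round0), start=1):
    print(position, name)
```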

Is this a swanctl bug, or am I misconfiguring something?

Intrusion Detection and Prevention / Suricata Policies not working as expected?

February 23, 2023, 04:39:07 AM

Hello,

Running OPNSense 23.1.1_2 with Suricata enabled as IPS.

I wanted to update which rules are enabled and whether they drop or alert, so I decided to clean up all my policies, rule adjustments and enabled rulesets and start again from scratch.

I then enabled the following rulesets:

abuse.ch/Feodo Tracker
abuse.ch/SSL Fingerprint Blacklist
abuse.ch/SSL IP Blacklist
abuse.ch/ThreatFox
abuse.ch/URLhaus
ET open/drop
ET open/dshield
ET open/emerging-malware
ET open/emerging-mobile_malware

I then created a first policy called "Disable all" which, as its name indicates, disables all rules ("Nothing Selected" everywhere and New Action = Disable).
I enabled it, applied, and then checked that all rules were indeed disabled.

Then I disabled that "Disable all" policy and created a new one called "Specific Ruleset all rules drop".
In the "Specific Ruleset all rules drop" policy, I selected the following rulesets:

abuse.ch/Feodo Tracker
abuse.ch/SSL Fingerprint Blacklist
abuse.ch/SSL IP Blacklist
abuse.ch/ThreatFox
abuse.ch/URLhaus
ET open/drop
ET open/dshield

I left all the other selection fields at "Nothing selected" and set New Action to "Drop". My goal was to enable all the rules in those selected rulesets and set their action to drop.
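The expectation described above can be written down as a small model: a policy matches a rule when the rule's ruleset is among the selected ones, and an empty selection ("Nothing selected") matches anything. This is a minimal Python sketch of the behavior expected here, not OPNsense's implementation; the `Policy`/`Rule` types, the SIDs and the first-match, lowest-priority-first ordering are all assumptions:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Policy:
    name: str
    action: str                                 # e.g. "drop", "alert", "disable"
    rulesets: set = field(default_factory=set)  # empty set == "Nothing selected" == any
    priority: int = 0
    enabled: bool = True


@dataclass
class Rule:
    sid: int      # hypothetical SID
    ruleset: str


def expected_action(rule: Rule, policies: list) -> Optional[str]:
    """Action of the first enabled, matching policy; None if no policy touches the rule."""
    for policy in sorted((p for p in policies if p.enabled), key=lambda p: p.priority):
        if not policy.rulesets or rule.ruleset in policy.rulesets:
            return policy.action
    return None


selected = {"abuse.ch/Feodo Tracker", "abuse.ch/SSL Fingerprint Blacklist",
            "abuse.ch/SSL IP Blacklist", "abuse.ch/ThreatFox", "abuse.ch/URLhaus",
            "ET open/drop", "ET open/dshield"}
policies = [
    Policy("Disable all", action="disable", enabled=False),
    Policy("Specific Rulesets all rules drop", action="drop", rulesets=selected),
]
print(expected_action(Rule(1, "ET open/drop"), policies))             # drop
print(expected_action(Rule(2, "ET open/emerging-malware"), policies))  # None
```

Comparing this expected outcome with what the rule list actually shows after "Apply" isolates where the behavior diverges.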

I made sure that policy "Specific Rulesets all rules drop" was the only one enabled and clicked "Apply"
But then, when I check the rule list, the first thing I observe is that a lot of rules are enabled, but set to alert (instead of drop).
Also I can see some (but not all) of the rules from the rulesets I did not select (ET open/emerging-malware and ET open/emerging-mobile_malware) are enabled and set to alert as well, when they should have remained disabled.

I initially created both policies with priority 0 (and, as described above, I made sure only one was enabled when I clicked "Apply"). I then tried again after assigning different priorities to them (still making sure only one was enabled when I hit "Apply"), but that did not make a difference.

I do not remember running into this problem back in OPNsense 22.x.

Am I doing something wrong here, or could something have changed in OPNsense 23.x?

22.1 Legacy Series / Changing Monit LIMITS

May 05, 2022, 05:53:22 PM

Hello,

I am using Monit to check on various things, including the VPN connections to my system.
The script I use to check the VPN connections outputs the list of connections, and all of that gets sent to me via email when changes occur (so I know if something unexpected happens on the VPN side).
However, I noticed that the output I receive in the email is truncated.
Also, when checking the monit status (both in the OPNsense web interface and on the command line), the output is truncated as well.

I found the reason: monit has a default limit of 512 bytes for program output.
This limit can be changed (https://mmonit.com/monit/documentation/monit.html#LIMITS). However if I go and change it in the monitrc config file, it will get overwritten by OPNsense next time.
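A quick way to check whether a check script's output will hit that default is to measure it in bytes. This is a minimal Python sketch, assuming monit simply cuts the output at the byte limit; the 512-byte figure is the default mentioned above, and the helper names and the sample listing are hypothetical. In practice the bytes would come from running the script, e.g. via `subprocess.run([...], capture_output=True).stdout`:

```python
MONIT_PROGRAM_OUTPUT_LIMIT = 512  # bytes, monit's default limit for program output


def exceeds_limit(output: bytes, limit: int = MONIT_PROGRAM_OUTPUT_LIMIT) -> bool:
    """True if monit would have to truncate this program output."""
    return len(output) > limit


def visible_part(output: bytes, limit: int = MONIT_PROGRAM_OUTPUT_LIMIT) -> bytes:
    """Approximation of what remains visible after truncation at the byte limit."""
    return output[:limit]


# Hypothetical VPN status listing: 40 lines like "vpn-00: ESTABLISHED".
listing = b"\n".join(f"vpn-{i:02d}: ESTABLISHED".encode() for i in range(40))
print(len(listing), exceeds_limit(listing))  # 799 True
print(visible_part(listing).decode().splitlines()[-1])
```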

What is the proper way to set those monit LIMITS with OPNsense? Is there a way to add a custom monit config file that gets appended or prepended to the generated monitrc?

Thank you

22.1 Legacy Series / WAN VIP failover to secondary during DHCP renewal

April 07, 2022, 05:09:07 PM

Hello,

Since OPNsense 22.1.3 or 22.1.4 (I updated directly from 22.1.2 to 22.1.4), I have a strange behavior regarding CARP failover and my WAN interface DHCP renewal.

My OPNsense HA setup consists of 2 OPNsense instances on 2 different systems, both connected to the same router for their WAN interface.
The router assigns each OPNsense an IP via DHCP and renews it every 24 hours (the DHCP config is "static" in the sense that the MAC address of each OPNsense is assigned a fixed IP in the router's DHCP server, but from OPNsense's point of view, it's DHCP-served).

This has been working like that for years with no issues.

But lately (after the update from 22.1.2 to 22.1.4), every time the WAN DHCP address renews on my primary node, CARP fails over the WAN VIP to the secondary node (only the WAN VIP; the VIPs for my other VLANs all stay on the primary), and it stays like that: it never fails back to the primary.

While in that state (WAN VIP on the secondary and all other VIPs on the primary), several things do not work properly, including some of my VPN connections, and overall I notice weird/slow DNS resolution and other slowness.

The only way I found to move the WAN VIP back to the primary is to disable and re-enable CARP on both the primary and the secondary (sometimes I need to set CARP persistent mode on the secondary to force the VIP back to the primary).

Before 22.1.3, I never observed this behavior (or if it happened, it was probably very brief and failed back right away, so I never noticed it).

I noticed the following changes in the release notes of 22.1.3:
- interfaces: do not update VIPs on dynamic address changes
- interfaces: remove unused reference and return value from interface_carp_configure()
- dhcp: stream-read log and leases files for "dhcpd update prefixes" action
- ports: dpinger 3.2 [3]

Could any of those changes be related to the behavior I am seeing?

Thank you.

21.7 Legacy Series / IPSEC VPN Mutual RSA with P12 certificates

August 03, 2021, 02:43:27 AM

Hello,

With the recent change in the way 21.7 handles RSA certificates, using the new identity parsing with ":" (https://wiki.strongswan.org/projects/strongswan/wiki/IdentityParsing), I ran into some issues.

I have another strongSwan instance running on a Linux server (not OPNsense), and on that remote instance strongSwan is configured to use a certificate in P12 format (which is supported, as indicated here: https://wiki.strongswan.org/projects/strongswan/wiki/P12Secret).

However, strongSwan is a bit picky about how leftid / rightid need to be filled in for it to properly find the private key in the P12 certificate.
I found that the most reliable way for it to find the right private key in the P12 certificate is to use the asn1dn for rightid/leftid.

However, to use it properly, double quotes need to be put in place, and if they are not placed exactly where strongSwan expects them, it won't find the private key to use in the P12 certificate.

For it to find the key, the proper syntax is to have the whole "asn1dn:#307e310b30..." between double quotes.
So this does not work: asn1dn:"#307e310b30..."

And unfortunately, in version 21.7, asn1dn: is written automatically when it is selected in the dropdown, with no possibility to add the double quotes before it.
In previous versions (21.1 and before) it did not add asn1dn:, so it was easy to just put the whole "asn1dn:#307e310b30..." in the input field, and that would work.
But now, putting the whole "asn1dn:#307e310b30..." in the field results in asn1dn"asn1dn:#307e310b30..." in the configuration file, which of course does not work.
So all this results in IPsec on OPNsense never finding a proper match (because of the way it generates the input in the config).

So my request would be to add a "raw" or "custom" option to the dropdown, which just lets the user input exactly what they want without generating anything around it. That would solve a lot of these issues.
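The requested behavior can be stated compactly as code. This is a minimal Python sketch of how an identifier field could be rendered with such a "raw" option; `render_id` and the "raw" type are hypothetical (they describe the request, not existing OPNsense code), and the hex DN is a made-up placeholder:

```python
def render_id(id_type: str, value: str) -> str:
    """Render a leftid/rightid value; 'raw' emits the user's text verbatim."""
    if id_type == "raw":
        return value
    return f"{id_type}:{value}"


HEX_DN = "#300d310b3009060355040613024341"  # placeholder DER-encoded DN, not a real certificate

# What strongSwan needs for the P12 lookup: the whole identity in double quotes.
wanted = f'"asn1dn:{HEX_DN}"'
print("rightid = " + render_id("raw", wanted))
# Prefix-style rendering (as described above for 21.7) leaves no room for the quotes.
print("rightid = " + render_id("asn1dn", HEX_DN))
```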

So far, the only way I got it to work on 21.7 is to manually edit the ipsec.conf file into the form strongSwan expects, but of course this is not viable, as it will get overwritten.

So again: just add an option to the dropdown that lets the end user enter exactly what they want, and have it written to the config file as-is, without any modification or massaging.

Thank you !

Zenarmor (Sensei) / Sensei - questions on reporting and status

January 11, 2021, 09:38:24 PM

Hello,

I am testing Sensei (1.6.2) on my OPNsense (20.7.7_1) setup and I have a few questions regarding reporting and status.

In both the dashboard and the report sections, I do not understand the top local host and top remote host widgets.

I would have thought that the top local hosts widget should only contain hosts/devices from my local networks and that the top remote hosts widget should only contain hosts from the internet.
However, both of them contain IP addresses (or hostnames) of my local devices and of internet hosts.
For example, the top 10 local devices currently show 3 IPs from the internet and 7 from my local networks,
and the top 10 remote devices currently show 4 IPs from the internet and 6 from my local networks.

Is that expected? If so, how should it be interpreted?

Another question, on the status page: for all my interfaces, the "Bytes OUT" and "Packets OUT" columns are at 0 and never seem to change, while Bytes IN and Packets IN show values that increase over time.
Why is there no Bytes OUT or Packets OUT information?

Finally, for the scheduled report, the email I receive always shows, as part of the quick facts: Connections: 10,000.
Why is it always 10,000? What does this represent?
I also noticed in the quick facts: Unique Local hosts: 91.
On my networks I currently have fewer than 30 devices (including VMs and containers), so where does the 91 come from?

Thank you !

BTW, I forgot to mention that I am using an external Elasticsearch database (Elasticsearch 7.10.1), and my OPNsense instance has 4GB of memory (it was using 20-25% of that before installing Sensei; since running Sensei, memory utilization is at 30-33%).

20.7 Legacy Series / Monit false alert due to incorrect evaluation of traffic

January 08, 2021, 01:37:01 AM

Hello,

I put in place a few monit alerts to try to detect excessive uploads from some of my networks (to find out whether there could be data leakage or suspicious transfers of data to the internet).

As we know, upload from a given network (in my case VLAN20) is actually download on the corresponding network interface of the firewall.
So I created a new "Service Test Setting" called "Download2GBin1H" in the monit GUI, to trigger if more than 2GB of data is downloaded in the last hour, as follows:

I then defined a Service Setting called "suspicious_upload_vlan20" as follows:

I also checked the actual monit configuration in /usr/local/etc/monitrc, and here is the corresponding entry:

check network suspicious_upload_vlan20  interface vtnet2
   if total download > 2 GB in the last 1 hour then alert

From that, I expect to receive an email alert when the hosts on VLAN20 upload 2GB or more of data within 1 hour.

And it does seem to work: I ran some tests, uploading data on purpose, and I did receive an alert email once I had uploaded more than 2GB within an hour.

However, I also sometimes receive alert emails when, checking afterwards, I do not see nearly enough upload within the last hour to amount to more than 2GB.

As I started suspecting something was wrong, I ran a quick script to keep an eye on the network interface that serves VLAN20 on my OPNsense firewall/gateway (vtnet2 is the network interface serving VLAN20; the 8th field in the netstat -I vtnet2 -b output is Ibytes, i.e. the number of bytes received from the network by the interface):

# while true
do
date;netstat -I vtnet2 -b | awk '/Link/{print "Uploaded by VLAN20: "$8/1024/1024 " MB"}'
sleep 600
done
Thu Jan  7 11:04:39 EST 2021
Uploaded by VLAN20: 24254.7 MB
Thu Jan  7 11:14:39 EST 2021
Uploaded by VLAN20: 24255.1 MB
Thu Jan  7 11:24:39 EST 2021
Uploaded by VLAN20: 24255.5 MB
Thu Jan  7 11:34:39 EST 2021
Uploaded by VLAN20: 24256 MB
Thu Jan  7 11:44:39 EST 2021
Uploaded by VLAN20: 24256.5 MB
Thu Jan  7 11:54:39 EST 2021
Uploaded by VLAN20: 24257 MB
Thu Jan  7 12:04:39 EST 2021
Uploaded by VLAN20: 24257.4 MB
Thu Jan  7 12:14:39 EST 2021
Uploaded by VLAN20: 24259.3 MB
Thu Jan  7 12:24:39 EST 2021
Uploaded by VLAN20: 24277.6 MB
Thu Jan  7 12:34:39 EST 2021
Uploaded by VLAN20: 24296.4 MB
Thu Jan  7 12:44:39 EST 2021
Uploaded by VLAN20: 24313.5 MB

As you can see, from 11:04 AM to 12:44 PM, less than 60MB was uploaded.
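The same samples can be checked the way the monit rule is phrased, i.e. the largest increase of the counter between two samples at most one hour apart. This is a minimal Python sketch with the readings copied from the output above; values are in MB as computed by the awk one-liner, so "2 GB" is taken as 2048 MB (an assumption about units), and 10-minute sampling only approximates a sliding window:

```python
from datetime import datetime, timedelta

# (time of day, cumulative MB received on vtnet2) from the netstat loop above.
SAMPLES = [
    ("11:04:39", 24254.7), ("11:14:39", 24255.1), ("11:24:39", 24255.5),
    ("11:34:39", 24256.0), ("11:44:39", 24256.5), ("11:54:39", 24257.0),
    ("12:04:39", 24257.4), ("12:14:39", 24259.3), ("12:24:39", 24277.6),
    ("12:34:39", 24296.4), ("12:44:39", 24313.5),
]
LIMIT_MB = 2 * 1024  # "2 GB" in the same binary units as the awk output


def max_upload_in_window(samples, window=timedelta(hours=1)):
    """Largest counter increase between any two samples at most `window` apart."""
    parsed = [(datetime.strptime(t, "%H:%M:%S"), mb) for t, mb in samples]
    peak = 0.0
    for i, (t_start, mb_start) in enumerate(parsed):
        for t_end, mb_end in parsed[i + 1:]:
            if t_end - t_start <= window:
                peak = max(peak, mb_end - mb_start)
    return peak


peak_mb = max_upload_in_window(SAMPLES)
print(f"peak upload in any 1-hour window: {peak_mb:.1f} MB (limit {LIMIT_MB} MB)")
```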

But I still received this email at 12:11pm:

Download bytes exceeded Service suspicious_upload_vlan20

Date: Thu, 07 Jan 2021 12:11:58
Action: alert
Host: OPNsense-primary.localdomain
Description: total download 4.6 GB matches limit [download rate > 2 GB in last 1 hour]

Your faithful employee,
Monit

This scenario has occurred several times since I put those rules in place, but this is the first time I looked at it more closely and grabbed actual data from the interface itself to verify that, indeed, what monit is reporting is not true.

Could this be a bug in monit? Has anybody encountered the same issue? Does anybody know where monit gets its statistics?

Thank you

20.7 Legacy Series / Aliases Statistics

January 03, 2021, 02:15:42 AM

Hello,

When I added some aliases today, I noticed that you can check a "Statistics" checkbox upon alias creation.

I did that for a few, and now I am wondering where I can find the statistics for those aliases.

The Aliases page does not have an "Inspect" button like the Rules page does.

The only place I noticed that could have statistics related to aliases is the Diagnostics > pfTables page, where I can select my aliases. It displays a table with packets/bytes in/out pass/block; however, it's all empty: there are no numbers in any of those columns, even for the aliases where I checked the Statistics box.

Could somebody shed some light on how this "Statistics" checkbox for aliases works and where I should look?

Thank you !

20.7 Legacy Series / OPNsense 20.7 on KVM with Virtio adapters - CARP and Suricata IPS (netmap)

December 15, 2020, 02:35:16 AM

Hello here !

I am new to OPNsense, which I discovered this month.

Thank you very much for this nice firewall; I love it so much that I have already set it up in HA!

However, I noticed some weird behavior with CARP, and after reading in different places on this forum, the pfSense forum and other BSD forums, it seems something is going on with CARP + Suricata IPS (and maybe with Virtio?).

I wanted to start a conversation here regarding issues with HA setups using CARP + IPS (and Virtio, if Virtio has anything to do with this).
The versions on which I ran and observed this issue are:
OPNsense 20.7.5 and 20.7.6 (production)

A bit of information about my setup, which led me to discover these issues with CARP + IPS (netmap) + Virtio:

I have 2 KVM hosts (host OS is Debian buster), and I am running an OPNsense VM on each, using only Virtio adapters configured to use Linux bridges defined on the host.
I have 6 different networks, connected to my hosts via 2 physical interfaces: 1 for WAN, and 1 for all my LAN networks, which are trunked on a single port (VLAN tagging). I then have multiple VLAN interfaces on each host for the different tagged networks; on top of those VLAN interfaces I defined Linux bridges, and each VM has 6 Virtio network adapters: 1 per Linux bridge.
One of the bridges is my WAN (directly on top of the physical interface), 4 are different internal networks (on top of VLAN interfaces), and the 6th one is dedicated to PFSYNC (on top of a VLAN interface too).

Everything I have set up so far works great: HA works, config synchronization works, IDS works!

However, when I switch on IPS, I start seeing some weird behavior with CARP.

Usually the "backup" firewall starts complaining that "CARP has detected a problem" and starts demoting.
Sometimes (I think when the issue occurs for the first time after a reboot, for example), the VIP status page takes forever to load, and when it finally does, I can see that about half of my networks on the backup firewall are "MASTER" and the other half are "BACKUP", while on the master they all show "MASTER".
Eventually, both sides will show "CARP has detected a problem" and the only way to fix it is to reboot the VMs. If I leave IPS on, the issue will occur shortly after reboot.
If I disable IPS (but keep IDS), the issue never occurs.

I started looking online for other people having a similar issue.
At first I found some people mentioning issues with Virtio, but later I found some discussions indicating that recent work had been done and that OPNsense currently has pretty good support for Virtio, including for CARP.

Then I read other posts where people mentioned seeing issues with OPNsense HA + IPS, pointing specifically at CARP + netmap.

What I am not sure about is Virtio: is Virtio still part of the issue, or is it purely a CARP + netmap issue?
I saw other people using KVM still having the same problem with e1000 adapters.

For people who are aware of this problem and understand it, please comment on my observations and let me know what causes the issue and whether any fixes currently exist.

And if I can help the devs work on a fix in any way, by providing logs, dumps or anything else, please let me know.

Thank you !
