Topics - nzkiwi68

#1
Renamed a "Network(s)" alias, didn't add or subtract any content, just renamed the alias.

Firewall rules in multiple interfaces using the alias failed to update to the new alias name thereby breaking the firewall rules.
Renamed a second alias and observed the same behavior.
#2
I am running OPNsense on an ODroid-H4 and it works great. The H4 BIOS supports the Intel TCO (Total Cost of Ownership) Watchdog Timer.

What is the Intel TCO Watchdog Timer
This is a hardware watchdog in the chipset, enabled through the BIOS, whose purpose is to reboot the computer if the system hangs. If the hardware watchdog is not reset at a regular interval, it triggers a hardware reset. For this to work, the OS must run a software watchdog daemon that regularly resets the hardware timer through the driver. If the OS hangs, the timer is no longer reset, it expires, and the hardware reboots the computer.
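
As a rough illustration of the mechanism described above (not the ichwd driver itself), the pat-or-reset logic can be simulated in a few lines of Python. The class name SimulatedWatchdog and the 14-second timeout are assumptions made for the example, and the clock is passed in explicitly so the behaviour can be followed without waiting.

```python
class SimulatedWatchdog:
    """Toy model of a hardware watchdog: it fires unless patted in time."""

    def __init__(self, timeout_s, now=0.0):
        self.timeout_s = timeout_s
        self.last_pat = now

    def pat(self, now):
        """The daemon calls this regularly to restart the countdown."""
        self.last_pat = now

    def would_reset(self, now):
        """True once more than timeout_s has passed since the last pat."""
        return (now - self.last_pat) > self.timeout_s


wd = SimulatedWatchdog(timeout_s=14)
wd.pat(now=10)             # healthy OS: the daemon pats the timer
print(wd.would_reset(20))  # 10 s since the last pat, under the timeout
print(wd.would_reset(30))  # 20 s since the last pat: the OS has hung
```

The first check prints False and the second prints True, which is the point at which real hardware would pull the reset line.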

I see there is support in FreeBSD for the Intel TCO Watchdog Timer, the "ichwd" driver.

The Intel TCO Watchdog Timer is not specific to the ODROID-H4; many Intel-based systems support it.


Feature Request
Add a package that enables support for the Intel TCO Watchdog Timer.

I imagine it would have a few simple settings that could be set:
  • watchdog-timeout = 14
  • realtime = yes

Plus the ability to:
  • View watchdog logs


Reference
ODroid-H4 Intel TCO Watchdog on Linux/Ubuntu
FreeBSD - ichwd -- device driver for the Intel ICH watchdog interrupt timer


#3
Firstly, let me credit Brett Merrick for huge assistance. This is not all my own work.

Here are the steps to get GeoIP working inside HAPROXY: not at the firewall rule layer, but inside HAPROXY itself, while still using the OPNsense GeoIP alias function.

You can write conditions such as:
Condition: Paths starts with /login/
Condition: GeoIP matches Australia

Then write a rule that does things like:
Rule: Only permit login from Australia (Permit http_request if matches "Paths starts with /login/" and "GeoIP matches Australia")

Very cool! And there are many, many more possibilities for protection: reject excessive error rates or connection rates from certain countries, use tarpitting on some countries and not others, and so on.

Ok - how:


*****************************************************
ONE: These two files need to get added to the system
*****************************************************

File1
filename: actions_custom.conf
location: /usr/local/opnsense/service/conf/actions.d/

[update]
command:/usr/local/opnsense/scripts/custom/haproxy-alias.sh
parameters:%s
type:script
message:Updating HAProxy Alias %s
description:Update HAProxy Alias


File2
filename: haproxy-alias.sh
location: /usr/local/opnsense/scripts/custom/

#!/bin/csh
if ( $#argv == 0 ) exit 1
configctl filter list table "$1" > "/var/haproxy/$1.lst"
chown 80:80 "/var/haproxy/$1.lst"
exit 0



************************************************************
TWO: Build the GeoIP alias "acl_geoip_au" (for, say, Australian IP addresses)
************************************************************
If you haven't set up the GeoIP downloads to get the GeoIP database list, then follow the OPNsense documentation first:
https://docs.opnsense.org/manual/how-tos/maxmind_geo_ip.html
Normal firewall GeoIP settings
Firewall > Aliases > GeoIP settings

*** make sure to only use ipv4 if you don't have ipv6 ***


*******************************************
THREE: Create the cron job and run it ONCE
*******************************************
Then schedule it to run overnight, sometime after midnight, ideally after the GeoIP DB update and before the HAPROXY reload. You will need an HAPROXY reload to pick up the new GeoIP tables.


************************
FOUR: Now set up a condition
************************
e.g.
Name: GEOIP_AU
Condition type: Source IP matches specified IP (from the drop down list)
Parameters: -f /var/haproxy/acl_geoip_au.lst

(see the acl_geoip_au.lst - that needs to match the firewall GeoIP alias name. In my example the alias name is acl_geoip_au)
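
To make the condition concrete: the -f parameter tells HAPROXY to load its match patterns from that file, and each request is then tested against them by source address. The Python sketch below imitates that check, combined with the "/login/" path condition from the introduction. It is only an illustration: the function names and sample networks are invented, and it assumes the list file holds one IP address or CIDR network per line.

```python
import ipaddress


def load_networks(lines):
    """Parse one address or CIDR network per line, skipping blanks."""
    return [ipaddress.ip_network(line.strip(), strict=False)
            for line in lines if line.strip()]


def source_matches(src_ip, networks):
    """True if src_ip falls inside any of the loaded networks."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in networks)


def permit_request(path, src_ip, networks):
    """Allow /login/ only from listed networks; other paths are untouched."""
    if path.startswith("/login/"):
        return source_matches(src_ip, networks)
    return True


# Sample entries use documentation ranges, not real GeoIP data.
sample_list = ["203.0.113.0/24", "198.51.100.7", ""]
nets = load_networks(sample_list)
print(permit_request("/login/index", "203.0.113.50", nets))
print(permit_request("/login/index", "192.0.2.1", nets))
print(permit_request("/static/logo.png", "192.0.2.1", nets))
```

Only the second request is refused: it targets /login/ from an address that is not in the list.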


*****************************************************************************
FIVE: Go make a rule that uses your condition and attach it to a backend or frontend as appropriate
*****************************************************************************

Want more GeoIP ranges?
Start at step two and rinse and repeat for more aliases with different country combinations.
Each alias needs its own CRON job, and don't forget to run the CRON job once so the alias is ready for HAPROXY to use.


#4
Am I missing something?

VPN > WireGuard > Settings > Endpoint
You can specify a "Shared Secret"

On the remote site, where this Endpoint connects to:
VPN > WireGuard > Settings > Local
I cannot see any way to add the "Shared Secret"

Or am I missing something?
#5
I have had a really good go at trying to figure out the logic of when and how, but I wasn't able to determine what's going on. What I can say is that on 90% of the firewalls, the "WireGuard (Group)" firewall rule group is missing despite rebooting, stopping and starting WireGuard, etc.

This is happening across many different firewalls: different hardware, Hyper-V based VMs, clustered and not clustered. I realised the problem when I had one-way traffic: the remote end had allow firewall rules only on the "WireGuard (Group)", which had disappeared, so all inbound wg tunnel traffic was blocked.

What happens
If you're quick enough in the GUI, you see the "WireGuard (Group)" firewall rule group appear and then after a while, disappear.

Work around
1. Add every wg interface inside:
VPN > WireGuard > Settings > Local
(e.g. wg1, wg2, wg3...)

2. Assign these as an interface in:
Interfaces > Assignments

3. Then create allow firewall rules on these individual firewall interfaces

4. If you are running clustered firewalls
You need to start WireGuard on the backup firewall to be able to also add the interfaces to the backup firewall.





#6
Have a master "Enable OpenVPN" setting that writes the 20-openvpn script into the /usr/local/etc/rc.syshook.d/carp/ directory; otherwise the script does not exist.

If you unselect "OpenVPN" then the CARP script /usr/local/etc/rc.syshook.d/carp/20-openvpn is removed.

Why?
Because the script gets called and runs again and again for the CARP events INIT, BACKUP and MASTER, even when you are not running OpenVPN.
#7
With Wireguard now baked into the core with 23.7.3, my Wireguard custom CARP script broke. I enlisted the help of a friend and together we built a new Wireguard CARP fail-over script.

It works absolutely brilliantly and it's ready for production.
Note that the CARP script still gets called multiple times as multiple VHIDs transition, but this no longer forces Wireguard to start and stop (and often break), as was the case before. Although the new script is called multiple times during a CARP transition from backup to master, it starts Wireguard once and Wireguard remains stable; each subsequent call doesn't restart it, which is very good.

Add Wireguard CARP awareness to the GUI and follow a single interface
It would be very nice though, if the GUI had an option "Enable CARP Fail-over" (like FRR)
but!
Also have a drop down where you select a single interface to follow

Why?

  • Because, despite it being "impossible", over many years of pfSense and now OPNsense experience I have seen many instances where CARP is misaligned between the backup and the primary firewall.
  • Also, I would almost always follow the LAN interface, where the Wireguard VPN tunnels exit to and begin from and that's really the only CARP interface that we would want to start or stop Wireguard to follow.
  • The new follow a single interface CARP script will exit quickly if the CARP event is not for the required interface, less work for the firewall to process.

I know I have seen many posts here and on GitHub requesting CARP for Wireguard (including me) and questions raised as to why. I thought I would summarise why Wireguard needs to follow CARP.

Why Wireguard with HA needs to stop and start and follow CARP

  • Wireguard doesn't bind to VHID's, it binds to all interfaces (like the WAN firewall interface) and therefore on a CARP fail-over, the backup firewall Wireguard keeps running and interfering with the primary firewall.
  • The above statement is especially true with Wireguard keepalives
  • Because of the above two points, under HA sync you need to leave Wireguard "unselected" to keep Wireguard "off" on the backup firewall, but then changes to the Wireguard config are no longer synced.
  • Following CARP also lets FRR stay on, even on the backup firewall. There is no need for dynamic routing to be off on the backup firewall (because with Wireguard off, no traffic can pass), so on fail-over everything comes up super fast. It's very good.

I think Wireguard is now "prime time ready".


Here is my script.
I fully acknowledge that this is not my exclusive work, but follows the built-in CARP scripts.

Place this script here:
/usr/local/etc/rc.syshook.d/carp
Ensure the script has execute rights
Call the script "10-wireguard"

----- Script Start -----
#!/usr/local/bin/php
<?php

// CARP hook: enable WireGuard when this node becomes MASTER, disable it on BACKUP.

use OPNsense\Core\Backend;

require_once("config.inc");
require_once("util.inc");

// Arguments passed to the hook: event source and event type.
$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';

// Ignore anything other than a MASTER or BACKUP transition (e.g. INIT).
if (!in_array($type, ['MASTER', 'BACKUP'])) {
    log_msg("Carp '$type' event unknown from source '{$subsystem}'");
    exit;
}

// A valid source contains an '@'.
if (!strstr($subsystem, '@')) {
    log_msg("Carp '$type' event triggered from wrong source '{$subsystem}'");
    exit;
}

switch ($type) {
    case 'MASTER':
        $config['OPNsense']['wireguard']['general']['enabled'] = '1';
        write_config("Enable WireGuard on this peer due to CARP event", false);
        log_msg("Starting WireGuard due to CARP event '$type'");
        break;
    case 'BACKUP':
        $config['OPNsense']['wireguard']['general']['enabled'] = '0';
        write_config("Disable WireGuard on this peer due to CARP event", false);
        log_msg("Stopping WireGuard due to CARP event '$type'");
        break;
}

// Regenerate the WireGuard templates and apply the new state.
$backend = new Backend();
$backend->configdRun('template reload OPNsense/Wireguard');
$backend->configdpRun('wireguard configure');
----- Script Stop -----

(I couldn't get the script to look nice with "code" selected, it kept adding lots of junk to the script...)

Please note, my script above doesn't follow a single interface like LAN, but, it does work!
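
For discussion, here is how the "follow a single interface" decision could look. It is written in Python purely to illustrate the logic; a real hook would stay in PHP like the script above. It assumes the subsystem argument has the form vhid@interface (for example 1@igb1). The script above only checks that an '@' is present, so treat that format and the interface names as assumptions.

```python
def carp_action(subsystem, event, follow_interface):
    """Return 'start', 'stop' or None for a CARP event.

    None means the event should be ignored: unknown type, malformed
    source, or an interface other than the one being followed.
    """
    if event not in ("MASTER", "BACKUP"):
        return None
    vhid, sep, interface = subsystem.partition("@")
    if not sep or interface != follow_interface:
        return None  # exit early: not the followed interface
    return "start" if event == "MASTER" else "stop"


print(carp_action("1@igb1", "MASTER", follow_interface="igb1"))
print(carp_action("2@igb0", "MASTER", follow_interface="igb1"))
print(carp_action("1@igb1", "BACKUP", follow_interface="igb1"))
```

Events for any other interface return None straight away, which is the "exit quickly" behaviour described in the list above.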

I can create this on Github if appropriate. Comments please.
#8
I've been running WG client on 443 UDP for quite some time. I have the OPNsense GUI moved to another port and HAPROXY running, but listening on 127.0.0.1:44443 with a NAT port forward to TCP 443 to the localhost 127.0.0.1:44443 so 443 UDP is definitely free.

This has been running fine for quite some time. I use 443 UDP because some places lock down outbound traffic, and since the introduction of HTTPS over UDP (HTTP/3, QUIC) I find WG on 443 often works where a traditional WG port such as 51820 does not.

Anyway, all working great on an iPad, iPhone and Win 11 laptop. I don't use the client WG VPN much, but I just noticed if I connect, I get exactly 92 B received and nothing works.

I rebooted OPNsense, stopped and started WG, and tried different clients (iPad, iPhone and PC), but it just doesn't work. If I move WG on OPNsense to UDP 1194 (I know, that's really oVPN, but I'm not running oVPN) and move the client port to 1194, voilà, it all works again.

Something changed upgrading from 23.7 to 23.7.1_3 that broke WG listening on 443 UDP.

I'm waiting until 23.7.3 before I upgrade further, unless 23.7.2 is known to fix this.

*** Update! ****
I changed the GUI to only listen on the LAN interface

System > Settings > Administration > Listen Interfaces > LAN

And now I can have WG working on 443 UDP again.

It looks like 23.7.1_3 somehow binds the GUI to 443 UDP... not too sure how that works...
#9
I read about stability issues with HAPROXY and RSS but does anyone have any comments if this now works?
#10
We need to allow direct access bypassing our proxy, so, I created an Alias:

Alias name: exch_online_hosts
Type: Host(s)
Content: autodiscover.companyXYZ.co.nz outlook.office365.com outlook.office.com

Across a number of OPNsense firewalls

  • some made the alias with 0 loaded IP addresses
  • some made the alias with 8 loaded IP addresses
  • most made the alias with 16 loaded IP addresses
  • others made the alias with 28 loaded IP addresses

On those installations that made the alias with 0 or 8 entries, I manually ran the CLI command:

/usr/local/opnsense/scripts/filter/update_tables.py

It returned Status "ok"

Alias now has 45 loaded entries!

The Host(s) alias type appears to have trouble with a host name that resolves to multiple further names: it does not seem to walk down through these and resolve them too. Manually updating the tables from the CLI seems to work, though.
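
A sketch of the expected behaviour in Python: keep following every name a host resolves to until only addresses are left. This is an illustration only, not a description of how update_tables.py works internally. The lookup callable and the fake_dns table, with its names and addresses, are made up so the example runs without touching real DNS.

```python
def resolve_all(name, lookup, seen=None):
    """Collect every address reachable from name, following aliases.

    lookup(name) must return (aliases, addresses) for that name.
    """
    seen = set() if seen is None else seen
    if name in seen:          # guard against alias loops
        return set()
    seen.add(name)
    aliases, addresses = lookup(name)
    found = set(addresses)
    for alias in aliases:
        found |= resolve_all(alias, lookup, seen)
    return found


# Hypothetical records: one host name that points at two further names.
fake_dns = {
    "mail.example.test": (["a.example.test", "b.example.test"], []),
    "a.example.test": ([], ["192.0.2.10", "192.0.2.11"]),
    "b.example.test": (["a.example.test"], ["192.0.2.20"]),
}

addresses = resolve_all("mail.example.test",
                        lambda n: fake_dns.get(n, ([], [])))
print(sorted(addresses))
```

A resolver that stopped at the first level would return nothing for mail.example.test here, which resembles the "0 loaded IP addresses" case.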


#11
Something has changed in 23.1.9 with pfsync Synchronize States and systems that were moderately stable now have significant errors.

------------

Retail customer, multi-WAN fail-over, multi-site, all with HA firewalls, running WireGuard VPNs with FRR and BGP, hub and spoke, all going back to a central head office.

Approx. 40,000 transactions weekly

  • Before 23.1.9, about 4-5 POS transactions a week would error
  • Post 23.1.9 upgrade from 23.1.8 - 10+ POS transactions per day were getting broken
  • 23.1.9 disabling System: High Availability: Settings: Synchronize States - now 0 transactions per week being lost
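
To put those numbers in proportion, a quick calculation in Python. It assumes 5 errors a week before the upgrade, exactly 10 a day after it (the lower bound of "10+"), and a seven-day trading week. The trading week is not stated above.

```python
WEEKLY_TRANSACTIONS = 40_000
TRADING_DAYS_PER_WEEK = 7  # assumption, not stated in the post


def weekly_error_rate(errors_per_week):
    """Failed transactions as a percentage of weekly volume."""
    return 100.0 * errors_per_week / WEEKLY_TRANSACTIONS


before = weekly_error_rate(5)
after = weekly_error_rate(10 * TRADING_DAYS_PER_WEEK)
print(f"before 23.1.9: {before:.4f}%")
print(f"after 23.1.9:  {after:.4f}%")
print(f"increase: {after / before:.0f}x")
```

Under those assumptions the failure rate is still a small fraction of a percent, but it is more than ten times what it was.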

Client - running "POS software", telnet client, "POS bank" software
Server - running backend software, client connects to this server via telnet

Check out operator on client:

  • Client starts the checkout sale via telnet on the server
  • Server writes a file into a directory on the server
  • Client POS software scans the remote server directory over NetBIOS, sees the file, reads the file, starts POS bank
  • Client POS bank completes the bank transaction with the customer's credit card etc., then writes the POS bank answer file into the same directory on the remote server over NetBIOS
  • Client POS software scans the remote server directory over NetBIOS for the POS bank answer file, reads the file, tells the server over telnet whether the sale payment succeeded or failed, sale completed

The error condition happens when a sale fails to complete within 45 seconds.
What is actually happening, though, is that the sale is completed: the checkout operator sees a successful POS payment and the client sees the POS terminal report payment success. But somehow, I believe, state is lost, and either the client POS software never reads the POS bank answer file or the POS bank answer file never gets written, so the sale hangs with the error condition.

What is super interesting is by turning off pfsync Synchronize States, stability is restored.

Obviously this is less desirable in the long term as a firewall HA failover will disconnect all tills and any transactions in progress will be badly affected.
#12
May we have a URL link or notes as to the HAPROXY v2.6.11 to v2.6.12 changes please?

Probably very minor, but, I always try and read the release notes.


Thanks.
#13
Immediately after the upgrade to 23.1.5_2, FRR began restarting and routing became unstable.

Diagnosis is something is broken if you have "Enable CARP Failover" selected in the config.

Routing > Diagnostics > Log

2023-03-30T18:09:42 Error bgpd [EC 100663299] buffer_flush_available: write error on fd 2: Bad file descriptor
2023-03-30T18:09:42 Error bgpd [EC 100663299] buffer_flush_available: write error on fd 2: Bad file descriptor
2023-03-30T18:09:31 Warning zebra [EC 4043309122] Client 'bfd' encountered an error and is shutting down.
2023-03-30T18:09:31 Warning zebra [EC 4043309122] Client 'bgp' encountered an error and is shutting down.
2023-03-30T18:09:31 Warning zebra [EC 4043309122] Client 'vnc' encountered an error and is shutting down.
2023-03-30T18:09:22 Error bgpd [EC 100663299] buffer_flush_available: write error on fd 2: Bad file descriptor
2023-03-30T18:09:22 Error bgpd [EC 100663299] buffer_flush_available: write error on fd 2: Bad file descriptor
2023-03-30T18:09:21 Warning zebra [EC 4043309122] Client 'bfd' encountered an error and is shutting down.
2023-03-30T18:09:21 Warning zebra [EC 4043309122] Client 'bgp' encountered an error and is shutting down.
2023-03-30T18:09:21 Warning zebra [EC 4043309122] Client 'vnc' encountered an error and is shutting down.
2023-03-30T18:07:58 Error bgpd [EC 100663299] buffer_flush_available: write error on fd 2: Bad file descriptor
2023-03-30T18:07:58 Error bgpd [EC 100663299] buffer_flush_available: write error on fd 2: Bad file descriptor
2023-03-30T18:07:57 Warning zebra [EC 4043309122] Client 'bfd' encountered an error and is shutting down.
2023-03-30T18:07:57 Warning zebra [EC 4043309122] Client 'bgp' encountered an error and is shutting down.
2023-03-30T18:07:57 Warning zebra [EC 4043309122] Client 'vnc' encountered an error and is shutting down.
2023-03-30T18:07:37 Error bgpd [EC 100663299] buffer_flush_available: write error on fd 2: Bad file descriptor
2023-03-30T18:07:37 Error bgpd [EC 100663299] buffer_flush_available: write error on fd 2: Bad file descriptor
2023-03-30T18:07:33 Warning zebra [EC 4043309122] Client 'bfd' encountered an error and is shutting down.


#14
Hi.

I just noticed after upgrading to 23.1.3 that firewall rules that have a destination invert are now displayed in the GUI missing the leading "!"
#15
I've been tearing my hair out with wireguard, CARP, FRR and wireguard-kmod stability issues.

I know, not properly in the kernel, not yet supported, use at your own risk... etc.
It's just that wireguard is so good compared to IPSEC, it sets up so fast, it makes failover amazing.

The issue that keeps happening is wireguard is listed as started but no handshakes occur until you start wireguard again.


What I suspect
I believe the issue is the configd_run 'wireguard start' doesn't work until:

  • you reboot the firewall once with wireguard running, even if handshakes were empty
  • likely because the first time wireguard is started, if the interfaces are not present then the first start creates the wireguard interfaces but then fails to actually start wireguard

configd_run for wireguard needs to:
Check for the wireguard interfaces and, if they are missing, wait and then start wireguard again properly.
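
In outline, the check-and-retry could look like the Python sketch below. It is only a model of the logic: start_wireguard and interfaces_present are hypothetical callables standing in for whatever the real start script and interface check would be, and the attempt count and delay are arbitrary.

```python
import time


def start_with_retry(start_wireguard, interfaces_present,
                     attempts=3, delay_s=2.0, sleep=time.sleep):
    """Start WireGuard, then start it again if interfaces were missing.

    Returns the number of start calls made, or raises RuntimeError if
    the interfaces never appear.
    """
    for attempt in range(1, attempts + 1):
        had_interfaces = interfaces_present()
        start_wireguard()
        if had_interfaces:
            return attempt      # interfaces existed, so this start counts
        sleep(delay_s)          # that start only created the interfaces
    raise RuntimeError("WireGuard interfaces still missing after retries")


# Demo with stand-ins: the first start creates the interface only.
state = {"iface": False, "starts": 0}


def fake_start():
    state["starts"] += 1
    state["iface"] = True


calls = start_with_retry(fake_start, lambda: state["iface"],
                         sleep=lambda s: None)
print(calls)
```

In the demo the first call finds no interface, so a second start is issued once the interface exists.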


Is anyone able to help look at the configd_run 'wireguard start' script?
#16
I'm really liking the crowdsec system. I just had a few questions that I thought of.

ONE
How often do the crowdsec_blacklists get updated? I take it that it updates when triggered from the cloud end.
I'm seeing these updates in my logs:

160 crowdsecurity/community-blocklist update : +8881/-0 IPs ban:8881
11 hours ago
159 crowdsecurity/community-blocklist update : +8897/-0 IPs ban:143
13 hours ago
158 crowdsecurity/community-blocklist update : +8924/-0 IPs
15 hours ago
157 crowdsecurity/community-blocklist update : +8917/-0 IPs ban:261
18 hours ago
156 crowdsecurity/community-blocklist update : +8777/-0 IPs ban:1232
2 days ago
155 crowdsecurity/community-blocklist update : +8791/-0 IPs ban:67
2 days ago


TWO
Now the big question. When crowdsec does update the blocklist, does this trigger a firewall filter reload? If it doesn't then obviously you don't get any updated benefit from your floating firewall block rule.
#17

This all used to work flawlessly. Super fast failover for WAN to WAN2 and super fast transition from fw1 to fw2 - like losing 2 pings only. Amazing!

However, since upgrading to 22.7.6

  • If you sync the primary firewall to the backup, wireguard starts on the backup firewall, causing all sorts of issues. The nightly sync CRON job causes chaos
  • I can't pin it down, but now, FRR sometimes fails to startup too, yet FRR is set to follow CARP. e.g. restart the primary fw, it restarts, takes over as the CARP master and FRR fails to start sometimes!

I have resorted to "unticking" wireguard sync in the HA settings to prevent wireguard from starting on the backup firewall, and adding another CRON job that runs every minute to enable or disable wireguard based on the CARP status: https://gist.github.com/taxilian/eecdc1fb17cf70e8080118cf6d8af412

Any ideas what changed with 22.7.6?


#18
I have 3 sites, each site has multi WAN.

Site A: WAN1 & WAN2 plus HA firewall pair
Site B: WAN1 & WAN2 plus HA firewall pair
Site C: WAN1 & WAN2 plus HA firewall pair

I'm using FRR with BGP for dynamic routing and got it working great with 2 sites and excellent WAN failover, only losing 2 pings during WAN failover. As soon as I added a third site, I get a strange "allowed ips: (none)" and routing problems.

The allowed ips set for the peer is 0.0.0.0/0, but, it seems wireguard doesn't tolerate more peers with allowed ips of 0.0.0.0/0 set against the same local listener.

The reason I want to set 0.0.0.0/0 is I want to do all my routing using FRR, so, I don't want to have to set the peer allowed IP addresses in wireguard plus then control the IP addresses in FRR BGP.

See the attachment: it shows that the running config for the peer smPI... has allowed ips: (none) (but I can assure you, it has allowed ips of 0.0.0.0/0 set) and routing doesn't work. As soon as I put in the list of allowed ips expected from that peer, voilà, it works.
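
The "(none)" looks consistent with how WireGuard's cryptokey routing treats allowed IPs: on one interface a given prefix can belong to only one peer, so assigning 0.0.0.0/0 to a second peer takes it away from the first. Below is a toy Python model of that behaviour. The peer names are placeholders, and this is a simulation, not the wg tool.

```python
class AllowedIps:
    """Maps each prefix to exactly one peer, like a single wg interface."""

    def __init__(self):
        self._owner = {}  # prefix -> peer name

    def set_peer(self, peer, prefixes):
        """Assign prefixes to peer, taking them from any previous owner."""
        for prefix in prefixes:
            self._owner[prefix] = peer

    def show(self, peer):
        """Return the peer's prefixes, or '(none)' if it holds none."""
        owned = sorted(p for p, o in self._owner.items() if o == peer)
        return ", ".join(owned) if owned else "(none)"


wg = AllowedIps()
wg.set_peer("site_b", ["0.0.0.0/0"])
wg.set_peer("site_c", ["0.0.0.0/0"])   # same prefix, second peer
print(wg.show("site_b"))
print(wg.show("site_c"))
```

Giving each peer its own distinct prefixes avoids the overwrite, which fits the observation that listing the expected allowed ips makes routing work.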

The local endpoint (listener) of course has disable routes set.

FRR and BGP and BFD all working great.

Environment

  • OPNsense 22.7.2
  • Wireguard
  • Wireguard-kmod
  • 10-wireguard CARP hook script

#19
OPNsense 22.7.1 (and earlier versions I have discovered)

IP aliases set on the primary firewall do not sync across to the backup firewall.
#20
2022-07-23T18:51:24 Error opnsense /usr/local/etc/rc.filter_configure: There were error(s) loading the rules: /tmp/rules.debug:806: sticky-address cannot be redefined - The line in question reads [806]: pass in quick on vlan01 route-to {( vlan02 202.202.202.202 )} sticky-address inet proto {tcp udp} from $groveseg to $Marshal_updates port $http_https keep state label "9e64a311a494a21cfdbefcba91dad3a5" # : Allow ServerSEG license check


As soon as I add WAN fail-over capability to rules, this breaks badly.
I can't seem to pin down exactly what is going on, my best guess is the WAN fail-over "WAN1_failover_WAN2" gateway group is just not working.
Often, I can get the issue to go away by moving the rule to the top of the interface rules, or to the end. But that doesn't always work either.

Troubleshooting steps I have tried

  • Deleted the offending rule and made it again (doesn't always fix it)
  • Moving the rule around for rule order (also doesn't always fix it)
  • Rebooting OPNsense (definitely doesn't fix it)
  • Exported the config, looked over the config by hand (seems fine), re-imported the config and rebooted (doesn't fix it)
  • Also occurs when I take an existing rule and change the default gateway to the new WAN fail-over gateway

The WAN fail-over group looks perfect.

Any ideas anyone?