Messages - nzkiwi68

#31
23.7 Legacy Series / Re: Alias cannot contain comments?
September 06, 2023, 03:05:55 AM
I hear you... but in OPNsense you can create a master Alias which contains the individual aliases.


Alias: Allowed_oVPN_out

That master Alias contains a lot of single Aliases:

Alias: John
Alias: Mary_iPad
Alias: Mary_laptop

Etc... so you can do this today.
#32
With Wireguard now baked into the core with 23.7.3, my Wireguard custom CARP script broke. I enlisted the help of a friend and together we built a new Wireguard CARP fail-over script.

It works absolutely brilliantly and it's ready for production.
I note that the CARP script still gets called multiple times as multiple VHIDs transition, but this no longer forces Wireguard to start and stop repeatedly (and often break), as was the case before. During a CARP transition from backup to master, the new script starts Wireguard once and it remains stable; each subsequent call doesn't restart Wireguard, which is very good.

Add Wireguard CARP awareness to the GUI and follow a single interface
It would be very nice, though, if the GUI had an "Enable CARP Fail-over" option (like FRR), together with a drop-down where you select a single interface to follow.

Why?

  • Because, despite it being "impossible", over many years of pfSense and now OPNsense experience I have seen many instances where CARP is misaligned between the backup and the primary firewall.
  • Also, I would almost always follow the LAN interface, which is where the Wireguard VPN tunnels exit to and begin from. That is really the only CARP interface we would want Wireguard to follow when starting or stopping.
  • A CARP script that follows a single interface would exit quickly if the CARP event is not for the required interface, which means less work for the firewall to process.

I know I have seen many posts here and on GitHub requesting CARP for Wireguard (including me) and questions raised as to why. I thought I would summarise why Wireguard needs to follow CARP.

Why Wireguard with HA needs to stop and start and follow CARP

  • Wireguard doesn't bind to VHIDs, it binds to all interfaces (like the WAN firewall interface). Therefore, on a CARP fail-over, Wireguard on the backup firewall keeps running and interferes with the primary firewall.
  • The above statement is especially true with Wireguard keepalives.
  • Because of the above two points, without a CARP script you need to leave Wireguard "unselected" under HA sync to keep Wireguard "off" on the backup firewall, but then changes to the Wireguard config are no longer sync'd.
  • Having Wireguard follow CARP also enables FRR to be left on, even on the backup firewall. There is no need for dynamic routing to be off on the backup firewall (because with Wireguard off, no traffic can pass), and thus on fail-over everything comes up super fast. It's very good.

I think Wireguard is now "prime time ready".


Here is my script.
I fully acknowledge that this is not my exclusive work, but follows the built-in CARP scripts.

Save the script as "10-wireguard" in this directory:
/usr/local/etc/rc.syshook.d/carp
Make sure the script has execute rights.
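
From a shell, those steps might look like the sketch below. It assumes the script was saved as a local file first. The helper name install_carp_hook is made up for illustration, and the optional second argument exists only so the function can be tried against a scratch directory instead of the real one.

```shell
#!/bin/sh
# Sketch: copy a CARP hook into place as "10-wireguard" and mark it executable.
# The default target directory is the one named in the post.

install_carp_hook() {
    src="$1"
    dest_dir="${2:-/usr/local/etc/rc.syshook.d/carp}"

    # Refuse to continue if the source file is missing.
    [ -f "$src" ] || { echo "missing source file: $src" >&2; return 1; }

    mkdir -p "$dest_dir" &&
    cp "$src" "$dest_dir/10-wireguard" &&
    chmod 755 "$dest_dir/10-wireguard"
}

# On the firewall (path of the saved file is a placeholder):
# install_carp_hook ./10-wireguard
```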

----- Script Start -----
#!/usr/local/bin/php
<?php

// CARP syshook: enable WireGuard when this node becomes MASTER,
// disable it when this node becomes BACKUP.

require_once("config.inc");
require_once("util.inc");

// Arguments passed in by the CARP hook: event source and new state.
$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';

// Only MASTER and BACKUP transitions are handled.
if (!in_array($type, ['MASTER', 'BACKUP'])) {
    log_msg("Carp '$type' event unknown from source '{$subsystem}'");
    exit;
}

// Reject any event source that does not contain '@'.
if (!strstr($subsystem, '@')) {
    log_msg("Carp '$type' event triggered from wrong source '{$subsystem}'");
    exit;
}

// Flip the WireGuard "enabled" flag in the config to match the CARP state.
switch ($type) {
    case 'MASTER':
        $config['OPNsense']['wireguard']['general']['enabled'] = '1';
        write_config("Enable WireGuard on this peer due to CARP event", false);
        log_msg("Starting WireGuard due to CARP event '$type'");
        break;
    case 'BACKUP':
        $config['OPNsense']['wireguard']['general']['enabled'] = '0';
        write_config("Disable WireGuard on this peer due to CARP event", false);
        log_msg("Stopping WireGuard due to CARP event '$type'");
        break;
}

// Regenerate the WireGuard templates and apply the new state.
use OPNsense\Core\Backend;
$backend = new Backend();
$backend->configdRun('template reload OPNsense/Wireguard');
$backend->configdpRun('wireguard configure');
----- Script Stop -----

(I couldn't get the script to look nice with "code" selected; it kept adding lots of junk to the script...)

Please note, my script above doesn't follow a single interface like LAN, but, it does work!
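
A single-interface filter could be as small as the check below. This is only a sketch in shell (the same test would port to the PHP script), and it assumes the first argument arrives in the "vhid@interface" form, for example "1@igb1"; that format is inferred from the '@' check in the script. The function name carp_event_is_for and the interface name igb1 are placeholders, not part of OPNsense.

```shell
#!/bin/sh
# Sketch only: decide whether a CARP event belongs to the interface we follow.
# Assumes $1 looks like "<vhid>@<interface>", e.g. "1@igb1" (assumption).

carp_event_is_for() {
    subsystem="$1"      # event source, e.g. "1@igb1"
    wanted_if="$2"      # interface to follow, e.g. the LAN device name

    case "$subsystem" in
        *@*) ;;               # has the expected separator
        *)   return 1 ;;      # malformed source, never a match
    esac

    # Strip everything up to and including the first "@".
    event_if="${subsystem#*@}"
    [ "$event_if" = "$wanted_if" ]
}

# Example use at the top of a syshook script (igb1 is a placeholder):
# carp_event_is_for "$1" "igb1" || exit 0
```

Events for any other interface return early, so Wireguard is only touched when the followed interface changes state.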

I can create this on Github if appropriate. Comments please.
#33
Force of habit re HA.

With HA and a CARP VIP, I find it best to bind HAPROXY to localhost and have a NAT port forward from the WAN CARP VIP to localhost.
#34
I've been running WG client on 443 UDP for quite some time. I have the OPNsense GUI moved to another port and HAPROXY running, but listening on 127.0.0.1:44443, with a NAT port forward from TCP 443 to localhost 127.0.0.1:44443, so 443 UDP is definitely free.

This has been running fine for quite some time. I use 443 UDP because some places lock down outbound traffic, and since the introduction of HTTPS over UDP (HTTP/3), I find WG on 443 UDP often works where a traditional WG port of, say, 51820 does not.

Anyway, all working great on an iPad, iPhone and Win 11 laptop. I don't use the client WG VPN much, but I just noticed if I connect, I get exactly 92 B received and nothing works.

I rebooted OPNsense, stopped and started WG, and tried different clients (iPad, iPhone and PC), but it just doesn't work. Move WG on OPNsense to UDP 1194 (I know, that's really oVPN, but I'm not running oVPN), move the client port to 1194 and voilà, it all works again.

Something changed upgrading from 23.7 to 23.7.1_3 that broke WG listening on 443 UDP.

I'm waiting until 23.7.3 before I upgrade further, unless 23.7.2 is known to fix this.

*** Update! ****
I changed the GUI to only listen on the LAN interface

System > Settings > Administration > Listen Interfaces > LAN

And now I can have WG working on 443 UDP again.

It looks like 23.7.1_3 binds the GUI somehow to 443 UDP... not too sure how that works....
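
One way to see what is actually holding UDP 443 would be to list the listeners from a shell. A small sketch, assuming FreeBSD's sockstat with its -l (listening sockets), -P (protocol) and -p (port) options; the wrapper name udp_listeners and the SOCKSTAT override are only there for illustration.

```shell
#!/bin/sh
# Sketch: show which processes are listening on a given UDP port.
# Defaults to port 443 when no port is given.

udp_listeners() {
    port="${1:-443}"
    # SOCKSTAT can point at a different command; it defaults to sockstat.
    "${SOCKSTAT:-sockstat}" -l -P udp -p "$port"
}

# On the firewall:
# udp_listeners 443
```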
#35
You could continue to use those 8.8.8.8 & 8.8.4.4 IP addresses for your gateway monitor, and for your DNS use something different such as:


  • Your ISP's recommended / provided DNS servers
or
  • CloudFlare's excellent public DNS servers 1.1.1.1 & 1.0.0.1




#36
Quote: Edit: But now having another issue. When clicking on the + (add a static mapping) next to line with the lease, I get send back to the "Lobby"

Try:
Clearing your browser cache
Or
Using another browser
Or
Using private browser mode

Just to check that your issue is not browser caching related.
#37
Figured it out.

The issue is if you set a "Synchronize Peer IP" address in:
    System: High Availability: Settings

It appears that it's more work somehow for the underlying FreeBSD, and I guess state sync is not as easy and clean using unicast vs multicast.

Switching back to the standard multicast "224.0.0.240" address has solved the lost transactions issue.

We went from approx. 10 broken EFTPOS transactions per day to ~1 a week.

The fix
The takeaway here is don't use "Synchronize Peer IP" unless you really, really need to.


Recommendation for help text change
Change the "i" help text under "Synchronize Peer IP" to:

Setting this option will force pfsync to synchronize its state table to this IP address. The default is directed multicast. State sync via IP can be less reliable than standard multicast and is generally not recommended.
#38
Check DNS.

Slow and poorly responding DNS fits your symptoms.
#39
I read about stability issues with HAPROXY and RSS. Does anyone have any comments on whether this now works?
#40
I'm going to back up, flatten the existing appliance FW, build fresh with the latest build and restore.

It's just not behaving properly and I can't see why.
#41
We need to allow direct access bypassing our proxy, so, I created an Alias:

Alias name: exch_online_hosts
Type: Host(s)
Content: autodiscover.companyXYZ.co.nz outlook.office365.com outlook.office.com

Across a number of OPNsense firewalls

  • some made the alias with 0 loaded IP addresses
  • some made the alias with 8 loaded IP addresses
  • most made the alias with 16 loaded IP addresses
  • others made the alias with 28 loaded IP addresses

On those installations that made the alias with 0 or 8 entries, I manually ran the CLI command:

/usr/local/opnsense/scripts/filter/update_tables.py

It returned Status "ok"

Alias now has 45 loaded entries!

The Host(s) alias type appears to have trouble with a host entry that resolves to multiple additional names, where each of those then has to be walked and resolved too. Manually updating the tables from the CLI, however, seems to work.
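
To compare firewalls, the manual refresh can be followed by a count of what actually landed in the pf table. A rough sketch, assuming the alias is backed by a pf table of the same name (exch_online_hosts) and that "pfctl -t <table> -T show" prints one address per line. The PFCTL variable and the function name are illustrative; they are there so the command can be swapped out when trying the function off the firewall.

```shell
#!/bin/sh
# Sketch: count the addresses currently loaded in an alias's pf table.

count_table_entries() {
    table="$1"
    # grep -c counts the non-blank lines. It exits 1 when the count is zero,
    # so "|| true" keeps an empty table from looking like a failure.
    "${PFCTL:-pfctl}" -t "$table" -T show 2>/dev/null | grep -c '[^[:space:]]' || true
}

# On the firewall, after refreshing the tables:
# /usr/local/opnsense/scripts/filter/update_tables.py
# count_table_entries exch_online_hosts
```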


#42
Something has changed in 23.1.9 with pfsync Synchronize States and systems that were moderately stable now have significant errors.

------------

Retail customer, multi WAN fail-over, multi site, all with HA firewalls, running WireGuard VPNs with FRR and BGP, hub and spoke, all going back to a central head office.

Approx. 40,000 transactions weekly

  • Before 23.1.9, about 4-5 POS transactions a week would error
  • Post 23.1.9 upgrade from 23.1.8 - 10+ POS transactions per day were getting broken
  • 23.1.9 disabling System: High Availability: Settings: Synchronize States - now 0 transactions per week being lost

Client - running "POS software", telnet client "POS bank" software
Server - running backend software, client connects to this server via telnet

Check out operator on client:

  • Client starts checkout sale via telnet on server
  • Server writes a file into a directory on the server
  • Client POS software scans remote server directory over NetBIOS, sees file, reads file, starts POS bank
  • Client POS bank completes bank transaction with customer credit card etc, writes POS bank answer file in same directory on remote server over NetBIOS
  • Client POS software scans remote server directory over NetBIOS for POS bank answer file, reads file, tells server over telnet sale payment success or failure, sale completed

The error condition happens when a sale fails to complete within 45 seconds.
But what is actually happening is that the sale is completed: the checkout operator sees a successful POS payment and the client sees the POS terminal report payment success. Somehow, I believe, state is lost, and either the client POS software never reads the POS bank answer file or the POS bank answer file never gets written, so the sale hangs with the error condition.

What is super interesting is by turning off pfsync Synchronize States, stability is restored.

Obviously this is less desirable in the long term as a firewall HA failover will disconnect all tills and any transactions in progress will be badly affected.
#43
That behavior is normal.

In a more complex setup like you are running, you would be expected to run NAT hybrid or NAT manual and write your own NAT rules.

If you have routes pointing back to internal subnets via a LAN or other internal interface connection to a layer 3 switch or another router, and you want these to access the internet, then these all need a NAT rule too.

I never just have a blanket NAT rule; I always write out a specific NAT rule per subnet.

That's normal for all sorts of firewall products I have worked with.
#44
This is not really the answer you are looking for, I know....

I have over the years had many issues with OSPF, running it on switches, pfSense and OPNsense and made the decision a few years ago to move to BGP.

BGP runs over TCP using port 179, unlike OSPF, which is its own IP protocol (89), and I think that causes some issues on some networks.

I am recommending you do just that: move to BGP. Yes, it seems a lot more complex than OSPF, but for just a few sites you really can get it working quite easily. The BGP tools are great, the options are better, route filtering is easier, and you get things like BFD for fast failover, graceful restart and more.

BFD enables really fast convergence, so the fast-convergence advantage of OSPF is gone if you run BGP with BFD. Add in graceful restart, whereby packets keep being forwarded during the reload, and the key reasons to run OSPF are no longer so compelling.

Under BGP, I only need to set up:
"neighbors", "prefix lists" and "route-maps"

Then under BFD I set up the BFD neighbors.

The BGP diagnostics page is excellent and I can easily see what is really happening.

Spend the time and effort to move to BGP and once you get it, you won't go back.
#45
I had the same issue, also running CrowdSec.

Stopping CrowdSec from the services page, or inside CrowdSec, or rebooting the firewall didn't work. The firewall did not respond even to the reboot command.

I used PuTTY to open an SSH session and ran these two commands:

pgrep crowdsec
pkill -9 crowdsec


The pending firewall reboot then occurred. On restart, I re-ran the firewall upgrade, which completed successfully.

Ran firewall health audit to check the upgrade and firewall health - passed.
System > Firmware > Status > Run an Audit - Health