Messages - nzkiwi68

#31
Figured it out.

The issue is if you set a "Synchronize Peer IP" address in:
    System: High Availability: Settings

It appears that it's somehow more work for the underlying FreeBSD, and I guess state sync is not as easy and clean using unicast as it is using multicast.

Switching back to the standard multicast "224.0.0.240" address has solved the lost transactions issue.

We went from approx. 10 broken EFTPOS transactions per day to ~1 a week.

The fix
The takeaway here is don't use "Synchronize Peer IP" unless you really, really need to.


Recommendation for help text change
Change the "i" help text under "Synchronize Peer IP" to:

Setting this option will force pfsync to synchronize its state table to this IP address. The default is directed multicast. State sync via IP can be less reliable than standard multicast and is generally not recommended.
#32
Check DNS.

Slow and poorly responding DNS fits your symptoms.
#33
I read about stability issues with HAProxy and RSS, but can anyone comment on whether this now works?
#34
I'm going to back up, flatten the existing appliance FW, build fresh with the latest build and restore.

It's just not behaving properly and I can't see why.
#35
We need to allow direct access bypassing our proxy, so, I created an Alias:

Alias name: exch_online_hosts
Type: Host(s)
Content: autodiscover.companyXYZ.co.nz outlook.office365.com outlook.office.com

Across a number of OPNsense firewalls

  • some made the alias with 0 loaded IP addresses
  • some made the alias with 8 loaded IP addresses
  • most made the alias with 16 loaded IP addresses
  • others made the alias with 28 loaded IP addresses

On those installations that made the alias with 0 or 8 entries, I manually ran the CLI command:

/usr/local/opnsense/scripts/filter/update_tables.py

It returned Status "ok"

Alias now has 45 loaded entries!

The Host(s) alias type appears to have trouble with a host name that resolves to multiple additional names, where it has to walk down through these and resolve each of them too. Manually updating the tables from the CLI seems to work, though.
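
To cross-check how many addresses the alias should hold, the host names can be resolved independently of the firewall. The following is only a minimal sketch using Python's standard resolver; it is not how OPNsense builds its tables, and the function name `resolve_hosts` and its injectable `getaddrinfo` parameter are my own assumptions for illustration.

```python
import socket


def resolve_hosts(hostnames, getaddrinfo=socket.getaddrinfo):
    """Return the sorted set of IP addresses the given names resolve to."""
    addresses = set()
    for name in hostnames:
        try:
            results = getaddrinfo(name, None)
        except socket.gaierror:
            # Name did not resolve; skip it rather than abort the whole run.
            continue
        # Each result is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address for both IPv4 and IPv6.
        for _family, _type, _proto, _canonname, sockaddr in results:
            addresses.add(sockaddr[0])
    return sorted(addresses)
```

Calling `resolve_hosts(["outlook.office365.com", "outlook.office.com"])` from a machine that uses the same DNS servers as the firewall gives a rough figure to compare against the "loaded" count. Expect the number to vary between runs, since these names are served from large, rotating address pools.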


#36
Something has changed in 23.1.9 with pfsync Synchronize States and systems that were moderately stable now have significant errors.

------------

Retail customer, multi WAN fail-over, multi site, all with HA firewalls, running WireGuard VPNs with FRR and BGP, hub and spoke, all going back to central head office.

Approx. 40,000 transactions weekly

  • Before 23.1.9, about 4-5 POS transactions a week would error
  • Post 23.1.9 upgrade from 23.1.8 - 10+ POS transactions per day were getting broken
  • 23.1.9 with System: High Availability: Settings: Synchronize States disabled - now 0 transactions per week being lost
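
To put those figures in proportion, here is a rough calculation of the failure rates. It is only a sketch: it takes the upper "before" figure of 5 per week, the lower "after" bound of 10 per day, and assumes the tills trade 7 days a week, which the post does not state.

```python
WEEKLY_TRANSACTIONS = 40_000  # approximate weekly volume from the post


def failure_rate(failures_per_week, total=WEEKLY_TRANSACTIONS):
    """Fraction of weekly transactions that errored."""
    return failures_per_week / total


before = failure_rate(5)      # "4-5 a week" before 23.1.9 (upper figure)
after = failure_rate(10 * 7)  # "10+ per day" after 23.1.9, assuming 7 trading days

print(f"before: {before:.4%}")             # before: 0.0125%
print(f"after:  {after:.4%}")              # after:  0.1750%
print(f"increase: {after / before:.0f}x")  # increase: 14x
```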

Client - running "POS software", telnet client, "POS bank" software
Server - running backend software, client connects to this server via telnet

Checkout operator on client:

  • Client starts checkout sale via telnet on server
  • Server writes a file into a directory on the server
  • Client POS software scans remote server directory over NetBIOS, sees file, reads file, starts POS bank
  • Client POS bank completes bank transaction with customer credit card etc, writes POS bank answer file in same directory on remote server over NetBIOS
  • Client POS software scans remote server directory over NetBIOS for POS bank answer file, reads file, tells server over telnet sale payment success or failure, sale completed

The error condition happens when a sale fails to complete in 45 seconds.
But what is actually happening is that the sale is completed: the checkout operator sees a successful POS payment and the client sees the POS terminal report payment success. Somehow, I believe, state is lost, and either the client POS software never reads the POS bank answer file or the POS bank answer file never gets written, so the sale hangs with the error condition.

What is super interesting is by turning off pfsync Synchronize States, stability is restored.

Obviously this is less desirable in the long term as a firewall HA failover will disconnect all tills and any transactions in progress will be badly affected.
#37
That behavior is normal.

In a more complex setup like the one you are running, you would be expected to run NAT hybrid or NAT manual and write your own NAT rules.

If you have routes pointing back to internal subnets via a LAN or other internal interface connected to a layer 3 switch or another router, and you want these subnets to access the internet, then they all need a NAT rule too.

I never just have a blanket NAT rule, I always write a specific subnet NAT rule out.

That's normal for all sorts of firewall products I have worked with.
#38
This is not really the answer you are looking for, I know....

I have over the years had many issues with OSPF, running it on switches, pfSense and OPNsense and made the decision a few years ago to move to BGP.

BGP runs over TCP using port 179, unlike OSPF, which is IP protocol 89, and I think that causes some issues on some networks.

I am recommending you do just that: move to BGP. Yes, it seems a lot more complex than OSPF, but for just a few sites you really can get it working quite easily. The BGP tools are great, the options are better, route filtering is easier, and you get things like BFD for fast failover, graceful restart and more.

BFD enables really fast convergence, so the fast convergence advantage of OSPF is gone if you run BGP with BFD. Add in graceful restart, whereby BGP keeps sending packets during the reload, and the key reasons to run OSPF are no longer so compelling.

I only need to set up under BGP:
"neighbors", "prefix lists" and "route-maps"

Then under BFD I set up BFD neighbors.

The BGP diagnostics page is excellent and I can easily see what is really happening.

Spend the time and effort to move to BGP and once you get it, you won't go back.
#39
I had the same issue, also running CrowdSec.

Stopping CrowdSec from the services page, or inside CrowdSec, or rebooting the firewall didn't work. The firewall did not respond even to the reboot command.

Used PuTTY to open an SSH session and ran these two commands:

pgrep crowdsec
pkill -9 crowdsec


The pending firewall reboot then occurred. On restart, I re-ran the firewall upgrade, which completed successfully.

Ran firewall health audit to check the upgrade and firewall health - passed.
System > Firmware > Status > Run an Audit - Health
#40
May we have a URL link or notes as to the HAPROXY v2.6.11 to v2.6.12 changes please?

Probably very minor, but, I always try and read the release notes.


Thanks.
#41
23.1 Legacy Series / Re: Network alias not working
April 11, 2023, 11:35:19 PM
There were issues with Aliases, since fixed, whereby the Aliases are empty exactly as you describe.

Please ensure you are running 23.1.5_4.
#42
I expect so.

But we are not running IPsec as the site to site VPN, we are running WireGuard, and the problem is FRR errors that were not there before, not a WireGuard fault.

BTW - why WireGuard?
Because WireGuard is so fast to set up. If you use IPsec for a site to site VPN with multi WAN and clustered firewalls and you power off site A fw1, then site A fw2 takes over, but IPsec takes ages: as long as 2 minutes before the tunnel actually comes up on fw2 and starts passing traffic. It's also really bad if site A fw2 is the master and you power on site A fw1. Once fw1 comes up, becomes the CARP master and takes over, the VPN is down for far too long if using IPsec.

That's my experience across pfSense, OPNsense and multiple customers. Fail-over for IPsec is too slow.

WireGuard, on the other hand, is so fast: 3-4 pings and the tunnel is up and running on fw2 and routing is working.

Telnet sessions are not broken during a clustered firewall fail-over with WireGuard, whereas 100% of Telnet and RDS sessions break during an IPsec clustered firewall fail-over. Hence I am a big WireGuard fan.

WireGuard now needs a decent CARP fail-over script and, ideally, the ability to follow a single interface for CARP fail-over. In 99% of deployments I would have WireGuard following CARP and watching the LAN interface only. Because VPNs are (normally) linking site A LAN to site B LAN, I'm only interested in starting WireGuard on the firewall that has the LAN interface as CARP master.


#43
Yes.
BGP only.

Building configuration...

Current configuration:
!
frr version 7.5.1
frr defaults traditional
hostname byyfw1.localdomain
log syslog informational
!
router bgp 65525
no bgp ebgp-requires-policy
no bgp default ipv4-unicast
bgp graceful-restart
neighbor 172.27.3.4 remote-as 65524
neighbor 172.27.3.4 bfd
neighbor 172.27.3.4 update-source wg4
neighbor 172.27.3.104 remote-as 65524
neighbor 172.27.3.104 bfd
neighbor 172.27.3.104 update-source wg5
neighbor 172.27.5.1 remote-as 65521
neighbor 172.27.5.1 bfd
neighbor 172.27.5.1 update-source wg1
neighbor 172.27.5.101 remote-as 65521
neighbor 172.27.5.101 bfd
neighbor 172.27.5.101 update-source wg2
!
address-family ipv4 unicast
  redistribute kernel
  redistribute connected
  redistribute static
  neighbor 172.27.3.4 activate
  neighbor 172.27.3.4 next-hop-self
  neighbor 172.27.3.4 prefix-list byy-xxy-prefix-out out
  neighbor 172.27.3.4 route-map prefer-wan1 in
  neighbor 172.27.3.104 activate
  neighbor 172.27.3.104 next-hop-self
  neighbor 172.27.3.104 prefix-list byy-xxy-prefix-out out
  neighbor 172.27.5.1 activate
  neighbor 172.27.5.1 next-hop-self
  neighbor 172.27.5.1 prefix-list byy-onn-prefix-out out
  neighbor 172.27.5.1 route-map prefer-wan1 in
  neighbor 172.27.5.101 activate
  neighbor 172.27.5.101 next-hop-self
  neighbor 172.27.5.101 prefix-list byy-onn-prefix-out out
exit-address-family
!
address-family ipv6 unicast
  redistribute kernel
  redistribute connected
  redistribute static
exit-address-family
!
ip prefix-list byy-onn-prefix-out seq 12 permit 10.5.55.0/24
ip prefix-list byy-onn-prefix-out seq 13 permit 10.5.80.0/24
ip prefix-list byy-onn-prefix-out seq 14 permit 10.5.50.0/24
ip prefix-list byy-onn-prefix-out seq 11 permit 10.5.45.0/24
ip prefix-list byy-onn-prefix-out seq 10 permit 192.168.5.0/24
ip prefix-list byy-xxy-prefix-out seq 20 permit 192.168.5.0/24
ip prefix-list byy-xxy-prefix-out seq 21 permit 10.5.80.0/24
!
route-map prefer-wan1 permit 10
set local-preference 300
!
line vty
!
bfd
peer 172.27.5.1
!
peer 172.27.3.4
!
peer 172.27.5.101
!
peer 172.27.3.104
!
!
end

#44
That's an unusual setup.
I wouldn't recommend this setup.

1.
If the VM has the ISP default gateway, then, the traffic won't be touching the firewall at all.
Consider this example:
FW WAN IP: 202.202.202.1
VM IP: 202.202.202.100
ISP gateway: 202.202.202.254

Then, the client traffic will go directly to the ISP gateway and the firewall will never see the traffic, hence rules not working.

2.
If the VM does have the FW's IP address as the gateway, then make sure that under:
Firewall > Settings > Advanced
"Static route filtering" is not ticked, so that the firewall checks WAN traffic that arrives (in) and then leaves again.

If it is ticked, then because the traffic enters the WAN and then leaves the WAN, "Bypass firewall rules for traffic on the same interface" operates.
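
The two cases above can be checked with a few lines of Python. This is just a sketch using the example addresses from case 1; the /24 prefix length and the function name are assumptions of mine, and it only models which device is the next hop, not the "Static route filtering" setting.

```python
import ipaddress

# Addresses from the example above; the /24 prefix length is an assumption.
WAN_NET = ipaddress.ip_network("202.202.202.0/24")
FIREWALL = ipaddress.ip_address("202.202.202.1")
VM = ipaddress.ip_address("202.202.202.100")
ISP_GATEWAY = ipaddress.ip_address("202.202.202.254")


def traffic_crosses_firewall(vm, default_gateway, firewall, network):
    """Return True only if the VM hands its outbound traffic to the firewall."""
    if vm not in network or default_gateway not in network:
        raise ValueError("VM and gateway are expected to share the WAN subnet")
    # An on-link gateway is reached directly at layer 2, so the firewall only
    # sees the traffic when it is itself the configured next hop.
    return default_gateway == firewall


print(traffic_crosses_firewall(VM, ISP_GATEWAY, FIREWALL, WAN_NET))  # False (case 1)
print(traffic_crosses_firewall(VM, FIREWALL, FIREWALL, WAN_NET))     # True (case 2)
```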





#45
When writing firewall rules, you need to think of packets from the firewall's point of view.

That's very important.

Firewall rules, in general, are set against an interface, and the direction is normally IN. That's because, from the firewall's point of view, a packet going out to the internet from a LAN connected device is received (hence IN) on the firewall's LAN interface. From there the packet goes through the firewall rules and NAT, and then leaves (out) on the WAN interface. With stateful firewalls (and OPNsense is most certainly a stateful firewall) you only write a firewall rule for the FIRST packet.

See the OPNsense documentation:
https://docs.opnsense.org/manual/firewall.html
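
The "first packet only" idea can be illustrated with a toy model. This is not how pf is implemented: ports, NAT and state timeouts are left out, and the interface name, LAN subnet and function name are all made up for the example.

```python
import ipaddress

# Hypothetical rule set: pass traffic received (IN) on the LAN interface
# from the LAN subnet. Interface name and subnet are illustrative only.
RULES = [("lan", ipaddress.ip_network("192.168.1.0/24"))]

# State table: one entry per permitted connection (ports and NAT omitted).
states = set()


def packet_passes(in_interface, src, dst):
    """Return True if a packet received on in_interface is allowed."""
    # Packets belonging to an existing state, in either direction, pass
    # without any rule lookup.
    if (src, dst) in states or (dst, src) in states:
        return True
    # Otherwise this is a first packet: evaluate the inbound rules of the
    # interface it arrived on.
    for rule_interface, rule_source in RULES:
        if in_interface == rule_interface and ipaddress.ip_address(src) in rule_source:
            states.add((src, dst))
            return True
    return False  # nothing matched: blocked by default


print(packet_passes("wan", "203.0.113.10", "192.168.1.50"))  # False: unsolicited, no state yet
print(packet_passes("lan", "192.168.1.50", "203.0.113.10"))  # True: first packet matches the LAN rule
print(packet_passes("wan", "203.0.113.10", "192.168.1.50"))  # True: reply matches the existing state
```

Note that no WAN rule was needed for the reply: the single LAN rule on the first packet created the state that lets the answer back in.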