[SOLVED] HA Synchronize States via Peer IP leads to intermittent state loss
nzkiwi68
on: June 21, 2023, 01:25:34 am
Something has changed in 23.1.9 with pfsync Synchronize States, and systems that were moderately stable now have significant errors.
------------
Retail customer: multi-WAN failover, multi-site, all with HA firewalls, running WireGuard VPNs with FRR and BGP, hub and spoke, all going back to the central head office.
Approx. 40,000 transactions weekly
Before 23.1.9: about 4-5 POS transactions a week would error
After upgrading from 23.1.8 to 23.1.9: 10+ POS transactions per day were getting broken
On 23.1.9 with System: High Availability: Settings: Synchronize States disabled: now 0 transactions per week being lost
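To put those numbers in proportion, here is a rough calculation of the failure rates. It uses only the figures quoted above; the 7-day trading week is my assumption, and "10+ per day" is treated as a lower bound.

```python
# Figures quoted in the post; the 7-day trading week is an assumption.
WEEKLY_TRANSACTIONS = 40_000


def weekly_failure_rate(failures_per_week: float,
                        total: int = WEEKLY_TRANSACTIONS) -> float:
    """Return failures as a percentage of weekly transactions."""
    return 100.0 * failures_per_week / total


before = weekly_failure_rate(5)       # upper end of "4-5 a week"
after = weekly_failure_rate(10 * 7)   # lower bound of "10+ per day", 7 days assumed

print(f"before 23.1.9: {before:.4f}%")
print(f"after 23.1.9:  {after:.4f}%")
print(f"increase: at least {after / before:.0f}x")
```

Even after the upgrade the failure rate is well under 1% of transactions, which is why the problem looks intermittent rather than like a hard outage.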
Client - running "POS software" (a telnet client) and "POS bank" software
Server - running backend software; the client connects to this server via telnet
Checkout operator on the client:
Client starts the checkout sale via telnet on the server
Server writes a file into a directory on the server
Client POS software scans the remote server directory over NetBIOS, sees the file, reads the file, starts POS bank
Client POS bank completes the bank transaction with the customer's credit card etc., then writes the POS bank answer file to the same directory on the remote server over NetBIOS
Client POS software scans the remote server directory over NetBIOS for the POS bank answer file, reads the file, tells the server over telnet whether the sale payment succeeded or failed, sale completed
The error condition happens when a sale fails to complete within 45 seconds.
What is actually happening, though, is that the sale completes: the checkout operator sees a successful POS payment and the client sees the POS terminal report payment success. Somehow, I believe, state is lost, and either the client POS software never reads the POS bank answer file or the POS bank answer file never gets written, so the sale hangs with the error condition.
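The answer-file polling with its 45 second limit can be sketched roughly as below. This is not the actual POS software, just a minimal illustration of the pattern: the function name, file name and poll interval are all hypothetical, and a local temporary directory stands in for the NetBIOS share.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional


def wait_for_answer_file(directory: Path, filename: str,
                         timeout_s: float = 45.0,
                         poll_interval_s: float = 0.5) -> Optional[str]:
    """Poll for an answer file; return its text, or None on timeout."""
    deadline = time.monotonic() + timeout_s
    target = directory / filename
    while True:
        try:
            return target.read_text()
        except OSError:
            # File not there yet, or the share is unreachable: keep retrying.
            pass
        if time.monotonic() >= deadline:
            return None  # this is the "sale hangs" error condition
        time.sleep(poll_interval_s)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        share = Path(tmp)  # stand-in for the remote server directory
        # No answer file yet: the poll gives up after the (shortened) timeout.
        print(wait_for_answer_file(share, "answer.txt", 0.3, 0.1))
        # Once the answer file is written, the poll returns its contents.
        (share / "answer.txt").write_text("APPROVED")
        print(wait_for_answer_file(share, "answer.txt", 0.3, 0.1))
```

With a loop like this, a connection that stalls on either side (the write of the answer file or the read of it) shows up the same way: the payment itself succeeded, but the poll times out.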
What is super interesting is by turning off pfsync Synchronize States, stability is restored.
Obviously this is less desirable in the long term as a firewall HA failover will disconnect all tills and any transactions in progress will be badly affected.
Last Edit: July 12, 2023, 04:47:06 am by nzkiwi68
nzkiwi68
Reply #1 on: July 12, 2023, 04:44:24 am
Figured it out.
The issue is if you set a "Synchronize Peer IP" address in System: High Availability: Settings.
It appears that this is somehow more work for the underlying FreeBSD, and I guess state sync is not as easy and clean using unicast vs multicast.
Switching back to the standard multicast "224.0.0.240" address has solved the lost transactions issue.
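As a quick sanity check on which mode a given address means, Python's standard `ipaddress` module can tell a multicast address from a unicast one. The `sync_mode` helper and the `10.0.0.2` peer address are hypothetical, for illustration only; only `224.0.0.240` comes from the setting discussed here.

```python
import ipaddress

DEFAULT_PFSYNC_GROUP = "224.0.0.240"  # multicast address quoted in the post


def sync_mode(peer: str) -> str:
    """Classify a pfsync peer address as 'multicast' or 'unicast'."""
    addr = ipaddress.ip_address(peer)
    return "multicast" if addr.is_multicast else "unicast"


print(sync_mode(DEFAULT_PFSYNC_GROUP))  # multicast
print(sync_mode("10.0.0.2"))            # hypothetical peer address: unicast
```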
We went from approx. 10 broken EFTPOS transactions per day to ~1 a week.
The fix
The takeaway here is don't use "Synchronize Peer IP" unless you really, really need to.
Recommendation for help text change
Change the "i" help text under "Synchronize Peer IP" to:
Setting this option will force pfsync to synchronize its state table to this IP address. The default is directed multicast. State sync via IP can be less reliable than standard multicast and is generally not recommended.