Active-Active HA tuning

Started by pmladenov, March 30, 2021, 02:11:51 PM

March 30, 2021, 02:11:51 PM Last Edit: March 30, 2021, 09:04:41 PM by pmladenov
Hello,

Currently I have an HA setup running active/active with 2 nodes, pfsync between them (unicast), and pure routing (BGP), without relying on CARP at the moment.

I have tested every form of session asymmetry I could think of (for instance TCP SYN via FW1 and TCP SYN+ACK via FW2, plus all other variations) and everything works well in the lab, but that's not the case outside of the testing environment.
With real traffic (a low number of sessions, < 1000) I'm getting complaints about TCP retransmits, which seem to happen when there are asymmetric flows.
I suspect it is related to some kind of pfsync timers (preventing timely synchronization between the two firewall nodes).

I've read pfsync(4) and ifconfig(8) for both FreeBSD and OpenBSD several times, but I can't fully understand the following:

1) pfsync defer option - from the OpenBSD pfsync man page, but nothing in the FreeBSD pfsync man page:

Quote:
Where more than one firewall might actively handle packets, e.g. with certain ospfd, bgpd or carp(4) configurations, it is beneficial to defer transmission of the initial packet of a connection. The pfsync state insert message is sent immediately; the packet is queued until either this message is acknowledged by another system, or a timeout has expired. This behaviour is enabled with the defer parameter to ifconfig.

So in simple words: what happens, with and without the defer option enabled, after FW1 receives a TCP SYN segment that is allowed by the PF ruleset, when we expect the SYN+ACK segment to come back via FW2?
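To make question 1 concrete, here is a minimal Python sketch of the timing implied by the man page text quoted above. It is a toy model, not pf or pfsync code: the function name `syn_ack_accepted` and every parameter (`server_rtt`, `sync_delay`, `ack_delay`, `defer_timeout`) are hypothetical, and the default timeout is a placeholder rather than a value taken from any implementation.

```python
def syn_ack_accepted(server_rtt, sync_delay, defer=False,
                     ack_delay=None, defer_timeout=0.010):
    """Toy timing model; every name and value here is an assumption.

    server_rtt    -- seconds from FW1 forwarding the SYN until the
                     SYN+ACK reaches FW2
    sync_delay    -- seconds from FW1 creating the state until FW2
                     has installed it
    ack_delay     -- seconds until FW1 sees the peer's acknowledgement
                     (None means it never arrives)
    defer_timeout -- how long FW1 holds the SYN without an acknowledgement
                     (placeholder value)
    """
    if defer:
        # With defer, the SYN is held until the ack arrives or the timeout expires.
        hold = defer_timeout if ack_delay is None else min(ack_delay, defer_timeout)
    else:
        # Without defer, the SYN is forwarded as soon as the state is created.
        hold = 0.0
    syn_ack_at_fw2 = hold + server_rtt
    # FW2 can only match the SYN+ACK if the state is already installed.
    return sync_delay <= syn_ack_at_fw2


# Made-up numbers: a fast server path and a slower state sync.
print(syn_ack_accepted(server_rtt=0.0005, sync_delay=0.002))
print(syn_ack_accepted(server_rtt=0.0005, sync_delay=0.002,
                       defer=True, ack_delay=0.004))
```

In this model the only thing defer changes is when the SYN leaves FW1, so it helps only if the hold time actually covers the replication delay.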

2) pfsync maxupd option, by default set to 128. 

Quote:
The pfsync interface will attempt to collapse multiple state updates into a single packet where possible. The maximum number of times a single state can be updated before a pfsync packet will be sent out is controlled by the maxupd parameter to ifconfig (see ifconfig and the example below for more details). The sending out of a pfsync packet will be delayed by a maximum of one second.

Does it make sense to decrease that parameter, to avoid waiting up to one second before pfsync packets are sent to the peer?
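One way to reason about question 2 is to write the quoted rule down literally. The sketch below is only a reading of the man page sentence (send after `maxupd` updates of one state, or after at most one second, whichever comes first); it is not derived from the pfsync source, and the function name `update_send_time` and its parameters are hypothetical.

```python
def update_send_time(update_times, maxupd=128, max_delay=1.0):
    """Return when the pending updates for one state would be sent.

    Toy reading of the man page text: send once the state has been
    updated `maxupd` times, or `max_delay` seconds after the first
    pending update, whichever comes first.
    """
    if not update_times:
        return None
    times = sorted(update_times)
    deadline = times[0] + max_delay
    if len(times) >= maxupd:
        # The maxupd-th update triggers the send, unless the deadline hit first.
        return min(times[maxupd - 1], deadline)
    return deadline


# Three updates within 0.2 s: the default waits for the deadline,
# while a (hypothetical) maxupd of 2 sends on the second update.
print(update_send_time([0.0, 0.1, 0.2]))
print(update_send_time([0.0, 0.1, 0.2], maxupd=2))
```

Under this reading, lowering maxupd only shortens the delay for states that are updated often. Whether it has any effect on the very first message for a new connection is not something the quoted text settles.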

3) net.pfsync.pfsync_buckets

Quote:
The number of pfsync buckets. This affects the performance and memory tradeoff. Defaults to twice the number of CPUs. Change only if benchmarks show this helps on your workload.

Any idea what I should monitor, and how, to set this properly?


P.S.1 Just went back to the pcap files: for almost all TCP sessions (with a few exceptions) the SYN segment was retransmitted 1 second after the first SYN was sent.
So we have:

(1) Client ---------- SYN ----------> FW1 ------------------> Server
                                       |
                                     pfsync
                                       |
(2) Client <------------------------- FW2 <---- SYN+ACK ----- Server

It seems that FW2 in (2) is dropping the SYN+ACK sent by the Server in response to the Client, probably because it has not yet received the SYN-SENT state from FW1.

P.S.2 Confirmed: the returning SYN+ACK segment (from Server to Client) is dropped by FW2. It arrives just before the state is replicated from FW1 to FW2. I tried with and without the defer option on the pfsync0 interface on both FWs and don't see any change in behavior. Perhaps the queuing of the initial packet is not working?
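For anyone who wants to repeat the pcap check from P.S.1, a rough Python sketch follows. It assumes the capture has already been exported to simple tuples; that record layout, the function name `find_syn_retransmits` and the gap window are all hypothetical and should be adapted to your own tooling.

```python
from collections import defaultdict


def find_syn_retransmits(packets, min_gap=0.9, max_gap=1.1):
    """Find flows whose initial SYN was sent again roughly one second later.

    `packets` is an iterable of (timestamp, src, sport, dst, dport, flags)
    tuples, where flags is a string such as "S" or "SA". This record
    layout is an assumption, not the output of any particular tool.
    """
    syn_times = defaultdict(list)
    for ts, src, sport, dst, dport, flags in packets:
        if flags == "S":  # pure SYN only, SYN+ACK is ignored
            syn_times[(src, sport, dst, dport)].append(ts)

    retransmits = {}
    for flow, times in syn_times.items():
        times.sort()
        # Gaps between consecutive SYNs of the same flow.
        gaps = [b - a for a, b in zip(times, times[1:])]
        hits = [g for g in gaps if min_gap <= g <= max_gap]
        if hits:
            retransmits[flow] = hits
    return retransmits


# Made-up sample: the first flow repeats its SYN after one second.
sample = [
    (10.0, "192.0.2.10", 40000, "198.51.100.5", 443, "S"),
    (11.0, "192.0.2.10", 40000, "198.51.100.5", 443, "S"),
    (11.0, "198.51.100.5", 443, "192.0.2.10", 40000, "SA"),
    (12.0, "192.0.2.11", 40001, "198.51.100.5", 443, "S"),
]
print(find_syn_retransmits(sample))
```

Comparing the flagged flows against the path each SYN+ACK took (FW1 or FW2) would show whether the retransmits line up with the asymmetric sessions.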

Hi,
Did you ever get this solved?
It would be interesting for us too, as we run two Sense boxes with BGP and are more or less active/passive by prepending one AS on the "passive" Sense. But that is a massive waste of performance, as we have a full Internet line sitting passive the whole time.

I do not advise doing this. You are creating a race condition between your internet flow and pfsync replication, and whenever the internet flow is faster, your stuff doesn't work.

Routing on routers, firewalling on firewalls. Don't do such hacky things.

So if we have another firewall behind it, might "Disable all packet filtering" in Firewall>Settings>Advanced do the trick?


Quote from: mimugmail on December 22, 2022, 12:10:25 PM
Yes:)

Short answer.... Besides having no more packet filtering at all and no NAT (which in our case is not needed, as everything is public IPs), we keep the GUI and status pages but have no more states? Will the firewall rules get "flushed", or will they just stay as they are but have no effect at all?

Maybe I should create a test-VM for this....  ;D