pfsync in HA setup leads to regular tcp stalls

Started by fhloston, November 11, 2024, 04:38:46 PM

November 11, 2024, 04:38:46 PM Last Edit: November 12, 2024, 01:26:35 PM by fhloston
Hi,

I am seeing the following issue:

"Longer" TCP connections stall roughly once in every n attempts.

I can reproduce this by running a while loop on the firewall itself that uses curl to download a 500 MB file.
Once the download rate slowly drops to 0 and never recovers, I have reproduced the issue.
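
For anyone who wants to try this without watching curl's progress meter, here is a rough sketch of the same loop in Python (the original test used curl in a shell loop, so this is a stand-in, not the exact command). It treats a download as stalled when no data arrives for `stall_timeout` seconds. The function names, the 30-second threshold and the URL in the usage comment are all assumptions; a transfer that only slows down without fully stopping would not be flagged.

```python
import socket
import time
import urllib.request


def download_once(url, stall_timeout=30.0, chunk_size=64 * 1024):
    """Download url, discard the body, and return the number of bytes read.

    The socket timeout applies to each blocking read, so a transfer that
    stops delivering data for stall_timeout seconds raises socket.timeout.
    """
    total = 0
    with urllib.request.urlopen(url, timeout=stall_timeout) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                return total
            total += len(chunk)


def run_loop(url, runs=20, stall_timeout=30.0):
    """Repeat the download `runs` times and return how many runs stalled."""
    stalled = 0
    for i in range(1, runs + 1):
        start = time.monotonic()
        try:
            size = download_once(url, stall_timeout)
            elapsed = time.monotonic() - start
            print(f"run {i}: ok, {size} bytes in {elapsed:.1f}s")
        except socket.timeout:
            # No data for stall_timeout seconds: count it as a stall.
            stalled += 1
            print(f"run {i}: STALLED after {time.monotonic() - start:.1f}s")
        except OSError as exc:
            # Other network errors (DNS, refused connection) are not stalls.
            print(f"run {i}: error: {exc}")
    return stalled


# Hypothetical usage; replace the URL with any large test file:
# run_loop("https://example.com/500MB.bin", runs=50)
```

Running it on both firewalls at the same time should mirror the setup described in the update below.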

All devices "behind" this setup are affected: larger downloads sometimes fail, and Docker image pulls have a high chance of failure.

When I switch off pfsync the issue is resolved.

The firewall rule on the sync interface allows all traffic.

pfsync is configured according to https://docs.opnsense.org/manual/how-tos/carp.html

a) Can anybody reproduce this?
b) Is this a bug?

Martin

Update: I can reproduce this on two freshly installed 24.7.8 firewalls. Running the curl loop on both at the same time leads to stalls rather quickly.

Update2: I set up the same configuration on two pfSense 2.7.2 firewalls. The issue does not reproduce there.

November 12, 2024, 02:11:26 PM #1 Last Edit: November 12, 2024, 02:15:32 PM by iMx
Are you using unicast sync on both OPNsense and pfSense?

The OPNsense documentation seems to suggest specifying a unicast address, whereas the pfSense documentation seems to lean towards not doing so and using multicast instead.

EDIT: Going back a bit, it looks like someone else had an issue with unicast:

https://forum.opnsense.org/index.php?topic=34522.0

Unicast vs. multicast seems to make no difference.

What does make a difference, however, is disabling multiqueue in Proxmox. Removing the queues=X parameter completely mitigates the issue.

However, I know of two other OPNsense on Proxmox installations that do not have this issue and run fine with queues=8.

Mystery.