OPNsense Forum

Archive => 17.1 Legacy Series => Topic started by: Andreas_ on June 22, 2017, 06:41:04 pm

Title: carp/pfsync but connections interrupted
Post by: Andreas_ on June 22, 2017, 06:41:04 pm
As far as I understand CARP/pfsync, the backup firewall should take over the running sessions from the master more or less without interruption. On a heavy-traffic TCP session, some packets might need retransmission because there is a time gap between pfsync updates, but nothing more.
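
The expectation described above can be written down as a toy model: the backup only knows about connections whose state has already been replicated to it, so only connections opened after the last update should be lost. This is a sketch of that assumption, not of the real pfsync wire protocol; the class and function names, addresses, and ports are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical connection key: (src_ip, src_port, dst_ip, dst_port).
Conn = Tuple[str, int, str, int]


@dataclass
class Firewall:
    """Toy firewall node holding a state table (connection -> opaque state)."""
    states: Dict[Conn, str] = field(default_factory=dict)


def sync_round(master: Firewall, backup: Firewall) -> None:
    """One simplified sync step: the backup receives a copy of the master's table."""
    backup.states = dict(master.states)


def survives_failover(conn: Conn, backup: Firewall) -> bool:
    """In this toy model, a mid-stream packet passes only if the backup holds its state."""
    return conn in backup.states


master, backup = Firewall(), Firewall()

# Hypothetical NFS client -> server session, replicated before the failover.
nfs: Conn = ("192.0.2.10", 871, "192.0.2.50", 2049)
master.states[nfs] = "ESTABLISHED"
sync_round(master, backup)

# Hypothetical connection opened after the last sync, just before the master fails.
late: Conn = ("192.0.2.10", 50000, "192.0.2.50", 443)
master.states[late] = "ESTABLISHED"

print(survives_failover(nfs, backup))   # True
print(survives_failover(late, backup))  # False
```

Under this model the already-synced NFS session should survive, which is exactly the expectation the following observation contradicts.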

Apparently, this assumption is wrong. Even on a virtually idle system, connections break. I have an OPNsense 17.1.8 pair (virtualized on Xen 4.4.1); the machines are connected via 10 Gbit. The web server in the DMZ accesses the file storage via NFSv4 over TCP port 2049. When I reboot the master (for whatever reason, e.g. a firmware upgrade), the NFS mount may report a stale file handle, and the only way to recover is to reboot the web server VM. I tried demoting the master to backup beforehand, persistently, and promoting it back later, but this doesn't seem to help much.

Other connections may break as well, but the NFS breakage is really fatal.

Is there any way to make CARP/pfsync work without breaking connections? Even a workaround that only fixes the NFS breakage would help. I've been seeing this problem since pfSense 1.x/2.x and OPNsense 16.7, and it's still annoying.

Regards
Andreas
Title: Re: carp/pfsync but connections interrupted
Post by: Wayne Train on June 23, 2017, 09:11:38 am
Hi Andreas,

I'm also experiencing various CARP problems, from split brains to interface flapping.
I'm quite new to OPNsense (since January), and I'm running a CARP cluster on 17.1.1 that seems to work as expected.
A few days ago, new firewalls for another subsidiary arrived. I upgraded them to 17.1.8, and now I'm seeing totally confusing behaviour, although the config of the newer cluster is almost the same as on the production one.

I'm bonding two interfaces into LAGG0 and have configured VLANs on LAGG0. The CARP VIPs are also on these VLANs.

LAGG0
---------------------------------------
 - VLAN 10 --> V-IP ...10.1
 - VLAN 20 --> V-IP ...20.1
 - VLAN 30 --> V-IP ...30.1

The newer cluster works as long as I only provoke failovers on the WAN side. As soon as I shut down the LAGG that carries the VLANs, my setup goes split brain. When I put the BACKUP into maintenance mode, it releases its VIPs and the MASTER becomes master for everything again, but its behaviour is quite laggy.
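
A split brain here means that both nodes claim MASTER for the same VHID at the same time. A minimal sketch of that check, assuming the per-VHID CARP states have already been collected from each node (how they are collected is left out; the data format, the function name, and the VHID numbers are hypothetical):

```python
from typing import Dict, List

# Hypothetical snapshot format: VHID -> CARP state ("MASTER", "BACKUP", "INIT")
# as observed on one node of the cluster.
CarpStates = Dict[int, str]


def split_brain_vhids(node_a: CarpStates, node_b: CarpStates) -> List[int]:
    """Return the VHIDs that BOTH nodes claim as MASTER, i.e. the split-brain VIPs."""
    shared = node_a.keys() & node_b.keys()
    return sorted(v for v in shared if node_a[v] == node_b[v] == "MASTER")


# Assumed VHIDs: 1 for WAN, 10/20/30 for the VLAN VIPs on LAGG0.
primary = {1: "MASTER", 10: "MASTER", 20: "MASTER", 30: "MASTER"}
# Hypothetical split-brain snapshot: the secondary also claims the VLAN VIPs.
secondary = {1: "BACKUP", 10: "MASTER", 20: "MASTER", 30: "MASTER"}

print(split_brain_vhids(primary, secondary))  # [10, 20, 30]
```

The comparison is done per VHID rather than per interface because CARP elects a master separately for each virtual host ID.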

I haven't tested provoking a failover on the LAGG of our production system, since that wouldn't be a good idea during business hours, but I will do so on Saturday. At the moment I expect, no, I hope, that this behaviour is specific to 17.1.8 and not to 17.1.1, and hopefully a bug in the newer release, since the release notes mention that some work on CARP has been done.

Do you have similar experiences?
Do you also use a LAGG and VLANs?

Best regards,
Wayne