High availability / HA Failover Error - Ratelimiting?
« on: September 13, 2024, 09:48:14 am »
Hello,
We have a problem with our HA setup and are stuck.
It's a 2-node setup running 24.10.
We enabled state sync.
If we enable maintenance mode on the master, the 2nd node takes over.
If we ping a host while this happens, it works really well: we lose no ping packets, or at most one. So basically it should be fine.
Problem
But our problem is that after a master switch, pretty much all connections except ICMP aren't working well.
We can't see a pattern: regardless of port, protocol, IP, VLAN or interface, connections can't be established successfully.
The strange thing is that after a while (~30 mins) the whole thing settles down and connections can be established successfully again. We can see in our monitoring system that more and more connections succeed the longer we wait.
We can reproduce this every time we switch the master.
It seems to us as if some kind of rate limit or something similar is at play here and is blocking connections.
The server isn't under high load. On the switch side we see nothing special happening. The interfaces aren't heavily utilized either. The state table is also not full.
We've found these options in the advanced settings of every firewall rule:
- Max new connections
- Max source states
- Max established
- Max states
- Max source nodes
But these are all untouched / empty for all our rules.
Any ideas what happens with our setup during failover and how we can fix it? How can we analyse it further?
In its current state it is unfortunately completely unusable.