HA Pair - 2nd having connectivity problems

Started by tjk, June 02, 2024, 04:50:19 AM

June 02, 2024, 04:50:19 AM Last Edit: June 02, 2024, 05:01:51 AM by tjk
I'm slowly moving our clusters from pfSense to OPNsense Business.  I have an HA setup on the latest Business Edition, but the passive firewall cannot ping the Internet, cannot check for upgrades, etc.  I can ping the active FW interfaces and can get into it via the web interface, but nothing outbound from the CLI works on it.

Edit - When I fail over, the 2nd unit works just fine and I have the same problem on the 1st unit. TL;DR: the passive unit can't ping the Internet, check for updates, etc.

What am I missing?

Please tell me how your WAN is set up.

1. One CARP VIP for IPv4 without explicit IP addresses set on the WAN interfaces.
2. One CARP VIP for IPv4 and two additional IPv4 addresses set on the backup and master OPNsense WAN interfaces.

If it's 1, that behavior is expected.
Hardware:
DEC740

Option #2.  WAN on FW 1 has an IP, FW 2 has an IP, and I have a CARP VIP set up for outbound NAT, i.e. fw1 is .2, fw2 is .3, VIP is .4.

Tom

June 02, 2024, 04:06:34 PM #3 Last Edit: June 02, 2024, 04:08:50 PM by Monviech
It kinda sounds like there is some sort of routing issue.

Do the CARP VIP and the interface IPs have the right subnet?

So for example, if all of them are in a /28 net, both WAN interfaces and the CARP VIP should have /28.
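
As a rough sketch of that check, a few lines of POSIX sh can confirm that the interface addresses and the VIP share one network and one prefix length. The 203.0.113.x addresses are documentation placeholders, not anyone's real WAN addresses, and the function names are made up for illustration:

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the network address (as an integer) of an "address/prefix" pair.
network_of() {
    addr=${1%/*}
    prefix=${1#*/}
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    echo $(( $(ip_to_int "$addr") & mask ))
}

# Succeed only if every argument has the same prefix length and network.
same_subnet() {
    first_net=$(network_of "$1")
    first_prefix=${1#*/}
    for cidr in "$@"; do
        [ "${cidr#*/}" = "$first_prefix" ] || return 1
        [ "$(network_of "$cidr")" = "$first_net" ] || return 1
    done
    return 0
}

# Placeholder values: FW1, FW2 and the CARP VIP, all entered as /28.
if same_subnet 203.0.113.2/28 203.0.113.3/28 203.0.113.4/28; then
    echo "all addresses share one /28"
fi
```

If the login shell on the firewall is csh or tcsh, save this to a file and run it with `sh`. A VIP entered as /32 while the interfaces use /28 would make `same_subnet` fail, which is the kind of mismatch the question is about.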

Also, have you made sure there is no IP collision (or, if it's a VM, a MAC collision, which is very hard to find) with another device in that network?

/25 on the entire setup: .1/25 is the upstream Juniper, .2/25 on FW1, .3/25 on FW2, .4/25 is the VIP.  What is odd is that the passive FW has the issues, and the problem follows whichever node is passive.

Just thinking out loud.  The passive has an outbound NAT rule to use the .4 VIP, which is on the active FW, so when the packet goes out, it says respond to .4, which is the VIP on the active unit, and the passive unit doesn't get the response.

Is there a rule I need to have on the passive unit to not use the VIP for outbound?  That would seem odd but logically I'm trying to follow the packet flow.

June 02, 2024, 07:50:28 PM #6 Last Edit: June 02, 2024, 08:00:48 PM by Monviech
The firewall shouldn't NAT itself since it has the public IP directly.

Maybe it's a gateway problem. Check if there is a gateway for the passive firewall to push the packets to the Juniper. Maybe the gateways turn off, or dpinger is active, or something like that.

OPNsense itself uses the gateway marked as default (active) in the GUI (Upstream Gateway).

Check if both WAN interfaces actually have a gateway set, or just the VIP does.
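
To compare the two nodes quickly from the shell, something like the following could work. It is only a sketch: `default_gateway` is a made-up helper name, and it assumes the FreeBSD `netstat -rn` layout, where the default route line starts with the word `default` and the gateway is the second column.

```shell
#!/bin/sh
# Read `netstat -rn -f inet` output on stdin and print the gateway of the
# first default route (second column of the line whose destination is "default").
default_gateway() {
    awk '$1 == "default" { print $2; exit }'
}

# On each firewall, run:
#   netstat -rn -f inet | default_gateway
# Both nodes should print the same upstream address (the Juniper).
```

If one node prints nothing, it has no default route in the IPv4 table, which would point at the gateway configuration rather than at pf.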

They both show .1 as the default gateway in the cli.

I also find this interesting.

If I run pfctl -d on the passive FW, everything works just fine, even though I have a default allow rule for testing on all interfaces.

root@core-fw-02:~ # ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
^C
--- 1.1.1.1 ping statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss
root@core-fw-02:~ # traceroute 1.1.1.1
traceroute to 1.1.1.1 (1.1.1.1), 64 hops max, 40 byte packets
1  *^C

root@core-fw-02:~ # pfctl -d
pf disabled
root@core-fw-02:~ # ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: icmp_seq=0 ttl=58 time=0.382 ms
64 bytes from 1.1.1.1: icmp_seq=1 ttl=58 time=0.321 ms

Sorry my crystal ball has run dry right now.  :P

If deactivating pf solves the issue, I would check the NAT rules again. Maybe there is a NAT rule that matches on all traffic and NATs the firewall traffic itself.
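
One way to look for such a rule from the CLI is to list the loaded NAT rules with `pfctl -s nat` and filter for ones whose source is `any`. This is a sketch under assumptions: `broad_nat_rules` is a made-up helper, it assumes the usual pf rule text (`nat on <if> ... from <src> to <dst> -> <addr>`), and it only catches a literal `from any` source, so other broad sources would need their own pattern.

```shell
#!/bin/sh
# Read `pfctl -s nat` output on stdin and print only the nat rules whose
# source is "any". Lines starting with "no nat" or "rdr" are not matched.
broad_nat_rules() {
    grep -E '^nat .* from any '
}

# On the passive firewall, run:
#   pfctl -s nat | broad_nat_rules
# Any rule printed here translates traffic from every source on that
# interface, which would include traffic the firewall originates itself.
```

If a printed rule translates to the CARP VIP, it would fit the symptom described earlier in the thread, where replies go to whichever node currently holds the VIP.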

But, I'm only guessing now.

I ran the exact same combination as you before (2 OPNsense in HA with a Juniper as gateway, a /26 and an additional /27 net) and both had Internet, and the ruleset was really, really complicated.

I'm sure it's something annoyingly simple and easy.

Check the firewall live log with logging enabled on all rules. Use tcpdump to see what happens to the packets, etc.
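
For the tcpdump part, a small wrapper like this might help. It is a sketch only: `trace_icmp` and the `DRY_RUN` variable are invented for illustration, and `igb0` in the usage note is a placeholder for whatever the WAN interface is actually called.

```shell
#!/bin/sh
# Capture ICMP to or from one target on one interface.
# Set DRY_RUN=1 to print the tcpdump command instead of running it.
trace_icmp() {
    iface=$1
    target=$2
    # -n: no DNS lookups, -e: show MAC addresses, -c 20: stop after 20 packets
    set -- tcpdump -n -e -c 20 -i "$iface" icmp and host "$target"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Example (as root, interface name is a placeholder):
#   trace_icmp igb0 1.1.1.1
```

Running it on the WAN interface of both nodes while pinging 1.1.1.1 from the passive one should show which source address the echo requests leave with and which node the replies arrive on.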