OPNsense Forum

Archive => 20.1 Legacy Series => Topic started by: Serius on May 28, 2020, 01:46:02 pm

Title: Help with HA setup
Post by: Serius on May 28, 2020, 01:46:02 pm
Hope someone can help me with this.
I had OPNsense running in a VM under ESXi for some time. I didn't like losing the network during server maintenance, so I bought an i3 NUC. While the NUC was on its way I modified the existing installation to match the new hardware.
I had three VLANs, each on its own virtual adapter, and the WAN on a dedicated passthrough adapter. I changed this to a single trunk adapter in a router-on-a-stick configuration with four VLANs.

When I received the NUC I installed OPNsense and restored a backup from the VM. Then I thought I could keep the VM and configure CARP HA between the two.
I followed this: https://www.thomas-krenn.com/en/wiki/OPNsense_HA_Cluster_configuration
and this: https://docs.opnsense.org/manual/how-tos/carp.html

I basically followed those instructions, but created a new VLAN interface (also configured on the switch) for the pfsync interface.
[As the NUC only has one network adapter (for now), I could not get the "mysterious and undocumented" LAGG workaround to allow syncs, so I left state synchronization disabled.
I configured XMLRPC Sync.]

I fully configured the LAGG interface and HA settings.

Also, as I have several VLANs, where the documentation says to create a firewall rule to allow CARP, I created a VLAN group containing all the intranet and WAN VLANs and made the rule there.

As I have more than one interface, I created one virtual IP per interface, each with its own VHID group (WAN 1 / LAN 3 / TLN 4 / IOT 5).
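
For reference, each CARP VHID corresponds to a virtual MAC address that the switch (or a hypervisor vSwitch) has to accept. For IPv4, CARP reuses the VRRP range 00:00:5e:00:01:xx, where xx is the VHID in hex. The following is a minimal Python sketch that lists the virtual MACs for the VHIDs mentioned above; the function name `carp_virtual_mac` is made up for this illustration and is not part of OPNsense.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the IPv4 CARP virtual MAC for a VHID (hypothetical helper)."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    # The last octet of the virtual MAC is the VHID itself.
    return f"00:00:5e:00:01:{vhid:02x}"


# VHIDs as listed in the post above.
vhids = {"WAN": 1, "LAN": 3, "TLN": 4, "IOT": 5}

for name, vhid in vhids.items():
    print(name, carp_virtual_mac(vhid))
```

Because the MAC depends only on the VHID, two CARP groups that share a VHID in the same broadcast domain would also share a MAC, which is one reason to keep VHIDs distinct.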

The network was operative but I have the following problems:

Code:
2020-05-28T12:58:16 dhcpd: DHCPDISCOVER from 04:b1:67:1b:d1:62 via em0_vlan10: not responding (recovering)
2020-05-28T12:57:59 dhcpd: DHCPDISCOVER from 04:b1:67:1b:d1:62 via em0_vlan10: not responding (recovering)
2020-05-28T12:57:51 dhcpd: DHCPDISCOVER from 04:b1:67:1b:d1:62 via em0_vlan10: not responding (recovering)
2020-05-28T12:57:46 dhcpd: DHCPDISCOVER from 04:b1:67:1b:d1:62 via em0_vlan10: not responding (recovering)
2020-05-28T12:57:41 dhcpd: DHCPDISCOVER from 04:b1:67:1b:d1:62 via em0_vlan10: not responding (recovering)
2020-05-28T12:57:36 dhcpd: DHCPDISCOVER from 04:b1:67:1b:d1:62 via em0_vlan10: not responding (recovering)
2020-05-28T12:57:19 dhcpd: DHCPDISCOVER from 04:b1:67:1b:d1:62 via em0_vlan10: not responding (recovering)
2020-05-28T12:57:11 dhcpd: DHCPDISCOVER from 04:b1:67:1b:d1:62 via em0_vlan10: not responding (recovering)
2020-05-28T12:57:07 dhcpd: DHCPDISCOVER from 04:b1:67:1b:d1:62 via em0_vlan10: not responding (recovering)
2020-05-28T12:57:02 dhcpd: DHCPDISCOVER from 04:b1:67:1b:d1:62 via em0_vlan10: not responding (recovering)
2020-05-28T12:56:57 dhcpd: failover peer dhcp_lan: I move from startup to communications-interrupted
2020-05-28T12:56:57 dhcpd: failover peer dhcp_opt1: I move from startup to communications-interrupted
2020-05-28T12:56:57 dhcpd: failover peer dhcp_opt2: I move from startup to recover
2020-05-28T12:56:42 dhcpd: Server starting service.
2020-05-28T12:56:42 dhcpd: failover peer dhcp_lan: I move from communications-interrupted to startup
2020-05-28T12:56:42 dhcpd: failover peer dhcp_opt1: I move from communications-interrupted to startup
2020-05-28T12:56:42 dhcpd: failover peer dhcp_opt2: I move from recover to startup
2020-05-28T12:56:42 dhcpd: Sending on   Socket/fallback/fallback-net
2020-05-28T12:56:42 dhcpd: Sending on   BPF/em0_vlan1/f4:4d:30:6a:fb:9c/192.168.0.0/24
2020-05-28T12:56:42 dhcpd: Listening on BPF/em0_vlan1/f4:4d:30:6a:fb:9c/192.168.0.0/24
2020-05-28T12:56:42 dhcpd: Sending on   BPF/em0_vlan50/f4:4d:30:6a:fb:9c/192.168.50.0/24
2020-05-28T12:56:42 dhcpd: Listening on BPF/em0_vlan50/f4:4d:30:6a:fb:9c/192.168.50.0/24
2020-05-28T12:56:42 dhcpd: Sending on   BPF/em0_vlan10/f4:4d:30:6a:fb:9c/192.168.10.0/24
2020-05-28T12:56:42 dhcpd: Listening on BPF/em0_vlan10/f4:4d:30:6a:fb:9c/192.168.10.0/24
2020-05-28T12:56:42 dhcpd: Wrote 150 leases to leases file.
2020-05-28T12:56:42 dhcpd: Wrote 0 new dynamic host decls to leases file.
2020-05-28T12:56:42 dhcpd: Wrote 0 deleted host decls to leases file.
2020-05-28T12:56:42 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
2020-05-28T12:56:42 dhcpd: All rights reserved.
2020-05-28T12:56:42 dhcpd: Copyright 2004-2020 Internet Systems Consortium.
2020-05-28T12:56:42 dhcpd: Internet Systems Consortium DHCP Server 4.4.2
2020-05-28T12:56:42 dhcpd: PID file: /var/run/dhcpd.pid
2020-05-28T12:56:42 dhcpd: Database file: /var/db/dhcpd.leases
2020-05-28T12:56:42 dhcpd: Config file: /etc/dhcpd.conf
2020-05-28T12:56:42 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
2020-05-28T12:56:42 dhcpd: All rights reserved.
2020-05-28T12:56:42 dhcpd: Copyright 2004-2020 Internet Systems Consortium.
2020-05-28T12:56:42 dhcpd: Internet Systems Consortium DHCP Server 4.4.2
Note: This output is from before the LAGG was set up. The interfaces are now named lagg0_vlan**.

DHCP on the other two interfaces works as expected. I can see successful DHCP negotiation for those interfaces in the log.
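
To see at a glance which failover peer is stuck, a log like the one above can be summarised with a short script. This is only a sketch: the function name `summarize` and both regular expressions are assumptions based purely on the line format shown in the log, not anything shipped with OPNsense or ISC dhcpd.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
#   2020-05-28T12:56:57 dhcpd: failover peer dhcp_lan: I move from startup to recover
MOVE_RE = re.compile(
    r"^(?P<ts>\S+)\s+dhcpd: failover peer (?P<peer>\S+): "
    r"I move from (?P<old>\S+) to (?P<new>\S+)$"
)
# Matches DHCPDISCOVER lines that the server declined to answer.
SKIP_RE = re.compile(
    r"^(?P<ts>\S+)\s+dhcpd: DHCPDISCOVER from \S+ via (?P<iface>\S+): "
    r"not responding \((?P<reason>[^)]*)\)$"
)


def summarize(lines):
    """Return (latest state per failover peer, unanswered DISCOVERs per interface)."""
    latest = {}  # peer -> (timestamp, state)
    skipped = Counter()
    for line in lines:
        line = line.strip()
        m = MOVE_RE.match(line)
        if m:
            ts = datetime.fromisoformat(m["ts"])
            peer = m["peer"]
            # On a timestamp tie the first line seen wins,
            # which suits a newest-first log.
            if peer not in latest or ts > latest[peer][0]:
                latest[peer] = (ts, m["new"])
            continue
        m = SKIP_RE.match(line)
        if m:
            skipped[m["iface"]] += 1
    states = {peer: state for peer, (_, state) in latest.items()}
    return states, dict(skipped)


sample = """\
2020-05-28T12:58:16 dhcpd: DHCPDISCOVER from 04:b1:67:1b:d1:62 via em0_vlan10: not responding (recovering)
2020-05-28T12:57:59 dhcpd: DHCPDISCOVER from 04:b1:67:1b:d1:62 via em0_vlan10: not responding (recovering)
2020-05-28T12:56:57 dhcpd: failover peer dhcp_lan: I move from startup to communications-interrupted
2020-05-28T12:56:57 dhcpd: failover peer dhcp_opt2: I move from startup to recover
2020-05-28T12:56:42 dhcpd: failover peer dhcp_lan: I move from communications-interrupted to startup
2020-05-28T12:56:42 dhcpd: failover peer dhcp_opt2: I move from recover to startup
""".splitlines()

states, skipped = summarize(sample)
print(states)
print(skipped)
```

Run against the full log, this kind of summary would show dhcp_opt2 ending in the recover state while the other two peers end in communications-interrupted. That fits with em0_vlan10 being the only interface that is not answering, although the log itself does not say which peer serves which interface.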

Thank you very much.

EDIT: I can see this block hit on FW2 when I restart pfsync on the master:
BLOCK   wan      May 29 00:43:12   192.168.8.11   224.0.0.240   pfsync   Block private networks from WAN

This is an automatic rule. I removed it from the interface and added a PFSYNC rule, with no effect on HA.
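
The block hit above is consistent with pfsync traffic arriving on an interface that the firewall treats as WAN: the source 192.168.8.11 is an RFC 1918 address and the destination 224.0.0.240 is the pfsync multicast group. A small Python sketch to check both addresses, assuming the "Block private networks from WAN" rule matches at least RFC 1918 sources; the helper `is_rfc1918` is hypothetical and written only for this example.

```python
import ipaddress

# The three RFC 1918 private IPv4 ranges.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(addr: str) -> bool:
    """True if addr falls inside one of the RFC 1918 ranges (hypothetical helper)."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


# Addresses taken from the block hit in the post above.
src, dst = "192.168.8.11", "224.0.0.240"

print(src, "RFC 1918 source:", is_rfc1918(src))
print(dst, "multicast destination:", ipaddress.ip_address(dst).is_multicast)
```

If the sync interface is meant to be a dedicated pfsync VLAN, a hit on a WAN rule suggests checking which interface that traffic is actually being received on.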
Title: Re: Help with HA setup
Post by: cyruspy on June 06, 2020, 08:05:19 am
I see the same issue. The second node is down and the node that is alive doesn't take over.

Quote
2020-06-06T01:55:59   dhcpd: DHCPDISCOVER from ac:63:be:61:31:4a via vtnet0: not responding (recovering)
2020-06-06T01:55:50   dhcpd: DHCPDISCOVER from ac:63:be:61:31:4a via vtnet0: not responding (recovering)
2020-06-06T01:55:47   dhcpd: DHCPDISCOVER from ac:63:be:61:31:4a via vtnet0: not responding (recovering)
2020-06-06T01:55:28   dhcpd: failover peer dhcp_lan: host down
2020-06-06T01:54:26   dhcpd: DHCPACK on 10.2.0.203 to 3c:a9:f4:85:4e:44 via vtnet0
2020-06-06T01:54:26   dhcpd: DHCPREQUEST for 10.2.0.203 from 3c:a9:f4:85:4e:44 via vtnet0
2020-06-06T01:54:02   dhcpd: DHCPDISCOVER from aa:c8:aa:14:5f:0e via vtnet0: not responding (recovering)
2020-06-06T01:53:58   dhcpd: failover peer dhcp_lan: host down
2020-06-06T01:53:45   dhcpd: DHCPDISCOVER from aa:c8:aa:14:5f:0e via vtnet0: not responding (recovering)
2020-06-06T01:53:37   dhcpd: DHCPDISCOVER from aa:c8:aa:14:5f:0e via vtnet0: not responding (recovering)

In the leases view I see:

Quote
My State = recover
Peer State = unknown-state

CARP status is Master, VIP's mask matches base subnet.
VIP setup: (vhid 1 , freq. 1 / 0)

OPNsense 20.1.7-amd64

Any hints?
Title: Re: Help with HA setup
Post by: Serius on June 07, 2020, 06:23:48 pm
The past 10 days I've been trying to make CARP work without success.

I've got pfsync, XMLRPC sync and DHCP somewhat working, but there are still plenty of issues:

The CARP status shows normal.
I've tried to build a configuration from the config examples but it doesn't make a difference.
Is High availability broken?

If someone is willing I could post my configurations, stripping passwords and such.
Title: Re: Help with HA setup
Post by: iMx on June 10, 2020, 04:00:29 pm
I do not know the current status, but this may give you something to research further: I seem to recall reading some time ago about problems with CARP on a LAGG. That may only have affected older versions and been fixed since; I'm afraid I cannot recall.

I can say, however, that I run a CARP cluster in the DC, without LAGG, and do not see any problems - so I don't believe it is broken.
Title: Re: Help with HA setup
Post by: erje on June 12, 2020, 10:14:05 pm
I have been trying to get HA to work for a while but got stuck. See my post: https://forum.opnsense.org/index.php?topic=16782.0

I have used the same reference documentation as you so possibly there is a fault there.

Since you are using ESXi on (at least) one node, the following link could be interesting if you haven't made the special configuration in ESXi yet: https://medium.com/@glmdev/how-to-set-up-virtualized-pfsense-on-vmware-esxi-6-x-2c2861b25931

It got me from nothing at all to something that kind of works, if you forget about the lack of DNS :-)