DHCP Failover in CARP Cluster unknown-state

Started by NilsS, September 30, 2020, 06:56:58 PM

September 30, 2020, 06:56:58 PM Last Edit: September 30, 2020, 07:12:09 PM by NilsS
I'm at my wits' end with DHCP failover. How can I narrow down where the error is?

opnSense1

Interface
    <opt7>
      <if>vmx0_vlan22</if>
      <descr>GAST_LAN</descr>
      <enable>1</enable>
      <lock>1</lock>
      <spoofmac/>
      <ipaddr>192.168.32.4</ipaddr>
      <subnet>23</subnet>
    </opt7>
VIP
    <vip>
      <type>single</type>
      <subnet_bits>32</subnet_bits>
      <mode>carp</mode>
      <interface>opt7</interface>
      <descr>GAST LAN VID</descr>
      <subnet>192.168.32.1</subnet>
      <vhid>22</vhid>
      <advskew>0</advskew>
      <advbase>1</advbase>
      <password>password</password>
    </vip>
DHCP
    <opt7>
      <enable>1</enable>
      <failover_peerip>192.168.32.6</failover_peerip>
      <failover_split>128</failover_split>
      <gateway>192.168.32.1</gateway>
      <domain>gast.local</domain>
      <ddnsdomainalgorithm>hmac-md5</ddnsdomainalgorithm>
      <numberoptions>
        <item/>
      </numberoptions>
      <range>
        <from>192.168.32.50</from>
        <to>192.168.32.100</to>
      </range>
      <winsserver/>
      <dnsserver>192.168.32.1</dnsserver>
      <ntpserver/>
    </opt7>




2020-09-30T18:02:32 dhcpd[79397] failover peer dhcp_opt7: I move from startup to recover
2020-09-30T18:02:17 dhcpd[79397] Server starting service.
2020-09-30T18:02:17 dhcpd[79397] failover peer dhcp_opt7: I move from recover to startup
2020-09-30T18:02:17 dhcpd[79397] Sending on Socket/fallback/fallback-net


opnSense2

Interface
    <opt7>
      <if>vmx0_vlan22</if>
      <descr>GAST_LAN</descr>
      <enable>1</enable>
      <lock>1</lock>
      <spoofmac/>
      <ipaddr>192.168.32.6</ipaddr>
      <subnet>23</subnet>
    </opt7>
   
VIP
    <vip>
      <type>single</type>
      <subnet_bits>32</subnet_bits>
      <mode>carp</mode>
      <interface>opt7</interface>
      <descr>GAST LAN VID</descr>
      <subnet>192.168.32.1</subnet>
      <vhid>22</vhid>
      <advskew>100</advskew>
      <advbase>1</advbase>
      <password>password</password>
    </vip>

DHCP
    <opt7>
      <enable>1</enable>
      <failover_peerip>192.168.32.4</failover_peerip>
      <gateway>192.168.32.1</gateway>
      <domain>gast.local</domain>
      <ddnsdomainalgorithm>hmac-md5</ddnsdomainalgorithm>
      <numberoptions>
        <item/>
      </numberoptions>
      <range>
        <from>192.168.32.50</from>
        <to>192.168.32.100</to>
      </range>
      <winsserver/>
      <dnsserver>192.168.32.1</dnsserver>
      <ntpserver/>
    </opt7>



2020-09-30T18:02:41 dhcpd[84634] failover peer dhcp_opt7: I move from startup to recover
2020-09-30T18:02:26 dhcpd[84634] Server starting service.
2020-09-30T18:02:26 dhcpd[84634] failover peer dhcp_opt7: I move from recover to startup
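The addresses in the two configurations above can be cross-checked mechanically. The following is a minimal Python sketch and not part of OPNsense; `check_node` and its parameter names are hypothetical, and the values are copied from the posted XML for opnSense1. It only reports facts, such as the interface being a /23 while the CARP VIP is entered as a /32. Whether that difference has anything to do with the failover state is not established in this thread.

```python
import ipaddress


def check_node(iface_ip, iface_bits, vip_ip, vip_bits, peer_ip, pool_from, pool_to):
    """Return simple consistency facts for one node's posted settings.

    All parameter names are illustrative; they mirror the XML fields
    <ipaddr>/<subnet>, the VIP's <subnet>/<subnet_bits>, <failover_peerip>
    and the DHCP <range>.
    """
    iface_net = ipaddress.ip_interface(f"{iface_ip}/{iface_bits}").network
    vip_net = ipaddress.ip_interface(f"{vip_ip}/{vip_bits}").network
    peer = ipaddress.ip_address(peer_ip)
    return {
        "interface_network": str(iface_net),
        "vip_network": str(vip_net),
        "peer_in_interface_network": peer in iface_net,
        "peer_in_vip_network": peer in vip_net,
        "pool_in_interface_network": all(
            ipaddress.ip_address(addr) in iface_net for addr in (pool_from, pool_to)
        ),
        "vip_prefix_matches_interface": iface_bits == vip_bits,
    }


# Values taken from the opnSense1 configuration in the post.
node1 = check_node(
    "192.168.32.4", 23,          # interface opt7
    "192.168.32.1", 32,          # CARP VIP
    "192.168.32.6",              # failover peer
    "192.168.32.50", "192.168.32.100",
)
for key, value in node1.items():
    print(f"{key}: {value}")
```

Running the same function with the opnSense2 values (interface 192.168.32.6, peer 192.168.32.4) gives a quick side-by-side comparison of both nodes.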


Does anyone have an idea?

I found a reference to ports 519 and 520 in dhcpd.conf, so I allowed TCP/UDP on the interface (or does it have to be the CARP interface?). According to the packet inspection, packets are passing.

But nothing changes.
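Since dhcpd failover peers talk to each other over TCP, one way to narrow this down is to test from each node whether a TCP connection to the peer's failover port can be opened at all. This is a minimal Python sketch under assumptions: `tcp_port_open` is a hypothetical helper, the ports 519 and 520 are the ones mentioned above, and which node listens on which port has to be read from the generated dhcpd.conf.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call, run on opnSense1 against the peer address from the post:
#   for port in (519, 520):
#       print(port, tcp_port_open("192.168.32.6", port))
```

A `False` result only says that the connection could not be established from that node; it does not distinguish between a firewall rule, a dhcpd that is not listening, and a VLAN problem.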

opnsense1 is MASTER for all VIPs:
GAST_LAN@22   192.168.32.1    MASTER
opnsense2 is BACKUP:
GAST_LAN@22   192.168.32.1    BACKUP

Where else can I look?


EDIT:

2020-09-30T18:47:28 dhcpd[79397] DHCPREQUEST for 192.168.32.59 from b4:0b:e4:86:11:32 via vmx0_vlan22: not responding (recovering)
2020-09-30T18:47:23 dhcpd[79397] DHCPREQUEST for 192.168.32.58 from 00:08:29:e0:11:2a via vmx0_vlan22: not responding (recovering)
2020-09-30T18:47:22 dhcpd[79397] DHCPREQUEST for 192.168.32.56 from 00:08:29:66:11:a9 via vmx0_vlan22: not responding (recovering)
2020-09-30T18:47:22 dhcpd[79397] DHCPREQUEST for 192.168.32.57 from b4:0b:e4:40:11:06 via vmx0_vlan22: not responding (recovering)
2020-09-30T18:47:18 dhcpd[79397] DHCPREQUEST for 192.168.32.54 from b4:0b:e4:40:11:e0 via vmx0_vlan22: not responding (recovering)
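To get an overview of a longer log, the failover state transitions and the refused requests can be counted with a short script. This is a rough Python sketch; the regular expressions are written against the exact log lines quoted in this post and may not match other dhcpd versions, and `summarize` is a hypothetical name.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread.
TRANSITION = re.compile(r"failover peer (\S+): I move from (\S+) to (\S+)")
REFUSED = re.compile(
    r"DHCPREQUEST for (\S+) from (\S+) via (\S+): not responding \((\w+)\)"
)


def summarize(lines):
    """Return (transitions, refused) for an iterable of dhcpd log lines.

    transitions: list of (old_state, new_state) in the order the lines appear.
    refused: Counter of the reason given in parentheses, e.g. 'recovering'.
    """
    transitions = []
    refused = Counter()
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            transitions.append((match.group(2), match.group(3)))
            continue
        match = REFUSED.search(line)
        if match:
            refused[match.group(4)] += 1
    return transitions, refused


# Lines copied from the post (the log view lists the newest entry first).
sample = [
    "2020-09-30T18:02:32 dhcpd[79397] failover peer dhcp_opt7: I move from startup to recover",
    "2020-09-30T18:02:17 dhcpd[79397] failover peer dhcp_opt7: I move from recover to startup",
    "2020-09-30T18:47:28 dhcpd[79397] DHCPREQUEST for 192.168.32.59 from b4:0b:e4:86:11:32 via vmx0_vlan22: not responding (recovering)",
    "2020-09-30T18:47:23 dhcpd[79397] DHCPREQUEST for 192.168.32.58 from 00:08:29:e0:11:2a via vmx0_vlan22: not responding (recovering)",
]
transitions, refused = summarize(sample)
print(transitions)
print(dict(refused))
```

If the list of transitions never contains a state other than `startup` and `recover`, the node has not completed recovery, which matches the "not responding (recovering)" lines.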

Nils,

Were you able to resolve the issue? I am having exactly the same problem at this moment.
Disabling the peer resolves the issue, but that is not really a solution for the long term.

Niels

Nope, I didn't even know where to look for an error.

What kind of hardware does your cluster run on? Mine is on vSphere 7 with a Distributed Switch.

Are you on real hardware?

Nils,

Sorry for the late response. Yes, we run on Deciso hardware. It has been quite problematic to get a stable cluster inside VMware ESX. Sometimes it works, sometimes it fails, depending on which physical host the machines are running on.

Niels

Have you already gone through all the troubleshooting steps for virtual environments?
Is the VLAN set up cleanly on both sides?
Is the CARP VIP running cleanly on the interface?
Are there connectivity problems in the VLAN?
Is everything enabled on the dSwitch for the VLAN and on the OPNsense interfaces? Broadcast, multicast, promiscuous mode, etc. all allowed?

Is the time on both cluster nodes properly in sync? Is NTP running on both, and is it up to date?
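The time question can be checked with two timestamps taken at the same moment on both nodes (for example by running `date` on each). The following is a minimal Python sketch; `skew_seconds` and `clocks_close_enough` are hypothetical helpers, and the default limit of 60 seconds is an assumption for illustration, not a value taken from this thread.

```python
from datetime import datetime


def skew_seconds(stamp_a, stamp_b):
    """Absolute difference in seconds between two ISO 8601 timestamps."""
    a = datetime.fromisoformat(stamp_a)
    b = datetime.fromisoformat(stamp_b)
    return abs((a - b).total_seconds())


def clocks_close_enough(stamp_a, stamp_b, limit=60.0):
    """True if the two clocks differ by at most `limit` seconds.

    The 60-second default is an assumed tolerance; check the dhcpd
    documentation for the value that applies to your version.
    """
    return skew_seconds(stamp_a, stamp_b) <= limit
```

The timestamps in the logs above are not suitable input, because they record when each dhcpd instance was started rather than the same instant on both nodes.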