Palo Alto Firewall: Active-Active HA: Problems with Configuration Sync

Download HA_Synchronization_RevB

The PA firewalls are to be connected in HA mode using 3 HA links. Configuration work had already begun … and nobody noticed that the HA link was down (entirely or intermittently). This presumably left the configuration in an inconsistent state. We are trying to work our way toward a fix …

First, the output of

show high-availability state
admin@PA1(active-primary)>
Group 1:
  Mode: Active-Active
  Local Information:
    Version: 1
    Mode: Active-Active
    State: active-primary (last 25 days)
    Device Information:
      Management IPv4 Address: 172.25.3.99/24
      Management IPv6 Address:
      Jumbo-Frames disabled; MTU 1500
    HA1 Control Links Joint Configuration:
      Encryption Enabled: no
    Election Option Information:
      Priority: 100
      Preemptive: no
    Version Compatibility:
      Software Version: Match
      Application Content Compatibility: Match
      Threat Content Compatibility: Match
      Anti-Virus Compatibility: Match
      VPN Client Software Compatibility: Match
      Global Protect Client Software Compatibility: Match
    State Synchronization: synchronized; type: ethernet
  Peer Information:
    Connection status: up
    Version: 1
    Mode: Active-Active
    State: active-secondary (last 6 minutes)
    Last suspended state reason: User requested
    Device Information:
      Management IPv4 Address: 172.25.3.199/24
      Management IPv6 Address:
      Jumbo-Frames disabled; MTU 1500
      Connection down; Reason: Never able to connect to peer
      Connection up; Primary HA1 link
    Election Option Information:
      Priority: 200
      Preemptive: no
  Configuration Synchronization:
    Enabled: yes
    Running Configuration: not synchronized
      Out-of-sync Reason: Failure to complete config sync
admin@PA2(active-secondary)>
Group 1:
  Mode: Active-Active
  Local Information:
    Version: 1
    Mode: Active-Active
    State: active-secondary (last 5 minutes)
    Device Information:
      Management IPv4 Address: 172.25.3.199/24
      Management IPv6 Address:
      Jumbo-Frames disabled; MTU 1500
    HA1 Control Links Joint Configuration:
      Encryption Enabled: no
    Election Option Information:
      Priority: 200
      Preemptive: no
    Version Compatibility:
      Software Version: Match
      Application Content Compatibility: Match
      Threat Content Compatibility: Match
      Anti-Virus Compatibility: Match
      VPN Client Software Compatibility: Match
      Global Protect Client Software Compatibility: Match
    State Synchronization: synchronized; type: ethernet
  Peer Information:
    Connection status: up
    Version: 1
    Mode: Active-Active
    State: active-primary (last 6 minutes)
    Device Information:
      Management IPv4 Address: 172.25.3.99/24
      Management IPv6 Address:
      Jumbo-Frames disabled; MTU 1500
      Connection down; Reason: Never able to connect to peer
      Connection up; Primary HA1 link
    Election Option Information:
      Priority: 100
      Preemptive: no
  Configuration Synchronization:
    Enabled: yes
    Running Configuration: not synchronized
      Out-of-sync Reason: Started with config out-of-sync
admin@PA2(active-secondary)>
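The relevant part of this output is the "Configuration Synchronization" section at the end. As a minimal sketch, assuming the text of a single device's output has been captured into a string (for example from a saved SSH session), the fields could be extracted like this. The function names `parse_config_sync` and `is_in_sync` are made up for this illustration and are not part of any Palo Alto tool.

```python
import re

# Field labels exactly as they appear in the CLI output above.
_PATTERNS = {
    "enabled": r"^\s*Enabled:\s*(.+?)\s*$",
    "running_config": r"^\s*Running Configuration:\s*(.+?)\s*$",
    "reason": r"^\s*Out-of-sync Reason:\s*(.+?)\s*$",
}


def parse_config_sync(output: str) -> dict:
    """Extract the config-sync fields from one device's HA state output."""
    result = {"enabled": None, "running_config": None, "reason": None}
    in_section = False
    for line in output.splitlines():
        # Only look at lines after the section header, so that
        # "Encryption Enabled: no" further up is never picked up.
        if line.strip() == "Configuration Synchronization:":
            in_section = True
            continue
        if not in_section:
            continue
        for key, pattern in _PATTERNS.items():
            match = re.match(pattern, line)
            if match:
                result[key] = match.group(1)
    return result


def is_in_sync(output: str) -> bool:
    """True only if the running configuration is reported as synchronized."""
    return parse_config_sync(output)["running_config"] == "synchronized"


if __name__ == "__main__":
    sample = (
        "  Configuration Synchronization:\n"
        "    Enabled: yes\n"
        "    Running Configuration: not synchronized\n"
        "      Out-of-sync Reason: Failure to complete config sync\n"
    )
    print(parse_config_sync(sample))
    print("in sync:", is_in_sync(sample))
```

Running the parser against both peers makes the difference visible at a glance: PA1 reports "Failure to complete config sync", PA2 reports "Started with config out-of-sync".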

The command

request high-availability sync-to-remote running-config

is supposed to transfer the configuration to the other peer. Let's try it …

admin@PA1(active-primary)> request high-availability sync-to-remote running-config
  <Enter>  Finish input

admin@PA1(active-primary)> request high-availability sync-to-remote running-config
Executing this command will overwrite the candidate configuration on 
the peer and trigger a commit on the peer. 
Do you want to continue(y/n)? (y or n)

Successfully synchronized running configuration with HA peer

It worked. Now let's check whether the data is still there …

Everything appears to be OK, yet the data is still inconsistent. The culprit was most likely the failure of the HA1 link, which among other things is also responsible for syncing the configuration:

HA1 Link Failure
If the HA1 Link fails and there is no HA1 Backup configured, configuration synchronization will fail and a split brain condition will be created. Split brain conditions occur when HA members can no longer communicate with each other to exchange HA monitoring information. Each HA member will assume the other member is in a non-functional state and take over as the Active (A/P) or Active-Primary (A/A). Split brain conditions can be prevented by configuring an HA1 Backup link and/or enabling Heartbeat Backup.

The HA control link, also known as the HA1 link, is used by the HA agent for the devices in HA to communicate with one another. The HA1 link is a Layer 3 link requiring an IP address. The HA agent uses TCP port 28769 for clear text communication, or SSH over TCP port 49969 if using encryption. This connection is used to send and receive hellos and HA state information, and configuration sync and management plane sync, such as routing and user-id information. Configuration changes to either unit are automatically synchronized to the other device over this link. The PA-4000 Series and PA-5000 Series firewalls have dedicated HA1 links. All other platforms require a revenue port to be configured as an HA1 link.
Monitor Hold time: HA control link monitoring tracks the state of the HA1 link to see if the peer HA device is down. This will catch a power-cycle, a reboot, or a power down of the peer device. To ignore the flapping of a link that wouldn’t necessarily take the HA control connection down, a monitor hold down timer for the HA control link monitoring can be configured. The monitor hold down time is configured under the HA1 link. The default value is 3000ms.
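The idea behind the monitor hold time can be illustrated with a small model. This is only a sketch of a generic hold-down timer, not PAN-OS's actual implementation: the link counts as down only once it has been continuously down for at least the hold time, so short flaps are ignored. The event format and the function name `link_declared_down` are assumptions made for this example; the 3000 ms default comes from the quoted text.

```python
DEFAULT_HOLD_MS = 3000  # default monitor hold time quoted above


def link_declared_down(events, now_ms, hold_ms=DEFAULT_HOLD_MS):
    """Decide whether a link should be treated as down at time now_ms.

    events: list of (timestamp_ms, is_up) tuples sorted by timestamp.
    The link is declared down only if it has been continuously down
    for at least hold_ms; any "up" event resets the timer.
    """
    down_since = None
    for timestamp_ms, is_up in events:
        if timestamp_ms > now_ms:
            break  # ignore events that lie in the future
        if is_up:
            down_since = None  # link recovered, reset the timer
        elif down_since is None:
            down_since = timestamp_ms  # start of a down period
    return down_since is not None and now_ms - down_since >= hold_ms


if __name__ == "__main__":
    flap = [(0, True), (1000, False), (1500, True)]
    outage = [(0, True), (1000, False)]
    print("flap at 5000 ms:", link_declared_down(flap, 5000))
    print("outage at 3999 ms:", link_declared_down(outage, 3999))
    print("outage at 4000 ms:", link_declared_down(outage, 4000))
```

In this model a 500 ms flap never triggers a failure, while a real outage is reported once the hold time has elapsed.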

And here is one more recommendation:
Configuration changes, commits, and synchronization between HA members should be planned and overlapping changes and commits should be avoided whenever possible.

Because this went unnoticed, it is urgently necessary to be notified of such errors by email. Such an email would have helped much earlier. To do this, configure an email recipient together with an SMTP server, and under

54141

Committing a critical High Availability (HA) group configuration was resulting in an email alert following commit: “SYSTEM ALERT: critical: HA Group 1: Running configuration not synchronized after retries”. A timeout on the HA peer while committing the HA synchronization caused the email alert to be generated.

Here is how:

[Screenshot: Screen_2014.03.05__00870__006]

[Screenshot: Screen_2014.03.05__00869__005]
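The screenshots cover the firewall's own email alerting. Separately from that, an external monitoring script could send a similar notification once it has detected an out-of-sync state. The following is only a sketch of that different approach, using Python's standard library: the function name `build_sync_alert`, the addresses, and the SMTP host are placeholders, and the sending step is left commented out.

```python
from email.message import EmailMessage


def build_sync_alert(device, reason, sender, recipient):
    """Build an alert mail for a device whose running config is out of sync."""
    msg = EmailMessage()
    msg["Subject"] = f"HA alert on {device}: running configuration not synchronized"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Device: {device}\nOut-of-sync reason: {reason}\n")
    return msg


if __name__ == "__main__":
    alert = build_sync_alert(
        "PA1",
        "Failure to complete config sync",
        "firewall-monitor@example.com",  # placeholder sender
        "admin@example.com",             # placeholder recipient
    )
    print(alert)

    # Sending is intentionally disabled; "smtp.example.com" is a placeholder.
    # import smtplib
    # with smtplib.SMTP("smtp.example.com") as smtp:
    #     smtp.send_message(alert)
```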

Further findings: the problem appears to be related to Active/Active HA. Answers are still pending. In any case, you can work around it from the CLI with commit force.

configure
commit force
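The post uses the CLI. If the same step needs to be scripted, the PAN-OS XML API accepts commands as nested XML elements in a `cmd` parameter. The sketch below only builds the request URL and sends nothing. It assumes the XML API is enabled and an API key exists; the host name and key are placeholders, and the word-by-word mapping in `cli_to_xml` only holds for simple commands without arguments.

```python
from urllib.parse import urlencode


def cli_to_xml(command: str) -> str:
    """Turn simple CLI words into nested XML, e.g. 'commit force'."""
    xml = ""
    # Wrap from the innermost word outwards.
    for word in reversed(command.split()):
        xml = f"<{word}>{xml}</{word}>"
    return xml


def api_url(host: str, api_type: str, command: str, key: str) -> str:
    """Build an XML API URL; api_type is 'op' or 'commit' in this sketch."""
    query = urlencode({"type": api_type, "cmd": cli_to_xml(command), "key": key})
    return f"https://{host}/api/?{query}"


if __name__ == "__main__":
    host = "fw.example.com"  # placeholder host
    key = "API_KEY"          # placeholder API key
    print(cli_to_xml("show high-availability state"))
    print(api_url(host, "op", "show high-availability state", key))
    print(api_url(host, "commit", "commit force", key))
```

Keeping the URL construction separate from the actual HTTP call makes it easy to inspect what would be sent before touching a production firewall.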

….

to be continued :=)