This section combines two closely related concepts: cluster nodes and quorum.

Cluster Nodes

Cluster nodes are the individual servers (physical or virtual) that are connected together to form a failover cluster. Their primary purpose is to work as a team to provide high availability for services and applications (like virtual machines, databases, or file shares).

```mermaid
graph TD
    A[Client Workstations] --> B[Virtual Cluster IP]
    B --> C[Node 1: Active]
    B --> D[Node 2: Active]
    B --> E[Node 3: Active]
    C --> F[Shared Storage<br/>SAN/iSCSI]
    D --> F
    E --> F
    C --> G[Heartbeat Network]
    D --> G
    E --> G
```

  • Collaboration: They are connected via a network and use clustering software to constantly communicate with each other.
  • Interchangeability: If one node fails, another node in the cluster can automatically take over its workload. This process is called failover (a minimal sketch follows this list).
  • Shared Resources: They typically have access to common shared storage (such as a SAN) where the critical data (e.g., virtual machine files) is kept. This allows any node to access the data needed to run a service.
  • Scalability: A Windows Server failover cluster can have up to 64 physical nodes.
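
Since failover is the core behavior here, a minimal sketch may help. Everything in it — the `Node` class, the timeout value, the least-loaded placement rule — is invented for illustration and is not how Windows Server Failover Clustering is actually implemented:

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # illustrative: seconds of silence before a node is presumed dead

class Node:
    """Toy stand-in for a cluster node; not a real clustering API."""
    def __init__(self, name: str):
        self.name = name
        self.workloads: list[str] = []        # roles this node owns (VMs, shares, ...)
        self.last_heartbeat = time.monotonic()

    def is_alive(self, now: float) -> bool:
        return (now - self.last_heartbeat) < HEARTBEAT_TIMEOUT

def fail_over(nodes: list[Node]) -> None:
    """Move workloads off any node whose heartbeat has gone silent."""
    now = time.monotonic()
    survivors = [n for n in nodes if n.is_alive(now)]
    if not survivors:
        return  # no node left to fail over to
    for node in nodes:
        if node not in survivors and node.workloads:
            # Hand everything to the least-loaded survivor; real clusters use
            # placement policies and preferred-owner lists instead.
            target = min(survivors, key=lambda n: len(n.workloads))
            target.workloads.extend(node.workloads)
            node.workloads = []

# Node1 goes silent; Node2 inherits its workloads.
n1, n2 = Node("Node1"), Node("Node2")
n1.workloads = ["VM-SQL", "VM-Web"]
n1.last_heartbeat -= 2 * HEARTBEAT_TIMEOUT   # simulate missed heartbeats
fail_over([n1, n2])
print(n2.workloads)   # ['VM-SQL', 'VM-Web']
```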

Quorum

The quorum is a safety mechanism that prevents a disastrous situation called a “split-brain” scenario. It is a dynamic voting algorithm that ensures only one set of active, communicating nodes is allowed to control the cluster and access the shared storage at any given time.

Why?

Imagine a two-node cluster, and the network connection between them fails. Both nodes are still running, but they cannot see each other.

  • Without a quorum, both nodes might think the other has failed and try to take over the services independently.
  • This would mean two separate parts of the cluster are both trying to write to the same shared storage, leading to severe data corruption.

The quorum prevents this by requiring that a majority (more than half) of the voting members agree that the cluster is functional.
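
A minimal sketch of that rule, assuming nothing more than a vote count (the `has_quorum` helper is illustrative, not a real clustering API):

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    # Strict majority: strictly more than half of all configured votes.
    return 2 * votes_online > total_votes

# Two-node cluster, heartbeat link cut: each partition sees only its own vote.
print(has_quorum(1, 2))  # False on *both* sides -> both stop serving,
                         # so neither side can corrupt the shared storage
```

Notice the cost: with an even number of votes and no tie-breaker, a clean 50/50 split takes the whole cluster down. That is exactly what the witness described below is for.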

How?

  • Votes: Each node in the cluster typically has one vote. Cluster nodes constantly exchange heartbeat messages (usually every 1-2 seconds) to confirm they’re alive and can communicate.
  • Majority: The cluster needs more than 50% of the total available votes to start or continue running.
  • Witness: To avoid ties and add resilience in clusters with an even number of nodes, an external resource called a witness is added. The witness acts as an additional, tie-breaking vote (see the sketch after this list). Common types include:
    • Disk Witness: A disk in the shared storage.
    • File Share Witness: A file share on a highly available server outside the cluster.
    • Cloud Witness: A blob storage account in Microsoft Azure (modern and highly recommended).
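
Extending the same illustrative helper, the tie-breaking effect of a witness looks like this:

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    return 2 * votes_online > total_votes  # same strict-majority rule as above

# Two nodes + one witness = 3 total votes. After a split, the node that can
# still reach the witness holds 2 of 3 votes and keeps the cluster running.
print(has_quorum(2, 3))  # True  -> node + witness: stays online
print(has_quorum(1, 3))  # False -> isolated node: halts its services
```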

The heartbeat exchange between the nodes forms a full mesh:

```mermaid
graph LR
    N1[Node 1] -->|Heartbeat| N2[Node 2]
    N1 -->|Heartbeat| N3[Node 3]
    N2 -->|Heartbeat| N3
```

Worked example (checked in the sketch below):

  • A 5-node cluster has 5 votes. The cluster needs at least 3 votes to run.
    • If 2 nodes fail, 3 nodes remain → they have 3 votes (a majority of 5) → the cluster stays online.
    • If 3 nodes fail, 2 nodes remain → they have only 2 votes (not a majority of 5) → the cluster shuts down to prevent data corruption.
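
Running the same illustrative majority check over every failure count reproduces this:

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    return 2 * votes_online > total_votes  # strict majority, as above

TOTAL = 5
for failed in range(TOTAL + 1):
    online = TOTAL - failed
    state = "online" if has_quorum(online, TOTAL) else "offline"
    print(f"{failed} failed, {online} votes -> cluster {state}")
# 2 failed, 3 votes -> cluster online
# 3 failed, 2 votes -> cluster offline
```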