Before we discuss why we need 3 voting disks (or, more generally, an odd number of voting disks), we should first understand the role of the voting disk.
The Role of Voting Disks in Oracle RAC
Voting Disk Purpose:
- Voting disks are files that help Oracle RAC determine which nodes in the cluster are healthy and should remain operational.
- They act like a "tie-breaker" during communication failures (e.g., network issues or node failure) to prevent split-brain scenarios.
Split-Brain Problem:
- Split-brain happens when the nodes in a cluster lose communication with each other, yet each node still believes it is healthy.
- If both nodes simultaneously access shared storage, it can cause data corruption.
- Voting disks resolve this by deciding which node (or nodes) should stay in the cluster.
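To make the split-brain risk concrete, here is a minimal Python sketch of what happens when two partitioned nodes have no tie-breaker. All names here are hypothetical illustrations, not Oracle APIs:

```python
# Toy model of split-brain: without a tie-breaker, a node that loses
# contact with its peer cannot distinguish "peer crashed" from
# "network partition", so it assumes it survived and keeps writing.

def node_decision_without_tiebreaker(can_see_peer):
    if can_see_peer:
        return "keep running"
    return "assume survivor, keep writing"

# Network partition: neither node can see the other.
print("Node A:", node_decision_without_tiebreaker(False))
print("Node B:", node_decision_without_tiebreaker(False))
# Both nodes keep writing to shared storage -> data corruption risk.
```

Both nodes reach the same (wrong) conclusion, which is exactly why an external tie-breaker like the voting disk is needed.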
Why 3 Voting Disks?
Case with 1 Voting Disk:
- With only 1 voting disk, the disk is a single point of failure. If it becomes inaccessible (e.g., disk failure):
- The cluster cannot determine which nodes are healthy.
- The entire cluster may shut down to avoid corruption.
Case with 2 Voting Disks:
- With 2 voting disks, a communication failure can leave each node able to reach only 1 disk:
- Neither node holds a majority (that would require both votes), so the tie cannot be broken.
- If each node treated its single vote as sufficient, both would believe they own the cluster, which is exactly the split-brain scenario.
Case with 3 Voting Disks:
- When there are 3 voting disks, the cluster can tolerate the failure of 1 disk or the failure of communication between nodes:
- A node needs to access more than half the votes (majority) to stay in the cluster.
- If a node cannot access the majority, it is evicted.
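The majority rule above can be sketched in a few lines of Python. This is a simplified illustration of the voting logic, not Oracle's actual algorithm:

```python
def survives(accessible_votes, total_votes=3):
    """A node stays in the cluster only if it can access a strict
    majority (more than half) of the configured voting disks."""
    return accessible_votes > total_votes // 2

print(survives(2))     # True  -> node keeps running (2 of 3)
print(survives(1))     # False -> node is evicted   (1 of 3)
print(survives(3, 5))  # True  -> 3 of 5 is also a majority
```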
Examples
Scenario 1: Normal Operation
- Node A, Node B, and 3 voting disks (VD1, VD2, VD3) are operational.
- Each node can access all 3 voting disks, and everything works fine.
Scenario 2: Network Failure Between Nodes
- Network heartbeat fails between Node A and Node B.
- Both nodes try to access the voting disks to decide which one stays.
- Node A accesses VD1 and VD2 (2 votes).
- Node B accesses only VD3 (1 vote).
- Outcome: Node A wins (majority), Node B is evicted.
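Scenario 2 can be simulated with a small Python sketch. The node and disk names are illustrative, and the helper below is a hypothetical simplification of the eviction decision:

```python
def resolve_partition(access, total_disks=3):
    """Decide each node's fate from the voting disks it can reach.
    A node needs a strict majority of all configured disks to survive."""
    majority = total_disks // 2 + 1
    return {node: ("survives" if len(disks) >= majority else "evicted")
            for node, disks in access.items()}

# Network heartbeat fails; A reaches VD1 and VD2, B reaches only VD3.
print(resolve_partition({"Node A": {"VD1", "VD2"}, "Node B": {"VD3"}}))
# {'Node A': 'survives', 'Node B': 'evicted'}
```

The same helper also covers Scenario 3: if VD3 fails but both nodes still reach VD1 and VD2, both hold a majority and both survive.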
Scenario 3: Voting Disk Failure
- Suppose VD3 fails, leaving VD1 and VD2.
- Nodes can still function because:
- Node A and Node B can both access VD1 and VD2 (majority).
- The cluster continues running without interruption.
Scenario 4: Node Failure
- If Node B fails completely:
- Node A can still access VD1, VD2, and VD3 (majority).
- The cluster continues with Node A.
Why a Majority Is Critical
Oracle RAC requires more than half of the votes (majority) to prevent split-brain:
- In a 3-vote setup, 2 votes are needed for a majority.
- This ensures:
- If 1 voting disk fails, the remaining 2 disks still form a majority and the cluster keeps running.
- If a node cannot reach more than half the votes, it shuts down (is evicted) to avoid corruption.
Key Points About Voting Disk Design
- Odd Number of Voting Disks:
- Always configure an odd number of voting disks (3, 5, 7, etc.) to prevent tie scenarios.
- Redundancy:
- Voting disks are typically stored on shared storage (e.g., ASM) with redundancy to handle hardware failures.
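The odd-number recommendation falls directly out of the majority arithmetic. The following Python sketch tabulates how many disk failures each configuration tolerates (an illustration, not Oracle tooling):

```python
def majority(total_disks):
    # Smallest number of votes that is strictly more than half.
    return total_disks // 2 + 1

for disks in (1, 2, 3, 4, 5):
    tolerated = disks - majority(disks)
    print(f"{disks} disk(s): majority = {majority(disks)}, "
          f"failures tolerated = {tolerated}")
# 2 disks tolerate 0 failures (no better than 1), and 4 tolerate
# only 1 (no better than 3): an even count adds cost, not safety.
```

This is why configurations of 3, 5, or 7 voting disks are recommended: each step up to the next odd number buys one more tolerated failure, while even numbers buy nothing.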