Understanding CSSD Heartbeat Mechanisms in Oracle RAC
The Cluster Synchronization Services Daemon (CSSD) is a critical process in Oracle RAC that continuously monitors the health of cluster nodes using two independent heartbeat mechanisms:
- Network Heartbeat
- Disk Heartbeat
🔹 Network Heartbeat
- Sent every 1 second over the interconnect (private network) using TCP.
- A sending thread of CSSD sends the heartbeat to all other nodes and to itself.
- A receiving thread on each node listens for heartbeats from the other nodes.
✅ TCP handles error correction, but Oracle does not rely on TCP retransmissions for heartbeat monitoring. Heartbeat loss is interpreted at the Oracle level.
Heartbeat Loss Monitoring (Misscount Logic):
If a node does not receive a heartbeat from another node:
- At 15 seconds (50% of misscount) → WARNING logged.
- At 22 seconds (75% of misscount) → another WARNING logged.
- At 27 seconds (90% of misscount) → additional warning.
- At 30 seconds (100% of misscount, the default) → the node is evicted from the cluster.
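The warning thresholds above are just fixed percentages of the configured misscount. Below is a minimal C sketch of that threshold arithmetic, assuming the default misscount of 30 seconds; the function and variable names are illustrative and are not Oracle's internal code.

```c
#include <stdio.h>

/* Illustrative only: derive warning thresholds as percentages of misscount.
 * The real CSSD logic is internal to Oracle Clusterware. */
static void report_heartbeat_gap(double secs_without_heartbeat, double misscount)
{
    double pct = 100.0 * secs_without_heartbeat / misscount;

    if (pct >= 100.0)
        printf("%.1fs: 100%% of misscount reached -> evict node\n", misscount);
    else if (pct >= 90.0)
        printf("%.1fs: WARNING (90%% of misscount)\n", secs_without_heartbeat);
    else if (pct >= 75.0)
        printf("%.1fs: WARNING (75%% of misscount)\n", secs_without_heartbeat);
    else if (pct >= 50.0)
        printf("%.1fs: WARNING (50%% of misscount)\n", secs_without_heartbeat);
}

int main(void)
{
    /* With the default misscount of 30s the warning bands begin at
     * 15s, 22.5s and 27s, and eviction happens at 30s. */
    double gaps[] = { 15.0, 22.5, 27.0, 30.0 };
    for (int i = 0; i < 4; i++)
        report_heartbeat_gap(gaps[i], 30.0);
    return 0;
}
```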
🔹 Disk Heartbeat
- Occurs between each node and the voting disk.
- CSSD maintains a heartbeat of one OS block at a specific offset on the voting disk using pread()/pwrite() system calls (a sketch follows below).
- CSSD:
  - Writes its own heartbeat (with a counter and the node name in the block header).
  - Reads and monitors the heartbeat blocks of all other nodes.
⚠️ If a node fails to write its heartbeat within the disk I/O timeout period, it is considered dead. If its status is unknown and it is not part of the "survivor" node group, the node is evicted (via a "kill block" update in the voting disk).
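To make the pread()/pwrite() description concrete, here is a minimal C sketch of a node writing its own heartbeat block at a node-specific offset on a voting disk and reading a peer's block. The block layout, offsets, and device path are assumptions made for illustration only; Oracle's actual voting-disk format is internal and undocumented.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 512          /* assumed OS block size */

/* Hypothetical heartbeat block layout; not Oracle's real on-disk format. */
struct hb_block {
    char     node_name[64];     /* node name in the block header */
    uint64_t counter;           /* incremented on every heartbeat */
    char     pad[BLOCK_SIZE - 64 - sizeof(uint64_t)];
};

/* Write this node's heartbeat block at its own offset. */
static int write_heartbeat(int fd, int node_id, const char *name, uint64_t counter)
{
    struct hb_block blk;
    memset(&blk, 0, sizeof blk);
    strncpy(blk.node_name, name, sizeof blk.node_name - 1);
    blk.counter = counter;
    off_t offset = (off_t)node_id * BLOCK_SIZE;
    return pwrite(fd, &blk, sizeof blk, offset) == sizeof blk ? 0 : -1;
}

/* Read another node's heartbeat block to check that its counter advances. */
static int read_heartbeat(int fd, int node_id, struct hb_block *out)
{
    off_t offset = (off_t)node_id * BLOCK_SIZE;
    return pread(fd, out, sizeof *out, offset) == sizeof *out ? 0 : -1;
}

int main(void)
{
    /* "/dev/voting_disk1" is a placeholder device path. */
    int fd = open("/dev/voting_disk1", O_RDWR);
    if (fd < 0)
        return 1;

    write_heartbeat(fd, /*node_id=*/1, "node1", /*counter=*/42);

    struct hb_block peer;
    if (read_heartbeat(fd, /*node_id=*/2, &peer) == 0) {
        /* If peer.counter stops advancing for longer than disktimeout,
         * that node's disk heartbeat is considered lost. */
    }
    close(fd);
    return 0;
}
```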
🔸 Summary of Heartbeat Requirements
| Heartbeat Type | Frequency | Timeout Condition | Consequence |
|---|---|---|---|
| Network | 1 second | css_misscount (default: 30s) | Node eviction |
| Disk | 1 second | disktimeout | Node eviction |
🔸 Failure Matrix for Heartbeat Scenarios
| Network Ping | Disk Ping | Reboot? |
|---|---|---|
| Completes within misscount seconds | Completes within disktimeout | No |
| Completes within misscount seconds | Takes more than misscount but less than disktimeout | No |
| Completes within misscount seconds | Takes more than disktimeout | Yes |
| Takes more than misscount seconds | Completes within disktimeout | Yes |
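The matrix boils down to one rule: the network heartbeat must complete within misscount and the disk heartbeat within disktimeout; if either deadline is missed, the node reboots. A small illustrative helper in C (names, signature, and the default values in the comments are assumptions, not Oracle code):

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative decision rule derived from the failure matrix above;
 * not Oracle's actual eviction code. All times are in seconds. */
static bool must_reboot(int network_ping_secs, int disk_ping_secs,
                        int misscount, int disktimeout)
{
    bool network_ok = network_ping_secs <= misscount;
    bool disk_ok    = disk_ping_secs   <= disktimeout;
    /* Reboot if either heartbeat misses its own deadline. */
    return !network_ok || !disk_ok;
}

int main(void)
{
    /* Assuming misscount = 30s and disktimeout = 200s for the example. */
    printf("%d\n", must_reboot(10,  50, 30, 200)); /* 0: both within limits    */
    printf("%d\n", must_reboot(10, 250, 30, 200)); /* 1: disk ping too slow    */
    printf("%d\n", must_reboot(40,  50, 30, 200)); /* 1: network ping too slow */
    return 0;
}
```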
🔧 Understanding Voting Disk and Its Role in Oracle RAC Clusterware
The Voting Disk is a vital component in Oracle RAC that helps determine node membership, resolve split-brain conditions, and enforce I/O fencing. It plays a key role alongside the CSSD process, which uses both network and disk heartbeats for node health monitoring.
🧠 What is Stored in the Voting Disk?
- Information about cluster node membership.
- Disk-based heartbeat blocks for each node.
- Kill blocks to mark evicted nodes.

Voting disks are written using pwrite() and read using pread() system calls by the CSSD process. Each node writes to a specific offset (its own heartbeat block) and reads the other nodes' blocks to check their liveness.
Although the OCR and OLR also store node information, Voting Disk heartbeat plays a runtime role in eviction decisions. There’s no persistent user or application data, so if a Voting Disk is lost, it can be re-added without data loss—but only after stopping CRS.
🔁 Why Voting Disks Are Crucial
While it’s technically true that data in voting disks can be recreated, they’re instrumental in avoiding split-brain and enforcing evictions when:
- Heartbeat failures occur.
- Nodes lose contact with each other.
- Shared storage needs to be protected (I/O fencing).
💥 Split Brain Syndrome
A split-brain situation arises when cluster nodes lose communication via the private interconnect but continue running independently. Each node assumes others are down and may attempt to access and modify shared data blocks.
❌ Risk:
This leads to data integrity violations, such as concurrent conflicting updates to the same data block.
🧱 I/O Fencing
After a node failure or eviction, it’s possible that leftover I/O from that node reaches storage out of order, corrupting data. To prevent this:
- Oracle performs I/O fencing by removing the failed node's access to shared storage.
- This ensures only surviving nodes can read from and write to the disk.
⚖️ Simple Majority Rule
Oracle Clusterware requires a simple majority of voting disks to be accessible at all times:
"More than half" of the voting disks must be online for the cluster to operate.
📌 Formula: To tolerate the loss of N voting disks, configure at least 2N+1 voting disks.
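Put another way: with V configured voting disks, a node must see strictly more than V/2 of them to stay in the cluster, which is why surviving N disk failures requires 2N+1 disks. A small illustrative C sketch of both calculations:

```c
#include <stdbool.h>
#include <stdio.h>

/* Minimum voting disks needed to survive the loss of n_failures disks. */
static int min_voting_disks(int n_failures)
{
    return 2 * n_failures + 1;
}

/* Simple majority: strictly more than half of the disks must be online. */
static bool has_majority(int online, int configured)
{
    return online * 2 > configured;
}

int main(void)
{
    printf("tolerate 1 failure  -> %d disks\n", min_voting_disks(1)); /* 3 */
    printf("tolerate 2 failures -> %d disks\n", min_voting_disks(2)); /* 5 */
    printf("2 of 3 online -> majority? %d\n", has_majority(2, 3));    /* 1 */
    printf("1 of 3 online -> majority? %d\n", has_majority(1, 3));    /* 0 */
    return 0;
}
```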
🔍 Examples – Voting Disk in Action
✅ Example 1: Network Heartbeat Failure in 3-node Cluster
- Setup: 3 nodes (Node 1, Node 2, Node 3) and 3 voting disks.
- Issue: Node 3 loses the network heartbeat with Node 1 and Node 2, but its disk heartbeat still works.
- Action: Node 1 and Node 2 can still see each other and determine via the voting disk that Node 3 is isolated.
- They mark Node 3's kill block in the voting disk.
- During its next pread(), Node 3 sees the self-kill flag and evicts itself (sketched below).
- I/O fencing ensures safe disk access. OHASD then gracefully shuts down and restarts the stack on Node 3.
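Conceptually, the self-eviction step is just the isolated node noticing, on its next voting-disk read, that the surviving nodes have set a flag against it. A hedged C sketch under the same assumptions as the earlier heartbeat example (the kill-block layout, offsets, device path, and exit path are hypothetical):

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE 512

/* Hypothetical kill-block layout; Oracle's real format is internal. */
struct kill_block {
    uint32_t kill_flag;                       /* non-zero: this node must leave */
    char     pad[BLOCK_SIZE - sizeof(uint32_t)];
};

/* Check our own kill block on the voting disk during the heartbeat cycle. */
static void check_self_kill(int fd, int my_node_id, off_t kill_area_offset)
{
    struct kill_block blk;
    off_t offset = kill_area_offset + (off_t)my_node_id * BLOCK_SIZE;

    if (pread(fd, &blk, sizeof blk, offset) == sizeof blk && blk.kill_flag) {
        /* Surviving nodes have marked us for eviction: leave the cluster so
         * OHASD can cleanly restart the stack (illustrative placeholder). */
        _exit(EXIT_FAILURE);
    }
}

int main(void)
{
    int fd = open("/dev/voting_disk1", O_RDONLY);   /* placeholder path */
    if (fd >= 0) {
        check_self_kill(fd, /*my_node_id=*/3, /*kill_area_offset=*/0);
        close(fd);
    }
    return 0;
}
```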
✅ Example 2: Disk Heartbeat Split in 2-node Cluster
- Setup: 2 nodes and 3 voting disks.
- Issue: Node 1 sees 2 voting disks; Node 2 sees only 1.
- Based on the Simple Majority Rule:
  - Node 1 (majority access) is the survivor.
  - CSSD on Node 1 marks Node 2's kill block.
- Node 2 reads the kill flag and evicts itself.
- I/O fencing is applied, and OHASD restarts the stack on Node 2.
🧠 Without an odd number of disks, both nodes could think they're healthy, leading to potential split-brain.
📌 Summary
| Component | Purpose |
|---|---|
| Voting Disk | Maintains disk heartbeats, kill blocks, and node membership info. |
| Network Heartbeat | Checks interconnect communication every second via TCP. |
| Disk Heartbeat | Checks I/O access health via shared storage every second. |
| Split-Brain | Scenario where isolated nodes continue operating independently. |
| I/O Fencing | Prevents failed nodes from sending stale writes to shared storage. |
| Simple Majority | Ensures more than half of the voting disks are accessible to avoid eviction. |