
Monday, 31 March 2025

Understanding CSSD Heartbeat Mechanisms in Oracle RAC

 



The Cluster Synchronization Services daemon (CSSD) is a critical process in Oracle RAC that continuously monitors the health of cluster nodes using two independent heartbeat mechanisms:

  1. Network Heartbeat

  2. Disk Heartbeat




🔹 Network Heartbeat

  • Sent every 1 second over the interconnect (private network) using TCP.

  • A sending thread of CSSD sends the heartbeat to all other nodes and itself.

  • A receiving thread on each node listens for heartbeats from others.

✅ TCP handles error correction, but Oracle does not rely on TCP retransmissions for heartbeat monitoring. Heartbeat loss is interpreted at the Oracle level.

Heartbeat Loss Monitoring (Misscount Logic):

  • If a node does not receive a heartbeat from another node:

    • At 15 seconds (50% of misscount) → WARNING logged.

    • At 22 seconds (75%) → Another WARNING logged.

    • At 27 seconds (90%) → Additional warning.

    • At 30 seconds (100%, the default misscount) → Node is evicted from the cluster.
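The escalation above can be sketched as a small classifier. This is an illustrative model only, not Oracle's implementation: the function name and return strings are hypothetical, and the thresholds are simply the percentages listed above applied to misscount with integer rounding.

```python
# Warning thresholds from the list above, as percentages of misscount.
WARNING_PERCENTS = (50, 75, 90)


def heartbeat_status(seconds_since_last_heartbeat, misscount=30):
    """Classify a peer by the time elapsed since its last network heartbeat."""
    if seconds_since_last_heartbeat >= misscount:
        return "EVICT"
    # Collect every warning threshold that has already been crossed.
    crossed = [
        pct for pct in WARNING_PERCENTS
        if seconds_since_last_heartbeat >= misscount * pct // 100
    ]
    if crossed:
        return f"WARNING ({crossed[-1]}% of misscount)"
    return "OK"
```

With the default misscount of 30, the thresholds evaluate to 15, 22, and 27 seconds, matching the list.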




🔹 Disk Heartbeat

  • Occurs between each node and the voting disk.

  • CSSD maintains a heartbeat one OS block in size at a specific offset on the voting disk, using the pread / pwrite system calls.

  • CSSD:

    • Writes its own heartbeat (with a counter and node name in the block header).

    • Reads/Monitors the heartbeat blocks of all other nodes.

⚠️ If a node fails to write its heartbeat within the disk I/O timeout period, it is considered dead.
If its status is unknown and it's not part of the "survivor" node group, the node is evicted (via a "kill block" update in the voting disk).
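To make the pread / pwrite pattern concrete, here is a minimal sketch that keeps one heartbeat block per node in an ordinary file. The block size, the on-disk layout (an 8-byte counter followed by a 32-byte node name), and all function names are assumptions made for illustration; the real voting disk format is not described in this post. It relies on os.pread and os.pwrite, which are available on Unix-like systems.

```python
import os
import struct
import tempfile

BLOCK_SIZE = 512                 # assumed block size, for illustration only
HEADER = struct.Struct("<Q32s")  # hypothetical layout: counter + node name


def write_heartbeat(fd, slot, counter, node_name):
    """Write this node's heartbeat block at its own fixed offset."""
    block = HEADER.pack(counter, node_name.encode()).ljust(BLOCK_SIZE, b"\0")
    os.pwrite(fd, block, slot * BLOCK_SIZE)


def read_heartbeat(fd, slot):
    """Read another node's heartbeat block and return (counter, node name)."""
    block = os.pread(fd, BLOCK_SIZE, slot * BLOCK_SIZE)
    counter, raw_name = HEADER.unpack_from(block)
    return counter, raw_name.rstrip(b"\0").decode()


def is_alive(previous_counter, current_counter):
    """A peer counts as alive only if its counter advanced since the last read."""
    return current_counter > previous_counter


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        write_heartbeat(fd, 0, 1, "node1")
        write_heartbeat(fd, 1, 7, "node2")
        print(read_heartbeat(fd, 1))
```

A monitoring loop would call read_heartbeat for each peer slot about once per second and treat a counter that stops advancing for longer than the disk timeout as a dead node.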

 


🔸 Summary of Heartbeat Requirements

| Heartbeat Type | Frequency | Timeout Condition | Consequence |
|---|---|---|---|
| Network | 1 second | css_misscount (default: 30s) | Node eviction |
| Disk | 1 second | disktimeout | Node eviction |




🔸 Failure Matrix for Heartbeat Scenarios

| Network Ping | Disk Ping | Reboot? |
|---|---|---|
| Completes within misscount seconds | Completes within disktimeout | No |
| Completes within misscount seconds | Takes more than misscount but < disktimeout | No |
| Completes within misscount seconds | Takes more than disktimeout | Yes |
| Takes more than misscount seconds | Completes within disktimeout | Yes |
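The matrix reduces to a single rule: a node stays up only if both pings finish within their own timeouts. The sketch below encodes that rule. The function name is hypothetical, and the disktimeout default of 200 seconds is an assumption (it is the commonly cited default, but this post does not state it).

```python
def node_reboots(network_delay, disk_delay, misscount=30, disktimeout=200):
    """Return True if the failure matrix says the node is rebooted.

    Delays are in seconds. disktimeout=200 is an assumed default.
    """
    network_ok = network_delay <= misscount
    disk_ok = disk_delay <= disktimeout
    return not (network_ok and disk_ok)


# The four rows of the matrix, in order.
rows = [(5, 5), (5, 60), (5, 250), (45, 5)]
results = [node_reboots(net, disk) for net, disk in rows]
```

Note that the second row survives: a disk ping slower than misscount is tolerated as long as it stays under disktimeout. The case where both pings time out is not in the matrix; this sketch treats it as a reboot.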




🔧 Understanding Voting Disk and Its Role in Oracle RAC Clusterware

The Voting Disk is a vital component in Oracle RAC that helps determine node membership, resolve split-brain conditions, and enforce I/O fencing. It plays a key role alongside the CSSD process, which uses both network and disk heartbeats for node health monitoring.




🧠 What is Stored in the Voting Disk?

  • Information about cluster node membership.

  • Disk-based heartbeat blocks for each node.

  • Kill blocks to mark evicted nodes.

Voting Disks are written using pwrite() and read using pread() system calls by the CSSD process.

Each node writes to a specific offset (its own heartbeat block) and reads others’ blocks to check their liveness.

Although the OCR and OLR also store node information, Voting Disk heartbeat plays a runtime role in eviction decisions. There’s no persistent user or application data, so if a Voting Disk is lost, it can be re-added without data loss—but only after stopping CRS.




🔁 Why Voting Disks Are Crucial

While it’s technically true that data in voting disks can be recreated, they’re instrumental in avoiding split-brain and enforcing evictions when:

  • Heartbeat failures occur.

  • Nodes lose contact with others.

  • Shared storage needs to be protected (I/O fencing).




💥 Split Brain Syndrome

A split-brain situation arises when cluster nodes lose communication via the private interconnect but continue running independently. Each node assumes others are down and may attempt to access and modify shared data blocks.

❌ Risk:

This leads to data integrity violations, such as concurrent conflicting updates to the same data block.




🧱 I/O Fencing

After a node failure or eviction, it’s possible that leftover I/O from that node reaches storage out of order, corrupting data. To prevent this:

  • Oracle performs I/O fencing by removing failed nodes' access to shared storage.

  • This ensures only surviving nodes can read/write to the disk.




⚖️ Simple Majority Rule

Oracle Clusterware requires a simple majority of voting disks to be accessible at all times:

"More than half" of the voting disks must be online for the cluster to operate.

📌 Formula:

To tolerate loss of N disks → Configure at least 2N+1 voting disks.
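The 2N+1 formula and its inverse can be written out directly. The function names below are hypothetical and exist only to illustrate the arithmetic.

```python
def min_voting_disks(tolerated_failures):
    """Smallest voting disk count that survives the loss of N disks: 2N + 1."""
    return 2 * tolerated_failures + 1


def tolerated_failures(configured_disks):
    """Largest number of disks that can fail while more than half stay online."""
    return (configured_disks - 1) // 2
```

For example, min_voting_disks(1) is 3 and min_voting_disks(2) is 5. In the other direction, tolerated_failures(4) is 1, the same as for 3 disks, which is why an even count adds no extra protection.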




🔍 Examples – Voting Disk in Action

✅ Example 1: Network Heartbeat Failure in 3-node Cluster

  • Setup: 3 nodes (Node 1, Node 2, Node 3) and 3 Voting Disks.

  • Issue: Node 3 loses its network heartbeat with Node 1 and Node 2, but its disk heartbeat is still working.

  • Action: Nodes 1 and 2 can still see each other and determine via the Voting Disk that Node 3 is isolated.

  • They mark Node 3’s kill block in Voting Disk.

  • During next pread(), Node 3 sees the self-kill flag and evicts itself.

  • I/O fencing ensures safe disk access. OHASD then gracefully shuts down and restarts the stack on Node 3.
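The kill-block handshake in this example can be modeled in a few lines. This is a toy in-memory stand-in, not the real voting disk structure: the VotingDisk class, its methods, and the return strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VotingDisk:
    """In-memory stand-in for a voting disk: one kill flag per node."""
    kill_blocks: dict = field(default_factory=dict)

    def mark_killed(self, node):
        self.kill_blocks[node] = True

    def is_killed(self, node):
        return self.kill_blocks.get(node, False)


def evict_unreachable(disk, nodes, reachable):
    """Survivors mark the kill block of every node they cannot reach."""
    evicted = [n for n in nodes if n not in reachable]
    for n in evicted:
        disk.mark_killed(n)
    return evicted


def poll_own_kill_block(disk, node):
    """Each node checks its own kill block on its next disk read."""
    return "evict self" if disk.is_killed(node) else "keep running"
```

In the scenario above, the survivors would call evict_unreachable(disk, ["node1", "node2", "node3"], {"node1", "node2"}), after which poll_own_kill_block(disk, "node3") reports that Node 3 should evict itself.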



✅ Example 2: Disk Heartbeat Split in 2-node Cluster

  • Setup: 2 nodes and 3 Voting Disks.

  • Issue: Node 1 sees 2 voting disks; Node 2 sees only 1.

  • Based on Simple Majority Rule:

    • Node 1 (majority access) is the survivor.

    • CSSD of Node 1 marks Node 2’s kill block.

  • Node 2 reads the kill flag and evicts itself.

  • I/O fencing is applied, and OHASD restarts the stack on Node 2.

🧠 With an even number of disks, a split can leave no node with a strict majority (for example, each node seeing 1 of 2 disks), so no clear survivor emerges. An odd count avoids this tie.
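The survivor choice in Example 2 can be sketched as a per-node majority check. The function name and the dictionary input are hypothetical, and the check assumes the strict "more than half" rule described earlier.

```python
def survivors(visible_disks, configured):
    """Return the nodes that can see more than half of the voting disks.

    visible_disks maps node name -> number of voting disks that node can access.
    """
    return sorted(
        node for node, seen in visible_disks.items()
        if 2 * seen > configured
    )
```

For the setup above, survivors({"node1": 2, "node2": 1}, configured=3) returns ['node1'], so Node 2 is the one whose kill block gets marked.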


 



📌 Summary

| Component | Purpose |
|---|---|
| Voting Disk | Maintains disk heartbeats, kill blocks, and node membership info. |
| Network Heartbeat | Checks interconnect communication every second via TCP. |
| Disk Heartbeat | Checks I/O access health via shared storage every second. |
| Split-Brain | Scenario where isolated nodes continue operating independently. |
| I/O Fencing | Prevents failed nodes from sending stale writes to shared storage. |
| Simple Majority | Ensures more than half of voting disks are accessible to avoid eviction. |

