Amit's Oracle DBA Blog: Understanding CSSD Heartbeat Mechanisms in Oracle RAC

Understanding CSSD Heartbeat Mechanisms in Oracle RAC

The Cluster Synchronization Services daemon (CSSD) is a critical process in Oracle RAC that continuously monitors the health of cluster nodes using two independent heartbeat mechanisms:

Network Heartbeat
Disk Heartbeat

🔹 Network Heartbeat

Sent every 1 second over the interconnect (private network) using TCP.
A sending thread of CSSD sends the heartbeat to all other nodes and itself.
A receiving thread on each node listens for heartbeats from others.

✅ TCP handles error correction, but Oracle does not rely on TCP retransmissions for heartbeat monitoring. Heartbeat loss is interpreted at the Oracle level.

Heartbeat Loss Monitoring (Misscount Logic):

If a node does not receive a heartbeat from another node:

At 15 seconds (50% of misscount) → WARNING logged.
At 22 seconds (75%) → Another WARNING logged.
At 27 seconds (90%) → Additional warning.
At 30 seconds (100%) [default misscount] → Node is evicted from the cluster.
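
The escalation above can be sketched as a small threshold calculation. This is only an illustration of the percentages listed here, not CSSD's actual code; the function and stage names are hypothetical. Note that 75% of 30 seconds is 22.5 seconds, which the list above rounds to 22.

```python
# Hypothetical stage names; the fractions come from the list above.
STAGES = [
    ("warning_50", 0.50),
    ("warning_75", 0.75),
    ("warning_90", 0.90),
    ("eviction", 1.00),
]


def misscount_thresholds(misscount_s=30.0):
    """Return the number of silent seconds at which each stage is reached."""
    return {name: misscount_s * fraction for name, fraction in STAGES}


def stage_for(silence_s, misscount_s=30.0):
    """Return the most severe stage reached after silence_s seconds, or None."""
    reached = None
    for name, fraction in STAGES:
        if silence_s >= misscount_s * fraction:
            reached = name
    return reached


if __name__ == "__main__":
    print(misscount_thresholds())
    for seconds in (10, 16, 23, 28, 30):
        print(seconds, stage_for(seconds))
```

With a non-default misscount, the same percentages simply scale: a misscount of 60 seconds would put the first warning at 30 seconds.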

🔹 Disk Heartbeat

Occurs between each node and the voting disk.
CSSD maintains a heartbeat of one OS block at a node-specific offset on the voting disk, using the pread / pwrite system calls.
CSSD:
- Writes its own heartbeat (with a counter and node name in the block header).
- Reads/Monitors the heartbeat blocks of all other nodes.

⚠️ If a node fails to write its heartbeat within the disk I/O timeout period, it is considered dead.
If its status is unknown and it's not part of the "survivor" node group, the node is evicted (via a "kill block" update in the voting disk).
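
To make the pread / pwrite pattern concrete, here is a minimal sketch that uses a temporary file as a stand-in for a voting disk. The block size, the block layout (an 8-byte counter followed by a 32-byte node name), and the slot-per-node addressing are assumptions made for illustration; the real on-disk format is not described in this post. `os.pread` and `os.pwrite` are available on Unix-like systems.

```python
import os
import struct
import tempfile

BLOCK_SIZE = 512  # assumed block size for this sketch
# Hypothetical header layout: 8-byte counter + 32-byte zero-padded node name.
HEADER = struct.Struct("<Q32s")


def write_heartbeat(fd, slot, counter, node_name):
    """Write one heartbeat block at this node's own offset."""
    payload = HEADER.pack(counter, node_name.encode())
    block = payload.ljust(BLOCK_SIZE, b"\0")
    os.pwrite(fd, block, slot * BLOCK_SIZE)


def read_heartbeat(fd, slot):
    """Read the heartbeat block stored in the given slot."""
    block = os.pread(fd, BLOCK_SIZE, slot * BLOCK_SIZE)
    counter, raw_name = HEADER.unpack(block[:HEADER.size])
    return counter, raw_name.rstrip(b"\0").decode()


def demo():
    """Two nodes write heartbeats; node1 writes twice, bumping its counter."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        write_heartbeat(fd, 0, 1, "node1")
        write_heartbeat(fd, 1, 7, "node2")
        write_heartbeat(fd, 0, 2, "node1")
        return read_heartbeat(fd, 0), read_heartbeat(fd, 1)


if __name__ == "__main__":
    print(demo())
```

A monitoring node would read the same slot repeatedly and treat a counter that stops advancing for longer than the disk timeout as a missed disk heartbeat.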

🔸 Summary of Heartbeat Requirements

Heartbeat Type	Frequency	Timeout Condition	Consequence
Network	1 second	`css_misscount` (default: 30s)	Node eviction
Disk	1 second	`disktimeout` (default: 200s)	Node eviction

🔸 Failure Matrix for Heartbeat Scenarios

Network Ping	Disk Ping	Reboot?
Completes within misscount seconds	Completes within disktimeout	No
Completes within misscount seconds	Takes more than misscount but < disktimeout	No
Completes within misscount seconds	Takes more than disktimeout	Yes
Takes more than misscount seconds	Completes within disktimeout	Yes
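
The matrix reduces to a simple rule: a reboot follows if either heartbeat exceeds its own timeout. A minimal sketch, assuming a misscount of 30 seconds and a disktimeout of 200 seconds; the function name is hypothetical.

```python
def reboot_required(network_s, disk_s, misscount_s=30.0, disktimeout_s=200.0):
    """Return True if either heartbeat took longer than its timeout."""
    network_ok = network_s <= misscount_s
    disk_ok = disk_s <= disktimeout_s
    return not (network_ok and disk_ok)


if __name__ == "__main__":
    # The four rows of the matrix, in order.
    rows = [(10, 10), (10, 60), (10, 250), (45, 10)]
    for network_s, disk_s in rows:
        print(network_s, disk_s, reboot_required(network_s, disk_s))
```

The second row is the interesting one: a disk heartbeat that is slower than misscount but still inside disktimeout does not trigger a reboot, because each heartbeat is judged against its own timeout.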

🔧 Understanding Voting Disk and Its Role in Oracle RAC Clusterware

The Voting Disk is a vital component in Oracle RAC that helps determine node membership, resolve split-brain conditions, and enforce I/O fencing. It plays a key role alongside the CSSD process, which uses both network and disk heartbeats for node health monitoring.

🧠 What is Stored in the Voting Disk?

Information about cluster node membership.
Disk-based heartbeat blocks for each node.
Kill blocks to mark evicted nodes.

Voting Disks are written using pwrite() and read using pread() system calls by the CSSD process.

Each node writes to a specific offset (its own heartbeat block) and reads others’ blocks to check their liveness.

Although the OCR and OLR also store node information, the Voting Disk heartbeat plays a runtime role in eviction decisions. Voting Disks hold no persistent user or application data, so a lost Voting Disk can be re-added without data loss. Older releases required CRS to be stopped first; in 11g Release 2 and later this can be done online.

🔁 Why Voting Disks Are Crucial

While it’s technically true that data in voting disks can be recreated, they’re instrumental in avoiding split-brain and enforcing evictions when:

Heartbeat failures occur.
Nodes lose contact with others.
Shared storage needs to be protected (I/O fencing).

💥 Split Brain Syndrome

A split-brain situation arises when cluster nodes lose communication via the private interconnect but continue running independently. Each node assumes others are down and may attempt to access and modify shared data blocks.

❌ Risk:

This leads to data integrity violations, such as concurrent conflicting updates to the same data block.

🧱 I/O Fencing

After a node failure or eviction, it’s possible that leftover I/O from that node reaches storage out of order, corrupting data. To prevent this:

Oracle performs I/O fencing by removing failed nodes' access to shared storage.
This ensures only surviving nodes can read/write to the disk.

⚖️ Simple Majority Rule

Oracle Clusterware requires a simple majority of voting disks to be accessible at all times:

"More than half" of the voting disks must be online for the cluster to operate.

📌 Formula:

To tolerate loss of N disks → Configure at least 2N+1 voting disks.
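
The arithmetic behind the formula can be checked with a few lines. This is a sketch of the 2N+1 rule as stated above; the function names are hypothetical.

```python
def disks_needed(tolerated_failures):
    """Minimum voting disks needed to survive the loss of N disks: 2N + 1."""
    return 2 * tolerated_failures + 1


def failures_tolerated(total_disks):
    """Largest number of disks that can be lost while a majority remains."""
    return (total_disks - 1) // 2


def has_majority(accessible, total_disks):
    """True if strictly more than half of the voting disks are accessible."""
    return 2 * accessible > total_disks


if __name__ == "__main__":
    for total in (1, 2, 3, 4, 5):
        print(total, "disks tolerate", failures_tolerated(total), "failure(s)")
```

Note that 4 disks tolerate no more failures than 3 do, which is why odd counts are the usual choice.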

🔍 Examples – Voting Disk in Action

✅ Example 1: Network Heartbeat Failure in 3-node Cluster

Setup: 3 nodes (Node 1, Node 2, Node 3) and 3 Voting Disks.
Issue: Node 3 loses its network heartbeat with Node 1 and Node 2, but its disk heartbeat is still working.
Action: Node 1 and 2 can still see each other and determine via the Voting Disk that Node 3 is isolated.
They mark Node 3’s kill block in Voting Disk.
During next pread(), Node 3 sees the self-kill flag and evicts itself.
I/O fencing ensures safe disk access. OHASD then gracefully shuts down and restarts the stack on Node 3.
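
Example 1 can be walked through with an in-memory simulation. Everything here is a simplified, hypothetical model: each voting disk is a dictionary of kill flags, the larger group of mutually reachable nodes is treated as the survivor group, and tie-breaking between equal-sized groups is not modeled.

```python
def run_example():
    nodes = ["node1", "node2", "node3"]
    # Three voting disks, each holding one kill flag per node.
    disks = [{node: False for node in nodes} for _ in range(3)]

    # After the interconnect failure: node1 and node2 still hear each other.
    subclusters = [{"node1", "node2"}, {"node3"}]
    survivors = max(subclusters, key=len)

    # Survivors mark the kill block of every node outside their group.
    for group in subclusters:
        if group is survivors:
            continue
        for node in group:
            for disk in disks:
                disk[node] = True

    # Each node reads its own flag; a flag seen on most disks means self-eviction.
    evicted = [
        node for node in nodes
        if 2 * sum(disk[node] for disk in disks) > len(disks)
    ]
    return sorted(survivors), evicted


if __name__ == "__main__":
    print(run_example())
```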

✅ Example 2: Disk Heartbeat Split in 2-node Cluster

Setup: 2 nodes and 3 Voting Disks.
Issue: Node 1 sees 2 voting disks; Node 2 sees only 1.
Based on Simple Majority Rule:
- Node 1 (majority access) is the survivor.
- CSSD of Node 1 marks Node 2’s kill block.
Node 2 reads the kill flag and evicts itself.
I/O fencing is applied, and OHASD restarts the stack on Node 2.

🧠 An odd number of disks matters: with an even count, two nodes can each see exactly half of the disks, so neither holds a strict majority and the tie cannot be resolved cleanly.
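
The per-node majority check in Example 2 can be sketched as follows. The node names and the `visible` mapping are hypothetical inputs, and the check models only the Simple Majority Rule, not the full eviction protocol.

```python
from typing import Dict, List


def nodes_with_majority(visible: Dict[str, int], total_disks: int) -> List[str]:
    """Return the nodes that can access strictly more than half of the disks."""
    return sorted(
        node for node, count in visible.items() if 2 * count > total_disks
    )


if __name__ == "__main__":
    # Example 2: three voting disks, Node 1 sees two, Node 2 sees one.
    print(nodes_with_majority({"node1": 2, "node2": 1}, 3))
    # Even count: two voting disks, each node sees one, so nobody has a majority.
    print(nodes_with_majority({"node1": 1, "node2": 1}, 2))
```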

📌 Summary

Component	Purpose
Voting Disk	Maintains disk heartbeats, kill blocks, and node membership info.
Network Heartbeat	Checks interconnect communication every second via TCP.
Disk Heartbeat	Checks I/O access health via shared storage every second.
Split-Brain	Scenario where isolated nodes continue operating independently.
I/O Fencing	Prevents failed nodes from sending stale writes to shared storage.
Simple Majority	Requires each node to access more than half of the voting disks; a node that cannot is evicted.


Monday, 31 March 2025
