
Friday 1 October 2021

Oracle Clusterware changes to OCR and Voting file "Backup location"

Brief history of OCR and Voting Disks location

Oracle Clusterware uses the OCR (Oracle Cluster Registry) to store resource metadata and Voting files to prevent split-brain conditions. In the Oracle 10g/11g time frame, these files were stored on raw devices, and the recommendation was to maintain two copies of the OCR and at least three Voting files. Administrators had to manually initiate backups of these files periodically.


Oracle 11gR2 and higher releases simplified OCR and Voting file management by storing these files in ASM (Automatic Storage Management). ASM automatically maintains the number of OCR/Voting disks based on the underlying Diskgroup redundancy, further reducing manual file management tasks for the DBA. Additionally, the Clusterware stack initiates periodic automatic backups of these files.


Backup logic prior to Oracle 12c Release 2

The Oracle Clusterware process CRSD automatically backs up the OCR every four hours to the local file system, in the "cdata" directory under the Clusterware home. Voting file data is included in the OCR backup because it is backed up as part of the OCR. The policy automatically maintains four-hourly, daily and weekly backups of the OCR, providing backups from different points in time to be used as necessary.




ocrconfig -showbackup lists the location of the backups along with the node name.
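As a small illustration of the command above, the following sketch wraps it in a helper. The function name list_ocr_backups is hypothetical, and the check for ocrconfig in PATH assumes the Grid home's bin directory has been added to PATH; the command is normally run as root or as the Grid Infrastructure owner.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the OCR backups known to the cluster.
# Any extra arguments (for example "auto" or "manual") are passed through.
list_ocr_backups() {
    if ! command -v ocrconfig >/dev/null 2>&1; then
        # Assumption: ocrconfig lives in the Grid home's bin directory.
        echo "ocrconfig not found in PATH" >&2
        return 127
    fi
    ocrconfig -showbackup "$@"
}
```

Calling list_ocr_backups on a cluster node prints the same listing as the bare command; on a machine without Clusterware it fails with a clear message instead of a shell error.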




Backups are taken on the node running the CRSD master. In a cluster, the CRSD master can be found by searching for "OCR MASTER" in the crsd log, as shown in the example.
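The log search described above can be sketched as a small shell function. The function name find_ocr_master is hypothetical, and the log file is passed as an argument because its location varies by release and host; only the "OCR MASTER" search string comes from the text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent line mentioning "OCR MASTER"
# in a CRSD log or trace file.
# Usage: find_ocr_master /path/to/crsd/log
find_ocr_master() {
    local logfile="${1:-}"
    if [ ! -r "$logfile" ]; then
        echo "cannot read log file: $logfile" >&2
        return 1
    fi
    # Keep only the newest entry, since the master role can move between nodes.
    grep "OCR MASTER" "$logfile" | tail -n 1
}
```

Run it on each node's crsd log; the node whose latest entry identifies it as the master is the one holding the automatic backups in releases prior to 12c Release 2.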


Changes in Oracle 12c Release 2

The files themselves remain in ASM, but their backups are no longer kept on the local file system; instead, they are written to an ASM Diskgroup as shown.
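For readers who want to point the backups at a specific Diskgroup, a minimal sketch follows. It relies on ocrconfig -backuploc, which accepts a Diskgroup name prefixed with "+". The function name set_ocr_backup_loc, the APPLY switch and the Diskgroup name +OCRBACKUP are all hypothetical; by default the function only prints the command it would run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (or, with APPLY=1, run) the command that
# points OCR backups at an ASM Diskgroup.
set_ocr_backup_loc() {
    local dg="${1:-}"
    # ASM Diskgroup names are written with a leading "+", e.g. +OCRBACKUP.
    case "$dg" in
        +?*) ;;
        *) echo "usage: set_ocr_backup_loc +DISKGROUP" >&2; return 2 ;;
    esac
    if [ "${APPLY:-0}" = "1" ]; then
        ocrconfig -backuploc "$dg"
    else
        echo "would run: ocrconfig -backuploc $dg"
    fi
}
```

For example, set_ocr_backup_loc +OCRBACKUP prints the command for review, and APPLY=1 set_ocr_backup_loc +OCRBACKUP executes it as root on a cluster node.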




Reason for Change

The backup location was moved to shared storage because the local file system may be inaccessible, especially after a catastrophic failure of the master node. Storing backups on shared storage makes them accessible from any surviving node of the cluster for recovery.


Steps to recover from loss of OCR

Oracle Clusterware is responsible for the life cycle management of processes and resources. This includes starting and stopping resources in a predetermined order, along with periodic health checks of each resource. It relies on the OCR to store the relationships between resources and processes. These relationships play a major role during stack startup.





As shown in the image, there is a dependency between the Voting Disk, the OCR and the ASM Diskgroup such that during startup, the CSS process reads the Voting file and signals the CRSD to start. CRSD reads the OCR and performs resource startup activities, including starting the ASM instance and mounting the ASM Diskgroups.


This defined relationship between resources ensures that they are started and stopped in a predetermined order, but any issue accessing the OCR can prevent a successful startup of the stack. For example, the ASM Diskgroup cannot be mounted if the OCR is lost, because the CSS process won't start and therefore will not signal the CRSD process to start. But in order to restore the OCR, the ASM Diskgroup needs to be mounted. So how do we solve this cross-dependency to restore the OCR?


In such cases, the "crsctl start crs" command with the -excl option

# crsctl start crs -excl

can be used to start the Clusterware stack in maintenance mode from any node of the cluster. The -excl option temporarily disables the ASM-CRS-CSS dependency checks and starts the stack in exclusive mode so that recovery operations can be performed. Note that the "-excl" option should be executed from one node only. In fact, the command will fail if it finds that the CSS daemon is running on other nodes.


Once the stack is started with the exclusive option, the ASM Diskgroups will be mounted and the OCR can be restored to the ASM Diskgroups. Detailed step-by-step instructions for the recovery procedure are documented in the Oracle Clusterware Administration and Deployment Guide.
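The overall flow on the single recovery node can be outlined as a dry-run script. This is a sketch under stated assumptions, not the official procedure: the names run, restore_ocr and APPLY are hypothetical, Clusterware is assumed to be stopped already on every other node, and the -nocrs flag (which keeps CRSD down while the OCR is restored) follows Oracle's documented restore steps rather than the command shown above. Confirm the exact sequence against the guide for your release.

```shell
#!/usr/bin/env bash
# Dry-run outline of an OCR restore on ONE node.
# Prints each command unless APPLY=1 is set, in which case it runs them as given.

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

restore_ocr() {
    # Backup file name, e.g. one reported by "ocrconfig -showbackup".
    local backup_file="${1:-}"
    if [ -z "$backup_file" ]; then
        echo "usage: restore_ocr <backup_file>" >&2
        return 2
    fi
    run crsctl stop crs -f              || return 1  # make sure the local stack is down
    run crsctl start crs -excl -nocrs   || return 1  # exclusive mode, CRSD not started
    run ocrconfig -restore "$backup_file" || return 1  # restore the OCR from the backup
    run ocrcheck                        || return 1  # verify OCR integrity
    run crsctl stop crs -f              || return 1  # leave exclusive mode
    run crsctl start crs                             # normal start; repeat on other nodes
}
```

Calling restore_ocr with a backup file name prints the six commands in order so the plan can be reviewed before anything is executed with APPLY=1.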


Conclusion

Oracle 12c Release 2 changes the backup location from local storage to shared storage, because shared storage is better suited for backups and, more importantly, any surviving node of the cluster can be used to initiate recovery from it. Since the OCR and its backups are now maintained on shared storage, it is important to ensure the following:


The OCR, Voting file and the backups are configured on different Diskgroups

Adequate redundancy is defined for the ASM Diskgroup containing the OCR and Voting files


