Amit's Oracle DBA Blog: OCR recovery

Monday, 13 September 2021

OCR recovery - RAC

The Oracle Cluster Registry (OCR) is without doubt one of the most critical components of Oracle Clusterware, and its uninterrupted availability is necessary for cluster resources to function.

Given this criticality, Oracle offers several options to protect the OCR file from physical or logical corruption, unintentional human error, and single points of failure.

The OCR file is automatically backed up every 4 hours by Oracle Clusterware and can also be backed up manually on demand.

To avoid a single point of failure, consider multiplexing the file up to a maximum of five copies.

You need to understand all the available methods to protect the OCR and recover it from different failures.

In this section, we shall highlight various OCR recovery scenarios.

Let's verify the existing OCR details.

For that, you need to log in as root and execute the following to view OCR details:

# ocrcheck

Logical corruption verification will be skipped if the preceding command is executed as a nonprivileged user.

To view the OCR file name and path, use the following command:

# ocrcheck -config

Automatic/manual backup details are listed using the following command:

# ocrconfig -showbackup

The following are a few OCR restore procedures that can be used in the event of all OCR files being either corrupted or lost.

Scenario 1: The following demonstrates a procedure to restore OCR from autogenerated OCR backups:

1) As the root user, get the auto backup availability details using the following command:

# ocrconfig -showbackup

2) Stop Clusterware on all nodes using the following command as root user:

# crsctl stop crs [-f]

-- Use the -f option to force CRS to stop if the stack cannot be stopped normally because of errors.

3) As root user, restore the most recent valid backup copy identified in step 1.

Use the following restore example:

# ocrconfig -restore backup02.ocr

4) Upon restore completion, as root user, bring up the cluster stack and verify the OCR details subsequently on all nodes using the following commands:

# crsctl start crs

# cluvfy comp ocr -n all -verbose
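
The four steps above can be strung together in a small wrapper. What follows is only a sketch under stated assumptions: the function name restore_ocr_from_backup and the DRY_RUN switch are hypothetical and not part of Oracle's tooling, and the script acts on the local node only, so the stack still has to be stopped on every other node before the restore. With DRY_RUN left at its default of 1, the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of the Scenario 1 sequence for the local node (run as root).
# DRY_RUN is a hypothetical safety switch: 1 (default) only prints commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '[dry-run] %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper; $1 is the backup file picked from the -showbackup list.
restore_ocr_from_backup() {
    backup_file="$1"
    if [ -z "$backup_file" ]; then
        echo "usage: restore_ocr_from_backup <backup-file>" >&2
        return 1
    fi
    # Each step runs only if the previous one succeeded.
    run ocrconfig -showbackup &&
    run crsctl stop crs &&        # the other nodes must be stopped as well
    run ocrconfig -restore "$backup_file" &&
    run crsctl start crs &&
    run cluvfy comp ocr -n all -verbose
}
```

Calling restore_ocr_from_backup backup02.ocr in the default mode prints the five commands in order, which is a cheap way to review the sequence before setting DRY_RUN=0 on a real cluster.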

Scenario 2: The following demonstrates a procedure to recover an OCR stored in an ASM diskgroup, assuming that the diskgroup has become corrupted or cannot be mounted:

1) After locating the most recent valid automatic or manual OCR backup, stop the cluster on all nodes.

To stop the cluster, use the command mentioned in the previous procedure.

Start the clusterware in exclusive mode using the following command as root user:

# crsctl start crs -excl -nocrs

2) Connect to the local ASM instance on the node and recreate the ASM diskgroup.

Use the following SQL examples to drop and re-create the diskgroup with the same name:

SQL> drop diskgroup DG_OCR force including contents;

SQL> create diskgroup DG_OCR external redundancy disk 'diskname' attribute 'COMPATIBLE.asm'='12.1.0';

3) Once the ASM diskgroup has been re-created, restore the most recent valid OCR backup.

Use the example explained in the previous scenario.

If the voting disk exists in the same disk group, you also need to re-create the voting disk.

Use the following example:

# crsctl replace votedisk +DG_OCR

After successfully completing the preceding steps, shut down the cluster on the local node and start the clusterware on all nodes subsequently.
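
Because this scenario mixes cluster-wide and single-node actions, it can help to print the order of operations before touching anything. The sketch below executes nothing and only prints a checklist. The helper name print_asm_ocr_recovery_plan is hypothetical, and the use of crsctl stop crs -f for the final local shutdown is an assumption about the command meant by "shut down the cluster on the local node".

```shell
#!/bin/sh
# Sketch: print the Scenario 2 steps in order for a given disk group.
# Nothing is executed; print_asm_ocr_recovery_plan is a hypothetical helper.
print_asm_ocr_recovery_plan() {
    dg="$1"        # disk group name, for example DG_OCR
    backup="$2"    # OCR backup file chosen beforehand
    if [ -z "$dg" ] || [ -z "$backup" ]; then
        echo "usage: print_asm_ocr_recovery_plan <diskgroup> <backup-file>" >&2
        return 1
    fi
    cat <<EOF
1. all nodes:  crsctl stop crs
2. one node:   crsctl start crs -excl -nocrs
3. same node:  in SQL*Plus on the ASM instance, drop and re-create $dg
4. same node:  ocrconfig -restore $backup
5. same node:  crsctl replace votedisk +$dg   (only if the voting disk was in $dg)
6. same node:  crsctl stop crs -f
7. all nodes:  crsctl start crs
EOF
}
```

For the disk group used in this post, print_asm_ocr_recovery_plan DG_OCR backup02.ocr prints the seven steps with the names filled in.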

Scenario 3: In the following example, we demonstrate the procedure to rebuild an OCR file in the event of all files becoming corrupted and when there is no valid backup available.

1) To be able to rebuild the OCR, you need to first unconfigure and then reconfigure the cluster.

However, noncluster resource information, such as databases, listeners, services, and instances, then has to be added to the OCR manually.

Therefore, it is important to collect the resource information with utilities such as crsctl, srvctl, and oifcfg before you unconfigure the cluster.

2) To unconfigure the cluster, run the following example as root user across all nodes in sequence:

# $GI_HOME/crs/install/rootcrs.pl -deconfig -force -verbose

3) Upon successfully executing the preceding on all nodes, the following needs to be run on the first node of the cluster:

# $GI_HOME/crs/install/rootcrs.pl -deconfig -force -verbose -lastnode

4) After unconfiguring the cluster, you will now have to configure the cluster as root user with the following example:

# $GI_HOME/crs/config/config.sh

The config.sh script invokes the Grid Infrastructure configuration framework in Graphical User Interface (GUI) mode, and you need to provide the appropriate input on the pages that are displayed.

5) Finally, run root.sh to complete the configuration.
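
The resource inventory that step 1 calls for has to be taken before any node is unconfigured. A minimal sketch of such a snapshot follows. The function name snapshot_cluster_config, the DRY_RUN switch, and the default output directory are hypothetical, and the list of commands is only one reasonable selection from crsctl, srvctl, oifcfg, and olsnodes; extend it with per-database and per-service queries as needed.

```shell
#!/bin/sh
# Sketch: save the output of a few configuration queries, one file per query.
# DRY_RUN is a hypothetical switch: 1 (default) only prints what would run.
DRY_RUN="${DRY_RUN:-1}"

snapshot_cluster_config() {
    out_dir="${1:-/tmp/ocr_rebuild_snapshot}"   # hypothetical default location
    [ "$DRY_RUN" = "1" ] || mkdir -p "$out_dir" || return 1
    # Each line of the list below is "<file name>|<command>".
    while IFS='|' read -r name cmd; do
        if [ "$DRY_RUN" = "1" ]; then
            printf '[dry-run] %s > %s/%s.txt\n' "$cmd" "$out_dir" "$name"
        else
            # Word splitting of $cmd is intentional; stdin is detached so the
            # command cannot consume the rest of the list.
            $cmd > "$out_dir/$name.txt" 2>&1 </dev/null
        fi
    done <<'EOF'
resources|crsctl stat res -t
databases|srvctl config database
listeners|srvctl config listener
scan|srvctl config scan
interfaces|oifcfg getif
nodes|olsnodes -n
EOF
}
```

The saved files then serve as the reference when the databases, listeners, and services are registered again after the rebuild.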

After recovering the OCR file from various failure scenarios, run through the following set of postrecovery steps to verify the OCR file integrity and cluster availability on all nodes:

# ocrcheck

# cluvfy comp ocr -n all -verbose

# crsctl check cluster -all
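
These three checks can also be run as one unit that keeps going after a failure and reports a count at the end. This is a sketch only: verify_ocr_recovery, check, and the DRY_RUN switch are hypothetical names, and it assumes each utility returns a nonzero exit status when its check fails.

```shell
#!/bin/sh
# Sketch: run the three post-recovery checks and report how many failed.
# DRY_RUN is a hypothetical switch: 1 (default) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

check() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '[dry-run] %s\n' "$*"
        return 0
    fi
    "$@" || { echo "FAILED: $*" >&2; return 1; }
}

verify_ocr_recovery() {
    failures=0
    check ocrcheck                        || failures=$((failures + 1))
    check cluvfy comp ocr -n all -verbose || failures=$((failures + 1))
    check crsctl check cluster -all       || failures=$((failures + 1))
    echo "checks failed: $failures"
    # The function succeeds only when every check passed.
    [ "$failures" -eq 0 ]
}
```

Run as root with DRY_RUN=0, the function's exit status can gate any follow-up step, such as reopening the cluster to applications.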

Summary: In a nutshell, this chapter summarized and offered the most useful tools, tips, and techniques to design and deploy optimal database backup and recovery procedures in a RAC environment using the RMAN tool. In addition, the internal mechanics behind instance and crash recovery operations in a RAC environment were explained in great detail. The chapter concluded by discussing and demonstrating various OCR recovery scenarios.
