Disclaimer

Sunday, 10 November 2024

How To Recover From Losing OCR, Voting Disk, Or ASM Password - Oracle RAC 19c

 

The scenario below will corrupt the header of oracleasm disks that belong to disk group that has an OCR file, a voting disk, and ASM Password file.

Then we will show the steps and commands to recover from losing those files by

·         Restoring the OCR file from the latest automatic backup

·         Adding a new voting disk

·         Creating a new ASM password file and adding key users (SYS, ASMSNMP, CRSUSER__ASM_001) to password file

Unfortunately, there is NO other option to remove OCR file. 

We can’t manually delete OCR file from ASM, because the file is being accessed.

For example:

ASMCMD> rm -rf +OCR/dbp-cluster/OCRFILE/REGISTRY.255.966255313
ORA-15032: not all alterations performed
ORA-15028: ASM file '+OCR/rac-cluster/OCRFILE/REGISTRY.255.966255313' not dropped; currently being accessed (DBD ERROR: OCIStmtExecute)

Also, when cluster or CRS is down, we are not able to get to the OCR file as it is not available.

For example:

[oracle@rac1 ~]$ asmcmd
Connected to an idle instance.
ASMCMD> ls
ASMCMD-8102: no connection to Oracle ASM; command requires Oracle ASM to run

Assumptions:-

OCR disk group name is +OCR.

Database datafiles, controlfile, archives are located on another ASM disk groups.

Only database OCR log files are located on +OCR.

ASM parameter file and password file are located on +OCR.

Starting with 19c, OCR file automatic backups are located on ASM. In our example +OCR.


Scenario preparation


Step# Description

1 Take a backup for ASM SPFILE file.

— grid account

SQL> create pfile=’/tmp/init+ASM.ora’ from spfile;


2 Take a backup for ASM metadata.

— grid account

ASMCMD> md_backup /tmp/metadata.bkp  -G  data,ocr,fra

ASMCMD> md_restore /tmp/metadata.bkp  -S /tmp/metadata.sql


3 Take a backup for the last OCR automatic backup.

— grid account

$ ocrconfig  -showbackup

ASMCMD> cd OCR/dbp-cluster/OCRBACKUP

ASMCMD> cp backup_20240707_133509.ocr.268.1173706509 /tmp/ocr.bkp


4 Run below commands to show OCR & voting disk info and save the output as a reference.

— grid account

$ ocrcheck

$ crsctl query css votedisk


5 Query oracleasm disk info for OCR disks and save the output as a reference to be used

later when we recreate the disks again.

— root account

$ oracleasm querydisk -p OCR1

$ oracleasm querydisk -p OCR2


6 This is an optional as long as databases files are stored on another disk groups – Take a backup for your databases using RMAN and store backup on file system.

Corrupt OCR disks header and cleanup all oracle processes




Step# Description

1 Corrupt OCR disks header using DD command from one node only.

— root account

$ dd if=/dev/zero of=/dev/oracleasm/disks/OCR1 bs=8192

$ dd if=/dev/zero of=/dev/oracleasm/disks/OCR2 bs=8192


2 Most of CRS resources will crash and servers may reboot. We need to cleanup oracle processes on all cluster nodes.

— root account

$ crsctl stop crs -f

$ ps -ef | grep d.bin   ->  kill -9 <PID>

$ ps -ef | grep oracle -> kill -9 <PID>

$ ps -ef | grep grid     -> kill -9 <PID>


3 Cleanup oracleasm disks on all cluster nodes.

— root account

$ oracleasm scandisks

$ oracleasm listdisks



Recover from losing OCR disk group


Step# Description

1 Create oracleasm disks for OCR disk group. Run from one node only then scan disks on another nodes.

— root account

$ oracleasm createdisk OCR1 /dev/sdf1

$ oracleasm createdisk OCR2 /dev/sdd1

$ oracleasm scandisks $ oracleasm listdisks

— On remaining nodes as root account

$ oracleasm scandisks

$ oracleasm listdisks


2 Start ASM instance on one node only in exclusive mode.

— root account

$ crsctl start crs -excl -nocrs


3 Check CRSD is down and stop it if it is not (STATE=OFFLINE).

— root account

$ crsctl status resource ora.crsd -init


4 Query voting disks and drop them if they exist.

— root account

$ crsctl query css votedisk

$ crsctl delete css votedisk +OCR


5 Connect to ASM instance and drop OCR disk group if it still exists.

— grid account

SQL> select name from v$asm_diskgroup where name=’ OCR’;

SQL> drop diskgroup OCR force including contents;


6 Connect to ASM instance and set ASM parameter asm_diskstring.

— grid account

SQL> alter system set asm_diskstring=’/dev/oracleasm/disks/*’;


7 Create OCR disk group using the metadata backup script we created earlier.

Copy SQL commands, which are related to OCR disk group from backup script, connect to ASM instance, then run the commands.

— grid account

$ cat /tmp/metadata.sql | grep OCR > /tmp/OCR_metadata.sql

SQL>@/tmp/OCR_metadata.sql


8 Restore the OCR from backup done before.

— root account

$ ocrconfig -restore /tmp/ocr.bkp

$ ocrcheck


9 Add voting disks.

— root account

$ crsctl replace votedisk  +OCR

$ crsctl query css votedisk


10 Connect to ASM instance and create ASM SPFILE

— grid account

SQL> create spfile=’+OCR’  from pfile=’/tmp/init+ASM.ora’;

ASMCMD> cd OCR/dbp-cluster/ASMPARAMETERFILE

ASMCMD> ls -lt ASMCMD>spset+OCR/dbpcluster/ASMPARAMETERFILE/

registry.253.966422601

ASMCMD> spget


11 CRS is required to be running in non-exclusive mode to recreate the ASM password file in the disk group.  This means some ‘application’ cluster resources will be started and available now.

— root account

$ crsctl stop crs -f

$ crsctl start crs -wait


12 Create ASM password file on OCR disk group.

— grid account

12.1. Create password file

ASMCMD> pwget  --asm

ASMCMD> pwdelete  --asm

ASMCMD> pwcreate  --asm +OCR/orapwASM  -f

ASMCMD> pwget  --asm

12.2. Grant sysasm to SYS

ASMCMD> orapwusr  --grant sysasm SYS

ASMCMD> lspwusr

12.3. Add ASMSNMP user and grant sysasm & sysdba

ASMCMD> orapwusr  --add ASMSNMP  -->enter the same password used to create password file

ASMCMD> orapwusr  --grant sysdba ASMSNMP

ASMCMD> orapwusr  --grant sysasm ASMSNMP

ASMCMD> lspwusr

12.4. Add user CRSUSER__ASM_001 to ASM password file

12.4.1. Dump OCR file and search dump file for “CRSUSER__ASM_001” to get user

“ORATEXT”

$ $GRID_HOME/bin/ocrdump /tmp/OCR.dmp

$ vi /tmp/OCR.dmp

For example:

[SYSTEM.ASM.CREDENTIALS.USERS.CRSUSER__ASM_001] ORATEXT :

a332f4dbe3da5f4bbf3a71895a466319:grid

12.4.2. Get user “CRSUSER__ASM_001” password

$ /u01/app/grid/product/19c/grid/bin/crsctl get credmaint -path

/ASM/Self/a332f4dbe3da5f4bbf3a71895a466319 -credtype userpass -id 0 -attr passwd  -local

eZX8GNed1KfM3E92U57nSziL4HO5E

12.4.3. Add user CRSUSER__ASM_001 to password file

ASMCMD> orapwusr  --add CRSUSER__ASM_001  --> Use password you got it from previous command

ASMCMD> orapwusr  --grant sysdba CRSUSER__ASM_001

ASMCMD> orapwusr  --grant sysasm CRSUSER__ASM_001

ASMCMD> lspwusr


13 Stop and start CRS normally. All resources should start normally.

— root account

$ crsctl stop crs -f

$ crsctl start crs -wait

$ crctl check cluster

$ crsctl status res -t


14 Start CRS on the remaining cluster node(s). All resources should start normally.

— root account

$ crsctl start crs -wait

$ crctl check cluster

$ crsctl status res -t


15 Run a cluster verification of all cluster nodes. Run from one node only.

— grid account

$ cd $ORACLE_HOME/bin

$ cluvfy comp ocr -n all -verbose


16 Check the status of cluster resources. Resources should be ONLINE on all nodes.

— grid account

$ crsctl status res -t

$ srvctl status asm

$ srvctl status scan

$ srvctl status scan_listener

$ srvctl status listener



17 Stop database(s) then startup mount to recreate OCR log groups on OCR disk group.

— oracle account

$ srvctl stop database -d <DB_NAME>

SQL> startup mount

SQL> select * from v$log;

SQL> select member from v$logfile;

SQL> alter database drop logfile group &id; Ã  repeat for all OCR log groups have

member(s) on OCR

SQL> alter database add logfile thread &THD_NUM group &GRP_NUM (‘+OCR’) size

<SIZE_KB>;

SQL> select * from v$log;

SQL> select member from v$logfile;



18 Stop database(s) then startup.

— oracle account

SQL> shutdown immediate

$ srvctl start database -d <DB_NAME>

SQL> select name,database_role,open_mode from v$database;

SQL> select instance_name,host_name,status from gv$instance;




No comments:

Post a Comment

How to recovery PDB when PDB database is dropped in Oracle

  How to recovery PDB when PDB database is dropped :) [oracle@rac01 ~]$ sqlplus '/as sysdba' SQL*Plus: Release 21.0.0.0.0 - Product...