The scenario below will corrupt the header of oracleasm disks that belong to disk group that has an OCR file, a voting disk, and ASM Password file.
Then we will show the steps and commands to recover from losing those files by
· Restoring the OCR file from the latest automatic backup
· Adding a new voting disk
· Creating a new ASM password file and adding key users (SYS, ASMSNMP, CRSUSER__ASM_001) to password file
Unfortunately, there is NO other option to remove OCR file.
We can’t manually delete OCR file from ASM, because the file is being accessed.
For example:
ASMCMD> rm -rf +OCR/dbp-cluster/OCRFILE/REGISTRY.255.966255313
ORA-15032: not all alterations performed
ORA-15028: ASM file '+OCR/rac-cluster/OCRFILE/REGISTRY.255.966255313' not dropped; currently being accessed (DBD ERROR: OCIStmtExecute)
Also, when cluster or CRS is down, we are not able to get to the OCR file as it is not available.
For example:
[oracle@rac1 ~]$ asmcmd
Connected to an idle instance.
ASMCMD> ls
ASMCMD-8102: no connection to Oracle ASM; command requires Oracle ASM to run
Assumptions:-
OCR disk group name is +OCR.
Database datafiles, controlfile, archives are located on another ASM disk groups.
Only database OCR log files are located on +OCR.
ASM parameter file and password file are located on +OCR.
Starting with 19c, OCR file automatic backups are located on ASM. In our example +OCR.
Scenario preparation
Step# Description
1 Take a backup for ASM SPFILE file.
— grid account
SQL> create pfile=’/tmp/init+ASM.ora’ from spfile;
2 Take a backup for ASM metadata.
— grid account
ASMCMD> md_backup /tmp/metadata.bkp -G data,ocr,fra
ASMCMD> md_restore /tmp/metadata.bkp -S /tmp/metadata.sql
3 Take a backup for the last OCR automatic backup.
— grid account
$ ocrconfig -showbackup
ASMCMD> cd OCR/dbp-cluster/OCRBACKUP
ASMCMD> cp backup_20240707_133509.ocr.268.1173706509 /tmp/ocr.bkp
4 Run below commands to show OCR & voting disk info and save the output as a reference.
— grid account
$ ocrcheck
$ crsctl query css votedisk
5 Query oracleasm disk info for OCR disks and save the output as a reference to be used
later when we recreate the disks again.
— root account
$ oracleasm querydisk -p OCR1
$ oracleasm querydisk -p OCR2
6 This is an optional as long as databases files are stored on another disk groups – Take a backup for your databases using RMAN and store backup on file system.
Corrupt OCR disks header and cleanup all oracle processes
Step# Description
1 Corrupt OCR disks header using DD command from one node only.
— root account
$ dd if=/dev/zero of=/dev/oracleasm/disks/OCR1 bs=8192
$ dd if=/dev/zero of=/dev/oracleasm/disks/OCR2 bs=8192
2 Most of CRS resources will crash and servers may reboot. We need to cleanup oracle processes on all cluster nodes.
— root account
$ crsctl stop crs -f
$ ps -ef | grep d.bin -> kill -9 <PID>
$ ps -ef | grep oracle -> kill -9 <PID>
$ ps -ef | grep grid -> kill -9 <PID>
3 Cleanup oracleasm disks on all cluster nodes.
— root account
$ oracleasm scandisks
$ oracleasm listdisks
Recover from losing OCR disk group
Step# Description
1 Create oracleasm disks for OCR disk group. Run from one node only then scan disks on another nodes.
— root account
$ oracleasm createdisk OCR1 /dev/sdf1
$ oracleasm createdisk OCR2 /dev/sdd1
$ oracleasm scandisks $ oracleasm listdisks
— On remaining nodes as root account
$ oracleasm scandisks
$ oracleasm listdisks
2 Start ASM instance on one node only in exclusive mode.
— root account
$ crsctl start crs -excl -nocrs
3 Check CRSD is down and stop it if it is not (STATE=OFFLINE).
— root account
$ crsctl status resource ora.crsd -init
4 Query voting disks and drop them if they exist.
— root account
$ crsctl query css votedisk
$ crsctl delete css votedisk +OCR
5 Connect to ASM instance and drop OCR disk group if it still exists.
— grid account
SQL> select name from v$asm_diskgroup where name=’ OCR’;
SQL> drop diskgroup OCR force including contents;
6 Connect to ASM instance and set ASM parameter asm_diskstring.
— grid account
SQL> alter system set asm_diskstring=’/dev/oracleasm/disks/*’;
7 Create OCR disk group using the metadata backup script we created earlier.
Copy SQL commands, which are related to OCR disk group from backup script, connect to ASM instance, then run the commands.
— grid account
$ cat /tmp/metadata.sql | grep OCR > /tmp/OCR_metadata.sql
SQL>@/tmp/OCR_metadata.sql
8 Restore the OCR from backup done before.
— root account
$ ocrconfig -restore /tmp/ocr.bkp
$ ocrcheck
9 Add voting disks.
— root account
$ crsctl replace votedisk +OCR
$ crsctl query css votedisk
10 Connect to ASM instance and create ASM SPFILE
— grid account
SQL> create spfile=’+OCR’ from pfile=’/tmp/init+ASM.ora’;
ASMCMD> cd OCR/dbp-cluster/ASMPARAMETERFILE
ASMCMD> ls -lt ASMCMD>spset+OCR/dbpcluster/ASMPARAMETERFILE/
registry.253.966422601
ASMCMD> spget
11 CRS is required to be running in non-exclusive mode to recreate the ASM password file in the disk group. This means some ‘application’ cluster resources will be started and available now.
— root account
$ crsctl stop crs -f
$ crsctl start crs -wait
12 Create ASM password file on OCR disk group.
— grid account
12.1. Create password file
ASMCMD> pwget --asm
ASMCMD> pwdelete --asm
ASMCMD> pwcreate --asm +OCR/orapwASM -f
ASMCMD> pwget --asm
12.2. Grant sysasm to SYS
ASMCMD> orapwusr --grant sysasm SYS
ASMCMD> lspwusr
12.3. Add ASMSNMP user and grant sysasm & sysdba
ASMCMD> orapwusr --add ASMSNMP -->enter the same password used to create password file
ASMCMD> orapwusr --grant sysdba ASMSNMP
ASMCMD> orapwusr --grant sysasm ASMSNMP
ASMCMD> lspwusr
12.4. Add user CRSUSER__ASM_001 to ASM password file
12.4.1. Dump OCR file and search dump file for “CRSUSER__ASM_001” to get user
“ORATEXT”
$ $GRID_HOME/bin/ocrdump /tmp/OCR.dmp
$ vi /tmp/OCR.dmp
For example:
[SYSTEM.ASM.CREDENTIALS.USERS.CRSUSER__ASM_001] ORATEXT :
a332f4dbe3da5f4bbf3a71895a466319:grid
12.4.2. Get user “CRSUSER__ASM_001” password
$ /u01/app/grid/product/19c/grid/bin/crsctl get credmaint -path
/ASM/Self/a332f4dbe3da5f4bbf3a71895a466319 -credtype userpass -id 0 -attr passwd -local
eZX8GNed1KfM3E92U57nSziL4HO5E
12.4.3. Add user CRSUSER__ASM_001 to password file
ASMCMD> orapwusr --add CRSUSER__ASM_001 --> Use password you got it from previous command
ASMCMD> orapwusr --grant sysdba CRSUSER__ASM_001
ASMCMD> orapwusr --grant sysasm CRSUSER__ASM_001
ASMCMD> lspwusr
13 Stop and start CRS normally. All resources should start normally.
— root account
$ crsctl stop crs -f
$ crsctl start crs -wait
$ crctl check cluster
$ crsctl status res -t
14 Start CRS on the remaining cluster node(s). All resources should start normally.
— root account
$ crsctl start crs -wait
$ crctl check cluster
$ crsctl status res -t
15 Run a cluster verification of all cluster nodes. Run from one node only.
— grid account
$ cd $ORACLE_HOME/bin
$ cluvfy comp ocr -n all -verbose
16 Check the status of cluster resources. Resources should be ONLINE on all nodes.
— grid account
$ crsctl status res -t
$ srvctl status asm
$ srvctl status scan
$ srvctl status scan_listener
$ srvctl status listener
17 Stop database(s) then startup mount to recreate OCR log groups on OCR disk group.
— oracle account
$ srvctl stop database -d <DB_NAME>
SQL> startup mount
SQL> select * from v$log;
SQL> select member from v$logfile;
SQL> alter database drop logfile group &id; Ã repeat for all OCR log groups have
member(s) on OCR
SQL> alter database add logfile thread &THD_NUM group &GRP_NUM (‘+OCR’) size
<SIZE_KB>;
SQL> select * from v$log;
SQL> select member from v$logfile;
18 Stop database(s) then startup.
— oracle account
SQL> shutdown immediate
$ srvctl start database -d <DB_NAME>
SQL> select name,database_role,open_mode from v$database;
SQL> select instance_name,host_name,status from gv$instance;
No comments:
Post a Comment