
Monday 13 September 2021

Did you ever lose your Grid Infrastructure Diskgroup?

 


As you know, Oracle Grid Infrastructure needs several internal components to run: the voting disks, the Oracle Cluster Registry (OCR) and, since 12c, the Grid Infrastructure Management Repository (GIMR). All of these are stored together in one ASM diskgroup, at least if you did not separate them later on.
But what happens if this diskgroup gets lost? The cluster stack will not come up because some very fundamental resources are missing. I tested this case a couple of years ago when 11.2 came out and all the cluster files were moved into an ASM diskgroup. I needed to test it again with 12c because my freshly installed VM lost its shared disk that I used for the OCR diskgroup. So I was forced to recover my installation.
Be careful with the OS users you use for the individual steps. Some steps need root privileges, other steps are executed as the Grid Infrastructure owner (oracle in my case). I included the prompt, which reflects the user I used to perform the particular step.

1. Status

The cluster stack does not come up since there are no voting disks anymore.
Cluster alert.log shows:

2015-05-04 10:32:48.255 [OCSSD(31416)]CRS-1714: Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/oracle/diag/crs/oel6u4/crs/trace/ocssd.trc

And the internal processes look like this:

[root@oel6u4 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE                               STABLE
ora.crf
      1        ONLINE  OFFLINE                               STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  OFFLINE      oel6u4                   STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE oel6u4                   STABLE
ora.gipcd
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.gpnpd
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.mdnsd
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.storage
      1        ONLINE  OFFLINE                               STABLE
--------------------------------------------------------------------------------
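
In a longer listing it can be handy to filter for the resources that should be running but are not. The following is only a minimal sketch: the function name offline_resources is hypothetical, and it assumes the column layout of the "crsctl stat res -t" output shown above (a resource name line followed by lines holding Target and State).

```shell
# Sketch: read "crsctl stat res -t [-init]" output on stdin and print every
# resource whose Target is ONLINE while its State is something else.
# offline_resources is a hypothetical helper name, not an Oracle tool.
offline_resources() {
  awk '
    /^ora\./   { name = $1; next }   # resource name line
    name == "" { next }              # header lines before the first resource
    {
      # Cluster resources carry an instance number in the first column,
      # local resources start directly with the Target column.
      if ($1 ~ /^[0-9]+$/) { target = $2; state = $3 }
      else                 { target = $1; state = $2 }
      if (target == "ONLINE" && state != "" && state != "ONLINE")
        print name, state
    }
  '
}
```

Run as root, it would be used like this: crsctl stat res -t -init | offline_resources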

2. Recreate OCR diskgroup

First, I made my disk available again using ASMLib:

[root@oel6u4 ~]# oracleasm listdisks
DATA
[root@oel6u4 ~]# oracleasm createdisk OCR /dev/sdc1
Writing disk header: done
Instantiating disk: done
[root@oel6u4 ~]# oracleasm listdisks
DATA
OCR

Now I need to restore my ASM diskgroup, but that requires a running ASM instance. So I stop the cluster stack and start it again in exclusive mode. By the way, “crsctl stop crs -f” did not finish, so I disabled the cluster stack by issuing “crsctl disable has” and rebooted.

[root@oel6u4 ~]# crsctl enable has
CRS-4622: Oracle High Availability Services autostart is enabled.
[root@oel6u4 ~]# crsctl start crs -excl
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'oel6u4'
CRS-2672: Attempting to start 'ora.mdnsd' on 'oel6u4'
CRS-2676: Start of 'ora.evmd' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'oel6u4'
CRS-2676: Start of 'ora.gpnpd' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'oel6u4'
CRS-2672: Attempting to start 'ora.gipcd' on 'oel6u4'
CRS-2676: Start of 'ora.cssdmonitor' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.gipcd' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'oel6u4'
CRS-2672: Attempting to start 'ora.diskmon' on 'oel6u4'
CRS-2676: Start of 'ora.diskmon' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.cssd' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'oel6u4'
CRS-2672: Attempting to start 'ora.ctssd' on 'oel6u4'
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'oel6u4'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'oel6u4'
CRS-2676: Start of 'ora.crf' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.ctssd' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.drivers.acfs' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'oel6u4'
CRS-2676: Start of 'ora.asm' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'oel6u4'
diskgroup OCR not mounted ()
CRS-5017: The resource action "ora.storage start" encountered the following error:
Storage agent start action aborted. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/oel6u4/crs/trace/ohasd_orarootagent_root.trc".
CRS-2674: Start of 'ora.storage' on 'oel6u4' failed
CRS-2679: Attempting to clean 'ora.storage' on 'oel6u4'
CRS-2681: Clean of 'ora.storage' on 'oel6u4' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'oel6u4'
CRS-2677: Stop of 'ora.asm' on 'oel6u4' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'oel6u4'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'oel6u4' succeeded
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'oel6u4'
CRS-2677: Stop of 'ora.drivers.acfs' on 'oel6u4' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'oel6u4'
CRS-2677: Stop of 'ora.ctssd' on 'oel6u4' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'oel6u4'
CRS-2677: Stop of 'ora.crf' on 'oel6u4' succeeded
CRS-4000: Command Start failed, or completed with errors.

As you can see, the startup fails because “ora.storage” is not able to locate the OCR diskgroup, and ASM is stopped again right after that. ASM is only up while “ora.storage” is trying to start, which means there is a timeframe of about 10 minutes to create the diskgroup during the startup of “ora.storage”.

If I had made a metadata backup of my ASM diskgroup, I could have used that. But I had not, so I create my OCR diskgroup from scratch. Start CRS in exclusive mode again and then do the following from a second session while “ora.storage” is still starting:

[root@oel6u4 ~]# cat ocr.dg
<dg name="ocr" redundancy="external">
  <dsk string="/dev/oracleasm/disks/OCR" quorum="QUORUM" />
  <a name="compatible.asm" value="12.1.0.2.0" />
  <a name="compatible.rdbms" value="12.1.0.2.0" />
</dg>
 
[root@oel6u4 ~]# asmcmd mkdg ~/ocr.dg
[root@oel6u4 ~]# asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576     12287    12234                0           12234              0             N  OCR/

So far, so good. The OCR diskgroup is there and it is mounted. At this point “ora.storage” manages to come up successfully.
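
Because the diskgroup has to be created inside that start window, the second session may have to try more than once until ASM accepts the command. A generic retry helper is one way to handle this. This is a sketch only: the function name retry is hypothetical, and it assumes that asmcmd returns a non-zero exit status as long as the ASM instance is not reachable, which should be verified on the version in use.

```shell
# Sketch: run a command up to TRIES times, pausing PAUSE seconds after each
# failed attempt. Returns 0 on the first success, 1 if every attempt failed.
# "retry" is a hypothetical helper name.
retry() {
  local tries="$1" pause="$2"
  shift 2
  local n=0
  while [ "$n" -lt "$tries" ]; do
    "$@" && return 0     # stop as soon as the command succeeds
    n=$((n + 1))
    sleep "$pause"
  done
  return 1
}
```

In the second session this could look like: retry 60 10 asmcmd mkdg ~/ocr.dg (60 attempts, 10 seconds apart, roughly matching the 10-minute window).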

3. Restore OCR

The next step is restoring the OCR from a backup. Fortunately, the clusterware creates backups of the OCR automatically right from the beginning.

[root@oel6u4 ~]# ocrconfig -showbackup
PROT-26: Oracle Cluster Registry backup locations were retrieved from a local copy
 
oel6u4     2015/05/02 18:33:41     /u01/app/grid/12.1.0.2/cdata/mycluster/backup00.ocr     0
 
oel6u4     2015/05/02 14:33:17     /u01/app/grid/12.1.0.2/cdata/mycluster/backup01.ocr     0
 
oel6u4     2015/05/02 14:33:17     /u01/app/grid/12.1.0.2/cdata/mycluster/day.ocr     0
 
oel6u4     2015/05/02 14:33:17     /u01/app/grid/12.1.0.2/cdata/mycluster/week.ocr     0
PROT-25: Manual backups for the Oracle Cluster Registry are not available

Just choose the most recent backup and use it to restore the contents of OCR.

[root@oel6u4 ~]# ocrconfig -restore /u01/app/grid/12.1.0.2/cdata/mycluster/backup00.ocr
[root@oel6u4 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1348
         Available space (kbytes) :     408220
         ID                       :  768712202
         Device/File Name         :       +OCR
                                    Device/File integrity check succeeded
 
                                    Device/File not configured
 
                                    Device/File not configured
 
                                    Device/File not configured
 
                                    Device/File not configured
 
         Cluster registry integrity check succeeded
 
         Logical corruption check succeeded
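
Picking the most recent backup can also be scripted. The sketch below uses a hypothetical helper named latest_ocr_backup and assumes the 12.1 layout of "ocrconfig -showbackup" shown above (node, date, time, file, and a trailing number); other versions may print additional columns.

```shell
# Sketch: read "ocrconfig -showbackup" output on stdin and print the path of
# the newest backup. Dates in YYYY/MM/DD HH:MM:SS form sort correctly as text.
# latest_ocr_backup is a hypothetical helper name.
latest_ocr_backup() {
  awk 'NF >= 4 && $4 ~ /\.ocr$/ { print $2, $3, $4 }' |
    LC_ALL=C sort |
    tail -n 1 |
    awk '{ print $3 }'
}
```

As root, the restore could then be written as: ocrconfig -restore "$(ocrconfig -showbackup | latest_ocr_backup)". Note that the first column names the node that holds the file, so on a multi-node cluster the backup may have to be copied over first.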

4. Restore Voting Disk

Since the voting files are placed in ASM together with OCR, the OCR backup contains a copy of the voting file as well. All I need to do is start CRSD and recreate the voting file.

[root@oel6u4 ~]# crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'oel6u4'
CRS-2676: Start of 'ora.crsd' on 'oel6u4' succeeded
[root@oel6u4 ~]# crsctl replace votedisk +OCR
CRS-4602: Failed 27 to add voting file d1c46046fd004f1abf98e3beb7905baa.
Failed to replace voting disk group with +OCR.
CRS-4000: Command Replace failed, or completed with errors.

Not good. The reason is that ASM does not have ASM_DISKSTRING configured. Actually, ASM does not have a single parameter configured, because its spfile was stored in the OCR diskgroup as well. That means there is no spfile anymore. As a quick solution I set the parameter in memory.

[oracle@oel6u4 ~]$ sqlplus / as sysasm
 
SQL*Plus: Release 12.1.0.2.0 Production on Mon May 4 11:20:31 2015
 
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
 
SQL> show parameter diskstr
 
NAME                                 TYPE
------------------------------------ ---------------------------------
VALUE
------------------------------
asm_diskstring                       string
 
SQL> alter system set asm_diskstring='/dev/oracleasm/disks/*' scope=memory;
 
System altered.

With this small change I am now able to recreate the voting file.

[root@oel6u4 ~]# crsctl replace votedisk +OCR
Successful addition of voting disk c0cb172eb1d34f9abf04b37c883c9ddd.
Successfully replaced voting disk group with +OCR.
CRS-4266: Voting file(s) successfully replaced
[root@oel6u4 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   c0cb172eb1d34f9abf04b37c883c9ddd (/dev/oracleasm/disks/OCR) [OCR]
Located 1 voting disk(s).
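
If this check is to be used in a script, the number of ONLINE voting files can be counted from that output. A minimal sketch, with the hypothetical function name online_votedisks and assuming the row format shown above (a running number followed by the state):

```shell
# Sketch: read "crsctl query css votedisk" output on stdin and print how many
# voting files are reported as ONLINE.
# online_votedisks is a hypothetical helper name.
online_votedisks() {
  awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}
```

Usage as root: crsctl query css votedisk | online_votedisks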

5. Restore ASM spfile

This is easy. I don’t have a backup of my ASM spfile, so I recreate it from memory.

[oracle@oel6u4 ~]$ sqlplus / as sysasm
 
SQL*Plus: Release 12.1.0.2.0 Production on Mon May 4 11:27:16 2015
 
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
 
SQL> create spfile='+OCR' from memory;
 
File created.

Doing so also updates the GPnP profile.

6. Restart Grid Infrastructure

I have now restored everything I need to start the clusterware in normal operation mode. Let’s see:

[root@oel6u4 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'oel6u4'
CRS-2673: Attempting to stop 'ora.crsd' on 'oel6u4'
CRS-2677: Stop of 'ora.crsd' on 'oel6u4' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'oel6u4'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'oel6u4'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'oel6u4'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'oel6u4'
CRS-2677: Stop of 'ora.drivers.acfs' on 'oel6u4' succeeded
CRS-2677: Stop of 'ora.evmd' on 'oel6u4' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'oel6u4'
CRS-2673: Attempting to stop 'ora.ctssd' on 'oel6u4'
CRS-2673: Attempting to stop 'ora.storage' on 'oel6u4'
CRS-2677: Stop of 'ora.mdnsd' on 'oel6u4' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'oel6u4' succeeded
CRS-2677: Stop of 'ora.storage' on 'oel6u4' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'oel6u4'
CRS-2677: Stop of 'ora.crf' on 'oel6u4' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'oel6u4' succeeded
CRS-2677: Stop of 'ora.asm' on 'oel6u4' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'oel6u4'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'oel6u4' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'oel6u4'
CRS-2677: Stop of 'ora.cssd' on 'oel6u4' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'oel6u4'
CRS-2677: Stop of 'ora.gipcd' on 'oel6u4' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'oel6u4' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@oel6u4 ~]# crsctl start has -wait
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'oel6u4'
CRS-2672: Attempting to start 'ora.evmd' on 'oel6u4'
CRS-2676: Start of 'ora.evmd' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'oel6u4'
CRS-2676: Start of 'ora.gpnpd' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'oel6u4'
CRS-2676: Start of 'ora.gipcd' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'oel6u4'
CRS-2676: Start of 'ora.cssdmonitor' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'oel6u4'
CRS-2672: Attempting to start 'ora.diskmon' on 'oel6u4'
CRS-2676: Start of 'ora.diskmon' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.cssd' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'oel6u4'
CRS-2672: Attempting to start 'ora.ctssd' on 'oel6u4'
CRS-2676: Start of 'ora.ctssd' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'oel6u4'
CRS-2676: Start of 'ora.asm' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'oel6u4'
CRS-2676: Start of 'ora.storage' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'oel6u4'
CRS-2676: Start of 'ora.crf' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'oel6u4'
CRS-2676: Start of 'ora.crsd' on 'oel6u4' succeeded
CRS-6023: Starting Oracle Cluster Ready Services-managed resources
CRS-2664: Resource 'ora.OCR.dg' is already running on 'oel6u4'
CRS-6017: Processing resource auto-start for servers: oel6u4
CRS-2672: Attempting to start 'ora.net1.network' on 'oel6u4'
CRS-2672: Attempting to start 'ora.MGMTLSNR' on 'oel6u4'
CRS-2672: Attempting to start 'ora.proxy_advm' on 'oel6u4'
CRS-2672: Attempting to start 'ora.oc4j' on 'oel6u4'
CRS-2676: Start of 'ora.net1.network' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.cvu' on 'oel6u4'
CRS-2672: Attempting to start 'ora.oel6u4.vip' on 'oel6u4'
CRS-2672: Attempting to start 'ora.ons' on 'oel6u4'
CRS-2672: Attempting to start 'ora.scan1.vip' on 'oel6u4'
CRS-2676: Start of 'ora.cvu' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.oel6u4.vip' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'oel6u4'
CRS-2676: Start of 'ora.scan1.vip' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'oel6u4'
CRS-2676: Start of 'ora.ons' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.MGMTLSNR' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.mgmtdb' on 'oel6u4'
CRS-2676: Start of 'ora.proxy_advm' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.DATA.dg' on 'oel6u4'
CRS-5017: The resource action "ora.mgmtdb start" encountered the following error:
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+OCR/_mgmtdb/spfile-MGMTDB.ora'
ORA-17503: ksfdopn:2 Failed to open file +OCR/_mgmtdb/spfile-MGMTDB.ora
ORA-15056: additional error message
ORA-17503: ksfdopn:2 Failed to open file +OCR/_mgmtdb/spfile-mgmtdb.ora
ORA-15173: entry '_mgmtdb' does not exist in directory '/'
ORA-06512: at line 4
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/oel6u4/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.mgmtdb' on 'oel6u4' failed
CRS-2679: Attempting to clean 'ora.mgmtdb' on 'oel6u4'
CRS-2681: Clean of 'ora.mgmtdb' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.DATA.dg' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.DATA.DATA.advm' on 'oel6u4'
CRS-2676: Start of 'ora.oc4j' on 'oel6u4' succeeded
CRS-2676: Start of 'ora.DATA.DATA.advm' on 'oel6u4' succeeded
CRS-2672: Attempting to start 'ora.data.data.acfs' on 'oel6u4'
CRS-2676: Start of 'ora.data.data.acfs' on 'oel6u4' succeeded
===== Summary of resource auto-start failures follows =====
CRS-2807: Resource 'ora.mgmtdb' failed to start automatically.
CRS-6016: Resource auto-start has completed for server oel6u4
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.

As you can see, the GIMR (MGMTDB) is gone too. I will come back to that soon. First, let’s see if all the other resources are running properly.

[root@oel6u4 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       oel6u4                   STABLE
ora.DATA.DATA.advm
               ONLINE  ONLINE       oel6u4                   Volume device /dev/a
                                                             sm/data-347 is onlin
                                                             e,STABLE
ora.DATA.dg
               ONLINE  ONLINE       oel6u4                   STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       oel6u4                   STABLE
ora.OCR.dg
               ONLINE  ONLINE       oel6u4                   STABLE
ora.data.data.acfs
               ONLINE  ONLINE       oel6u4                   mounted on /u02,STAB
                                                             LE
ora.net1.network
               ONLINE  ONLINE       oel6u4                   STABLE
ora.ons
               ONLINE  ONLINE       oel6u4                   STABLE
ora.proxy_advm
               ONLINE  ONLINE       oel6u4                   STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       oel6u4                   169.254.39.205 192.1
                                                             68.1.1,STABLE
ora.asm
      1        ONLINE  ONLINE       oel6u4                   STABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.mgmtdb
      1        ONLINE  OFFLINE                               Instance Shutdown,ST
                                                             ABLE
ora.oc4j
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.oel6u4.vip
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       oel6u4                   STABLE
--------------------------------------------------------------------------------

Looks good so far 🙂

7. Restore ASM password file

Since 12c the password file for ASM is stored inside ASM. Again, I have no backup so I need to create it from scratch.

[oracle@oel6u4 ~]$ orapwd file=/tmp/orapwasm password=Oracle-1 force=y
[oracle@oel6u4 ~]$ asmcmd pwcopy --asm /tmp/orapwasm +OCR/pwdasm
copying /tmp/orapwasm -> +OCR/pwdasm
[oracle@oel6u4 ~]$ asmcmd pwget --asm
+OCR/pwdasm

The “pwcopy” command updates the GPnP profile to reflect this.

8. Restore GIMR

There is no way to back up the GIMR. You have to recreate it. The Cluster Health Monitor (CHM) must not run during this recreation, so it has to be stopped and disabled. Then I need to remove the GIMR cluster resource.

[root@oel6u4 ~]# crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'oel6u4'
CRS-2677: Stop of 'ora.crf' on 'oel6u4' succeeded
[root@oel6u4 ~]# crsctl modify res ora.crf -attr ENABLED=0 -init
[oracle@oel6u4 ~]$ srvctl remove mgmtdb
Remove the database _mgmtdb? (y/[n]) y

Then use dbca from the Grid Infrastructure home to create GIMR. First the container database:

[oracle@oel6u4 ~]$ dbca -silent -createDatabase -sid -MGMTDB -createAsContainerDatabase true -templateName MGMTSeed_Database.dbc -gdbName _mgmtdb -storageType ASM -diskGroupName +OCR -datafileJarLocation $ORACLE_HOME/assistants/dbca/templates -characterset AL32UTF8 -autoGeneratePasswords -skipUserTemplateCheck
Registering database with Oracle Grid Infrastructure
5% complete
Copying database files
7% complete
9% complete
16% complete
23% complete
30% complete
37% complete
41% complete
Creating and starting Oracle instance
43% complete
48% complete
49% complete
50% complete
55% complete
60% complete
61% complete
64% complete
Completing Database Creation
68% complete
79% complete
89% complete
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/_mgmtdb/_mgmtdb2.log" for further details.

Second, the pluggable database:

[oracle@oel6u4 ~]$ dbca -silent -createPluggableDatabase -sourceDB -MGMTDB -pdbName mycluster -createPDBFrom RMANBACKUP -PDBBackUpfile $ORACLE_HOME/assistants/dbca/templates/mgmtseed_pdb.dfb -PDBMetadataFile $ORACLE_HOME/assistants/dbca/templates/mgmtseed_pdb.xml -createAsClone true -internalSkipGIHomeCheck
Creating Pluggable Database
4% complete
12% complete
21% complete
38% complete
55% complete
85% complete
Completing Pluggable Database Creation
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/_mgmtdb/mycluster/_mgmtdb1.log" for further details.
[oracle@oel6u4 ~]$ srvctl status mgmtdb
Database is enabled
Instance -MGMTDB is running on node oel6u4
[oracle@oel6u4 ~]$ mgmtca
[root@oel6u4 ~]# crsctl modify res ora.crf -attr ENABLED=1 -init
[root@oel6u4 ~]# crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'oel6u4'
CRS-2676: Start of 'ora.crf' on 'oel6u4' succeeded

Let’s see if everything is running fine again:

[oracle@oel6u4 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       oel6u4                   STABLE
ora.DATA.DATA.advm
               ONLINE  ONLINE       oel6u4                   Volume device /dev/a
                                                             sm/data-347 is onlin
                                                             e,STABLE
ora.DATA.dg
               ONLINE  ONLINE       oel6u4                   STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       oel6u4                   STABLE
ora.OCR.dg
               ONLINE  ONLINE       oel6u4                   STABLE
ora.data.data.acfs
               ONLINE  ONLINE       oel6u4                   mounted on /u02,STAB
                                                             LE
ora.net1.network
               ONLINE  ONLINE       oel6u4                   STABLE
ora.ons
               ONLINE  ONLINE       oel6u4                   STABLE
ora.proxy_advm
               ONLINE  ONLINE       oel6u4                   STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       oel6u4                   169.254.39.205 192.1
                                                             68.1.1,STABLE
ora.asm
      1        ONLINE  ONLINE       oel6u4                   STABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       oel6u4                   Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.oel6u4.vip
      1        ONLINE  ONLINE       oel6u4                   STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       oel6u4                   STABLE
--------------------------------------------------------------------------------

Funny: my ASM is configured for 3 instances again. I had already changed that to “count=all”, but obviously the OCR backup was taken before that change. I also had two databases in place, and both resources are missing for the same reason. But that’s not a major issue.

[oracle@oel6u4 ~]$ srvctl modify asm -count all
 
[oracle@oel6u4 ~]$ srvctl add database -db db11g -oraclehome /u01/app/oracle/product/11.2.0.4/db -dbtype RAC \
> -spfile /u02/app/oracle/oradata/db11g/spfiledb11g.ora -pwfile /u02/app/oracle/oradata/db11g/orapwdb11g \
> -serverpool oel6u4 -acfspath /u02
PRCR-1039 : Server pool ora.oel6u4 does not exist

No server pool, obviously. Server pools are also defined in the OCR.

[oracle@oel6u4 ~]$ srvctl add srvpool -serverpool oel6u4 -min 1 -max 1 -category "hub"
 
[oracle@oel6u4 ~]$ srvctl add database -db cdb12c -oraclehome /u01/app/oracle/product/12.1.0.2/db -dbtype RAC  \
> -spfile /u02/app/oracle/oradata/cdb12c/spfilecdb12c.ora -pwfile /u02/app/oracle/oradata/cdb12c/orapwcdb12c \
> -serverpool oel6u4 -acfspath /u02
 
[oracle@oel6u4 ~]$ srvctl add database -d db11g -o /u01/app/oracle/product/11.2.0.4/db -c RAC \
> -p /u02/app/oracle/oradata/db11g/spfiledb11g.ora -g oel6u4 -j /u02

9. Lessons learned -or- What one should back up

Usually I do metadata backups of all the ASM diskgroups as well as backups of the ASM spfile and password file, and of course periodic backups of OCR and OLR. But it was nice to see that it is possible to restore everything without any manual backups in place. What should one back up, and how:

  • ASM metadata: asmcmd md_backup
  • ASM spfile: asmcmd spcopy -or- create pfile from spfile
  • ASM password file: asmcmd pwcopy
  • OCR: configure OCR backup to external storage or copy manually
  • OCR: do backups every time you add/delete/modify cluster resources
  • OLR: not mentioned here since it is stored on disk, but important too
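
The list above could be turned into a small helper that prints the backup commands for review before anything is executed. This is a sketch under assumptions: the function name print_backup_plan, the target directory and the backup file names are made up for illustration, and the asmcmd and ocrconfig subcommands should be checked against the documentation of the installed version.

```shell
# Sketch: print (do not run) the backup commands for one diskgroup.
#   $1 = diskgroup name, $2 = target directory for the backup files
# print_backup_plan and all file names below are hypothetical.
print_backup_plan() {
  local dg="$1" dir="$2"
  cat <<EOF
# as the Grid Infrastructure owner
asmcmd md_backup ${dir}/md_${dg}.bkp -G ${dg}
asmcmd spbackup "\$(asmcmd spget)" ${dir}/spfileasm.bak
asmcmd pwcopy "\$(asmcmd pwget --asm)" ${dir}/orapwasm
# as root
ocrconfig -manualbackup
ocrconfig -local -manualbackup
EOF
}
```

For example, print_backup_plan OCR /backup prints the plan for the OCR diskgroup. The first three commands belong to the Grid Infrastructure owner and the last two to root, matching the prompts used throughout this post.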

10. References

My Oracle Support
