
Sunday 11 July 2021

How to Recover Diskgroup in Case of Transient Failures

 

Uses of the DISK_REPAIR_TIME and FAILGROUP_REPAIR_TIME Attributes

DISK_REPAIR_TIME:

DISK_REPAIR_TIME is a diskgroup attribute with a default value of 3.6 hours. If ASM is unable to perform I/O on a disk in the diskgroup, it takes that disk offline and waits for the time specified in DISK_REPAIR_TIME for the disk to come back online. If the disk does not come back online within this period, Oracle ASM drops it from the diskgroup.

SQL> select name,value from v$asm_attribute where group_number=1 and  name like '%disk_repair_time%';
NAME                                     VALUE
---------------------------------------- ----------------------------
disk_repair_time                         3.6 h

Once you hit a failure and the disk is no longer available, ASM marks the disk offline and the clock starts ticking until the disk_repair_time value is reached. If the issue is not fixed and the disk is still unavailable at that point, ASM drops the disk from the diskgroup. Dropping the disk triggers a rebalance operation, which may take a long time to complete depending on factors such as the power limit used and the amount of data to rebalance.
Once the disk is available to the server again, you have to add it back to the diskgroup, and a second rebalance operation takes place. Overall, this takes considerably longer than simply resynchronizing the disk.
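
While the clock is ticking, it can be useful to see how much time an offline disk has left before ASM drops it. The query below is a minimal sketch that reads the REPAIR_TIMER column of V$ASM_DISK, which reports the remaining time in seconds; group_number = 1 is an assumption carried over from the query shown earlier, so adjust it for your diskgroup.

```sql
-- List offline disks and the seconds remaining before ASM drops them.
-- group_number = 1 is assumed here; replace it with your diskgroup's number.
SELECT name, failgroup, mode_status, state, repair_timer
FROM   v$asm_disk
WHERE  group_number = 1
AND    mode_status = 'OFFLINE';
```

If the query returns no rows, no disk in that diskgroup is currently offline.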

NOTE:
If you hit this issue and the disk has already been dropped, you can follow the instructions in the post How to Add Dropped Disks Back into Diskgroup to repair it.


If you know how long it may take to fix an issue like this in your environment, you can avoid the disk drop by adjusting the value of the disk_repair_time attribute. So if the default disk_repair_time (3.6 hours) is not enough for your environment, consider increasing it.

Let's say it is set to 72 hours. In this case, once a disk suffers a failure, ASM marks it offline and keeps track of the changes that could not be written to it. Say it takes 50 hours to fix the disk issue. The disk is still marked offline, but Oracle ASM has tracked all the changes from those 50 hours. Once the disk is back, you can simply bring it online and Oracle only has to synchronize those changes, resulting in faster recovery of the disk and less administration effort.

How to Bring the Disk Online 

Once the disk is repaired, you can put it back online using one of the following commands.

ALTER DISKGROUP DATA ONLINE ALL;
ALTER DISKGROUP DATA ONLINE DISK DATA_001;
ALTER DISKGROUP DATA ONLINE DISKS IN FAILGROUP FG2;
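
The ONLINE clause also accepts an optional POWER setting that controls how aggressively the resync runs. The statement below is only a sketch: the value 8 is an illustrative assumption, not a recommendation, and the range of accepted values depends on your ASM version and compatibility settings. A higher value generally shortens the resync at the cost of more I/O load.

```sql
-- Bring one disk online with an explicit power level for the resync.
-- POWER 8 is an arbitrary example value; pick one that suits your I/O capacity.
ALTER DISKGROUP DATA ONLINE DISK DATA_001 POWER 8;
```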

NOTE: You can monitor the progress of the online disk command in the V$ASM_OPERATION view.


SQL> SELECT GROUP_NUMBER, PASS, STATE FROM V$ASM_OPERATION;
 
GROUP_NUMBER PASS      STAT
------------ --------- ----
           1 RESYNC    RUN
           1 REBALANCE WAIT
           1 COMPACT   WAIT

    
NOTE:
An offline operation does not generate a display in a V$ASM_OPERATION view query because it does not cause any rebalance to occur.

The drop operation, however, does trigger a rebalance, which can again be monitored from the V$ASM_OPERATION view.

You can read more about fast mirror resync in the post Oracle ASM Fast Mirror Resync.

How to Increase the DISK_REPAIR_TIME Attribute Value

ALTER DISKGROUP DATA SET ATTRIBUTE 'disk_repair_time' = '72.5h';
ALTER DISKGROUP DATA SET ATTRIBUTE 'disk_repair_time' = '4350m';


NOTE: You can specify the disk_repair_time value either in hours or in minutes; the two statements above are equivalent, since 72.5 hours is 4350 minutes. Also note that the modified value only applies to new failures. It does not affect disks that are already offline.
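
Because a changed disk_repair_time does not apply to disks that are already offline, extending the timer for such a disk needs a different statement. As a sketch, Oracle's OFFLINE DISK clause takes a DROP AFTER value that overrides the timer for the named disk; the disk name DATA_001 and the 72h value are reused from the examples in this post and are assumptions about your environment.

```sql
-- Reset the drop timer for a disk that is already offline.
-- DATA_001 and 72h are example values; use your own disk name and repair window.
ALTER DISKGROUP DATA OFFLINE DISK DATA_001 DROP AFTER 72h;
```

Afterwards, the REPAIR_TIMER column in V$ASM_DISK should reflect the new countdown for that disk.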


FAILGROUP_REPAIR_TIME:

FAILGROUP_REPAIR_TIME specifies a default repair time for the failure groups in the diskgroup. It is used when an entire failure group has failed, for example when all the disks in a failgroup are offline because the complete storage in one datacenter is unavailable. The default value is 24 hours (24h).


If the default value of 24h is not sufficient for your environment, you can change it using the ALTER DISKGROUP command.

ALTER DISKGROUP DATA SET ATTRIBUTE 'failgroup_repair_time' = '72.5h';
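
After changing either attribute, you may want to confirm the values ASM is actually using. The query below is a small sketch against V$ASM_ATTRIBUTE; group_number = 1 is an assumption carried over from the earlier query, and failgroup_repair_time is only listed on ASM releases that provide that attribute.

```sql
-- Show the current repair-time attributes for one diskgroup.
-- group_number = 1 is assumed; replace it with your diskgroup's number.
SELECT name, value
FROM   v$asm_attribute
WHERE  group_number = 1
AND    name IN ('disk_repair_time', 'failgroup_repair_time');
```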


