Amit's Oracle DBA Blog: RAC Interview Questions

Oracle DBA Interview Questions and Answers

For 11gR2-

> What methods are available to keep the time synchronized on all nodes in the cluster?

Either the Network Time Protocol(NTP) can be configured or in 11gr2, Cluster Time Synchronization Service (CTSS) can be used.

1)How you stop Your RAC Database configue with ASM?
Ans: This is a bit tricky.But a simple logic is Database depends on storage and storage depends on node.Hence We can tell like this,Important is to remember that node application and cluster is started and stopped only with ‘root’
user or the user having sudo privilege to do this in /etc/sudoers file.One more point to remember always asm instance
should be started first.

There are actually 2 commands to do it:

1) crsctl stop crs: This command will stop all the node applications(i.e gsd, VIP, ons, listener) and crs daemons(crsd,evmd,ocsd)
However /etc/init.d/init.crs {start|stop|enable|disable} can be used to start,stop,enable and disable these crs daemons
or services.

Login as root or a user with sudo permission to run the crsctl command:
#cd ORA_CRS_HOME/bin
#crsctl start crs
=> Run as root or you should have SUDO permissing in /etc/sudoers file.
#ssh testnode_1n1 /home/app/crs/bin/crsctl check crs
(or)
#ssh testnode_1n1 /home/app/crs/bin/crsctl stop crs

Note:This command will prevent CRS from starting on a reboot.There is no return output from the command.

[root@TEST_NODE1 oracle]# crsctl disable crs

For checking this use ‘crsstat’ command:

[root@TEST_NODE1 oracle]# crsstat

HA Resource Target State
———– —— —–
ora.TEST_NODE1.ASM1.asm OFFLINE OFFLINE
ora.TEST_NODE1.LISTENER_TEST_NODE1.lsnr OFFLINE OFFLINE
ora.TEST_NODE1.gsd OFFLINE OFFLINE
ora.TEST_NODE1.ons OFFLINE OFFLINE
ora.TEST_NODE1.vip OFFLINE OFFLINE
ora.orcl.RAC.cs OFFLINE OFFLINE
ora.orcl.RAC.orcl1.srv OFFLINE OFFLINE
ora.orcl.RAC.orcl2.srv OFFLINE OFFLINE
ora.orcl.db OFFLINE OFFLINE
ora.orcl.orcl1.inst OFFLINE OFFLINE
ora.orcl.orcl2.inst OFFLINE OFFLINE
ora.TEST_NODE2.ASM2.asm OFFLINE OFFLINE
ora.TEST_NODE2.LISTENER_TEST_NODE2.lsnr OFFLINE OFFLINE
ora.TEST_NODE2.gsd OFFLINE OFFLINE
ora.TEST_NODE2.ons OFFLINE OFFLINE
ora.TEST_NODE2.vip OFFLINE OFFLINE
Let’s not forget to enable CRS on reboot:

[root@TEST_NODE1 oracle]# crsctl enable crs

2)srvctl stop:
————–

To stop a RAC Database configure system with ASM step by step:
————————————————————————–

srvctl stop service -d test =>To stop services like TAF(Transparent Application failover
srvctl stop database -d test =>To stop Database which may include ‘n’ number of instances
srvctl stop asm -n node1-pub1 =>To stop asm instance on node1
srvctl stop asm -n node2-pub2 =>To stop asm instance on node2
srvctl stop nodeapps -n node1-pub1,node2-pub2 =>To stop all node apps(listeners,GSD,ONS,VIP)

(or)

To start a RAC Database configure with ASM:
—————————————————————————

We have to first start the nodeapps,then the ASM instances, followed by the database instances, and lastly the services for TAF and load balancing.

srvctl start nodeapps -n node1-pub
srvctl start nodeapps -n node2-pub
srvctl start asm -n node1-pub
srvctl start asm -n node2-pub
srvctl start database -d test

Note 1: When we use crsctl to stop clusterware,It will stop nodeapps,asm instances and Database instance.

Note 2:Starting and Stopping of Cluster in Oracle 11g is changed because of 1)Separate home for clusterware
i.e Grid home.
2) OCR is in diskgroup inside ASM, then no way to stop ASM using srvctl, you have to shutdown the cluster to stop ASM.
We can start or start cluster or other services as follows:

Starting / Stopping the Cluster in Oracle 11g rel2:
——————————————————————–

Stopping Cluster:
———————–
We can Use the “crsctl stop cluster” command on node1 to stop the Oracle Clusterware stack:
root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl stop cluster

Starting Cluster:
———————————
[root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl start cluster

We can start/stop all the clusterware nodes by using below command:
[root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl start cluster all
[root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl stop cluster all

2)How you check the health of Your RAC Database?
Ans: ‘crsctl’ command from root or oracle user can be used to check the clusterware health But for starting or stopping we have to use root user or any privilege user.

[oracle@TEST_NODE1 ~]$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

4)How you check the services in RAC Node?
Ans: We can check the service or start the services with ‘srvctl’ command.load balanced/TAF service named RAC online.

[oracle@TEST_NODE1 ~]$ srvctl start service -d orcl -s RAC
[oracle@TEST_NODE1 ~]$ crsstat
HA Resource Target State
———– —— —–
ora.TEST_NODE1.ASM1.asm ONLINE ONLINE on TEST_NODE1
ora.TEST_NODE1.LISTENER_TEST_NODE1.lsnr ONLINE ONLINE on TEST_NODE1
ora.TEST_NODE1.gsd ONLINE ONLINE on TEST_NODE1
ora.TEST_NODE1.ons ONLINE ONLINE on TEST_NODE1
ora.TEST_NODE1.vip ONLINE ONLINE on TEST_NODE1
ora.orcl.RAC.cs ONLINE ONLINE on TEST_NODE2
ora.orcl.RAC.orcl1.srv ONLINE ONLINE on TEST_NODE1
ora.orcl.RAC.orcl2.srv ONLINE ONLINE on TEST_NODE2
ora.orcl.db ONLINE ONLINE on TEST_NODE2
ora.orcl.orcl1.inst ONLINE ONLINE on TEST_NODE1
ora.orcl.orcl2.inst ONLINE ONLINE on TEST_NODE2
ora.TEST_NODE2.ASM2.asm ONLINE ONLINE on TEST_NODE2
ora.TEST_NODE2.LISTENER_TEST_NODE2.lsnr ONLINE ONLINE on TEST_NODE2
ora.TEST_NODE2.gsd ONLINE ONLINE on TEST_NODE2
ora.TEST_NODE2.ons ONLINE ONLINE on TEST_NODE2
ora.TEST_NODE2.vip ONLINE ONLINE on TEST_NODE2
[oracle@TEST_NODE1 ~]$

3)If there is some issue with virtual IP how will you troubleshoot it?How will you change virtual ip?
Ans: To change the VIP (virtual IP) on a RAC node, use the command

[oracle@testnode oracle]$ srvctl modify nodeapps -A new_address

4)What kind of backup stratergy you follow for your Databases?
Ans:We follow different backup strategy for our Databases depends on type of Database.We use different kind of Backup stratergy for Production,Test,Performance,Demo,Development Databases.But the main
aim is to recover the Database with minimal or no Data loss:

1)Production Databases:
—————————–
Backup stratergy for Production Database is as follows:
RMAN BACKUP:
———————
incremental level 0 =>Weekly Basis at 6am — Full backup of Database with archive logs and copy of Current control file
incremental level 1 =>Mon,Tues,Thurs,Friday at 6am –Changes from recent back to a particular day
cummulative backup =>Wed,Saturday at 6am — changes from the lowest level i.e mon-wed and Thur-Sat.

While deciding the backup stratergy for our production system of 300GB we had in our mind the following points:
1)Backup should be schedule at less peak hours.
2)With no loss we should recover Database in case of any Disaster.

expdp Backup:

—————-
Export Datapump backup on daily basis at 9pm.
We should have one datapump backup which should be most recent to recover the lost of Table or any Data.Below points are same:
1)Backup should be schedule at less peak hours.
2)With no loss we should recover Database in case of any Disaster.

2)Test Databases:

———————
Usually Test Database is almost same as production in terms of Data.However whenever we want to test some
patch or any script before applying to production we can apply in test and than apply in production.I usually
prefer to have same backup stratergy as production for Test Databases.

3)Development Database:
—————————
In a development Database.We can go for below backup statergy,However if you have space and enough infrastructure you can repeat the same backup stratergy as above.

Expdp full backup:
In a Development environment,We should have full Database logical
Backup up to date and should be schedule on daily basis,so that whenever there
Is some table drop or table backup is requested by developer you can restore
That table from your Logical backup.

COLD RMAN BACKUP:
——————-
We can schedule cold rman backup on every sundays at 9am(any time which you feel is convinient without affecting much to developers and end users).Below is the link for implementing RMAN cold Backup:

6)How you will backup your RAC Database?
Ans: Backup strategy of RAC Database:
An RAC Database consists of
1)OCR
2)Voting disk &
3)Database files,controlfiles,redolog files & Archive log files

1)OCR BACKUP:
————-

Oracle performs physical backup of OCR devices every 4 hours under the default backup direcory $ORA_CRS_HOME/cdata/
and then it rolls that forward to Daily, weekly and monthly backup. You can get the backup information by executing below command.
Login with help of ‘root’ user and fire the below command.

ocrconfig -showbackup

[root@testnode -pub ~]# ocrconfig -showbackup

testnode-pub 2011/05/03 17:50:47 /u01/app/crs/cdata/test-crs/backup00.ocr

testnode-pub 2011/05/03 13:46:45 /u01/app/crs/cdata/test-crs/backup01.ocr

Manually backing up the OCR we can do with the help of same command ‘ocrconfig

ocrconfig -manualbackup –>Physical Backup of OCR

The above command backs up OCR under the default Backup directory.We can export the contents of the OCR using below command (Logical backup).

ocrconfig -export /tmp/ocr_exp.dat -s online <
2)Voting Disk Backup:
—————————-
Backup the existing votedisks as below as oracle:

dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0

[root@node1-pub ~]# su – oracle
[oracle@node1-pub ~]$ dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0
41024+0 records in
41024+0 records out
[oracle@node1-pub ~]$

Note: If we want to add a voting disk we can do with the help of ‘crsctl’ as given below:

[root@node1-pub ~]# su – oracle
[oracle@node1-pub ~]$ touch /u02/ocfs2/vote/VDFile_3
[oracle@node1-pub ~]$ crsctl add css votedisk /u02/ocfs2/vote/VDFile_3
Now formatting voting disk: /u02/ocfs2/vote/VDFile_3.
Successful addition of voting disk /u02/ocfs2/vote/VDFile_3.

Note:Please ask the interview about which version ,this is because,In Oracle 11g rel2,The way we take backup of Voting Disk and OCR is totally different,Because now we can use ASM for storing OCR and Voting Disk.Another point to keep in mind is Voting File is no longer supported to take a manual backup of it with ‘dd’ command
Instead,the Voting File gets backed up automatically into the OCR. As a New Feature,we can now do a manual backup of the OCR any time you like, without having to wait until that is done automatically – which is also still done.

Now the manual backup:
[root@node1-pub ~]# /u01/app/11.2.0/grid/bin/ocrconfig -manualbackup

We can take backup on a separate shared location also:
[root@node1-pub ~]#/u01/app/11.2.0/grid/bin/ocrconfig -backuploc /u02/ocr_backup

To see the above backup we can use the same command ‘ocrconfig’:
[root@node1-pub ~]# /u01/app/11.2.0/grid/bin/ocrconfig -showbackup

Note:Oracle’s solution to this problem is to combines the clusterware and ASM into a single Grid Infrastructure home and comes up a procedure with a complex start up sequence which mixes the different components of clusterware and ASM instance in order

3)Datafiles Backup:
——————–
For Backing up your Database files,Redolog files,control files,Archive log files and other Database files,Please use ‘rman’ as it has many advantages over other
user managed backup.RMAN gives you the flexibility of backing up Database in archive log mode(hot backup)
and unarchive log mode(Cold backup).

7)How to recover your RAC Database?
Ans: If you have valid backup of OCR,Voting Disk and Database files than RAC Database recovery can be done without much lost or Damages.Below are some steps to do these:
1)OCR can be restore and recover using ‘ocrconfig’ command if you have a valid export backup of ocr.
2)Voting disk can be restore and recovered using ‘dd’ command if you have got the backup which you took from ‘dd’ command.

Detail process we can get from below link:
http://www.idevelopment.info/data/Oracle/DBA_tips/Oracle10gRAC/CLUSTER_65.shtml#Recover the Voting Disk

3)Restore and Recover the Database
First, take the database out of cluster mode,This is very important step, via:

# sqlplus “/ as sysdba”

SQL*Plus: Release 10.2.0.3.0 – Production on Sat May 24 17:02:17 2008

Connected to an idle instance.

SQL> startup nomount;
ORACLE instance started.

Total System Global Area 268435456 bytes
Fixed Size 1261300 bytes
Variable Size 251658508 bytes
Database Buffers 8388608 bytes
Redo Buffers 7127040 bytes

SQL> alter system set cluster_database=false scope=spfile sid=’*’;

System altered.

SQL> shutdown immediate;
ORA-01507: database not mounted
ORACLE instance shut down.

Then restore the database via RMAN:

# rman target=/

Recovery Manager: Release 10.2.0.3.0 – Production on Sat May 24 17:04:10 2008

connected to target database (not started)

RMAN> startup mount;

Oracle instance started
database mounted

Total System Global Area 268435456 bytes

Fixed Size 1261300 bytes
Variable Size 251658508 bytes
Database Buffers 8388608 bytes
Redo Buffers 7127040 bytes

RMAN> restore database;

Starting restore at 24-MAY-08
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=154 devtype=DISK

channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00001 to /u02/oracle/ora1/TESTDB/systTESTDB01.dbf
restoring datafile 00002 to /u02/oracle/ora1/TESTDB/undotbs01.dbf
restoring datafile 00003 to /u02/oracle/ora1/TESTDB/sysaux01.dbf
restoring datafile 00004 to /u02/oracle/ora1/TESTDB/users01.dbf
restoring datafile 00005 to /u02/oracle/ora1/TESTDB/example01.dbf
restoring datafile 00006 to /u02/oracle/ora1/TESTDB/undotbs02.dbf
restoring datafile 00007 to /u02/oracle/ora2/TESTDB/mgmt.dbf
restoring datafile 00008 to /u02/oracle/ora1/TESTDB/mgmt_ecm_depot1.dbf
channel ORA_DISK_1: reading from backup piece /u02/oracle/ora3/TESTDB/backups/0ijh6j4t_1_1
channel ORA_DISK_1: restored backup piece 1
piece handle=/u02/oracle/ora3/TESTDB/backups/0ijh6j4t_1_1 tag=FHB_EM1200805241630
channel ORA_DISK_1: restore complete, elapsed time: 00:06:36
Finished restore at 24-MAY-08

Since my database is so small, I will simply restore the entire thing, however, since you know which datafile is missing, you could simply restore it and then recover the database as necessary.

Recover the Database

RMAN> recover database;

Starting recover at 24-MAY-08
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished recover at 24-MAY-08

RMAN> alter database open;

database opened
RMAN-06900: WARNING: unable to generate V$RMAN_STATUS or V$RMAN_OUTPUT row
RMAN-06901: WARNING: disabling update of the V$RMAN_STATUS and V$RMAN_OUTPUT rows
ORACLE error from target database:
ORA-06502: PL/SQL: numeric or value error: character to number conversion error

Afterwards, place the database back into cluster mode and startup both instances:

# sqlplus “/ as sysdba”

SQL*Plus: Release 10.2.0.3.0 – Production on Sat May 24 17:16:36 2008

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 – Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SQL> alter system set cluster_database=true scope=spfile sid=’*’;

System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

# srvctl start database -d TESTDB
[oracle@rac1 bdump]$ srvctl status database -d TESTDB
Instance TESTDB1 is running on node rac1
Instance TESTDB2 is running on node rac2

Below are for pre 11gR2 versions

1. Where are the Clusterware files stored on a RAC environment?
The Clusterware is installed on each node (on an Oracle Home) and on the shared disks (the voting disks and the CSR file) RAC Interview Question

2. Where are the database software files stored on a RAC environment?
The base software is installed on each node of the cluster and the
database storage on the shared disks.

3. What kind of storage we can use for the shared Clusterware files?
– OCFS (Release 1 or 2)
– raw devices
– third party cluster file system such as GPFS or Veritas

4. What kind of storage we can use for the RAC database storage?
– OCFS (Release 1 or 2)
– ASM
– raw devices
– third party cluster file system such as GPFS or Veritas

5. What is a CFS?
A cluster File System (CFS) is a file system that may be accessed (read and write) by all members in a cluster at the same time. This implies that all members of a cluster have the same view.

6. What is an OCFS2?
The OCFS2 is the Oracle (version 2) Cluster File System which can be used for the Oracle Real Application Cluster.

7. Which files can be placed on an Oracle Cluster File System?
– Oracle Software installation (Windows only)
– Oracle files (controlfiles, datafiles, redologs, files described by the bfile datatype)
– Shared configuration files (spfile)
– OCR and voting disk
– Files created by Oracle during runtime
Note: There are some platform specific limitations.

8. Do you know another Cluster Vendor?
HP Tru64 Unix, Veritas, Microsoft

9. How is possible to install a RAC if we don’t have a CFS?
This is possible by using a raw device.

10. What is a raw device?
A raw device is a disk drive that does not yet have a file system set up. Raw devices are used for Real Application Clusters since they enable the sharing of disks.

11. What is a raw partition?
A raw partition is a portion of a physical disk that is accessed at the lowest possible level. A raw partition is created when an extended partition is created and logical partitions are assigned to it without any formatting. Once formatting is complete, it is called cooked partition.

12. When to use CFS over raw?
A CFS offers:
– Simpler management
– Use of Oracle Managed Files with RAC
– Single Oracle Software installation
– Autoextend enabled on Oracle datafiles
– Uniform accessibility to archive logs in case of physical node failure
– With Oracle_Home on CFS, when you apply Oracle patches CFS guarantees that the updated Oracle_Home is visible to all nodes in the cluster.
Note: This option is very dependent on the availability of a CFS on your platform.

13. When to use raw over CFS?
– Always when CFS is not available or not supported by Oracle.
– The performance is very, very important: Raw devices offer best performance without any intermediate layer between Oracle and the disk.
Note: Autoextend fails on raw devices if the space is exhausted. However the space could be added online if needed.

14. What CRS is?
Oracle RAC 10g Release 1 introduced Oracle Cluster Ready Services (CRS), a platform-independent set of system services for cluster environments. In Release 2, Oracle has renamed this product to Oracle Clusterware.

15. What is VIP IP used for?
It returns a dead connection IMMIDIATELY, when its primary node fails. Without using VIP IP, the clients have to wait around 10 minutes to receive ORA-3113: “end of file on communications channel”. However, using Transparent Application Failover (TAF) could avoid ORA-3113.

16. Why we need to have configured SSH or RSH on the RAC nodes?
SSH (Secure Shell,10g+) or RSH (Remote Shell, 9i+) allows “oracle” UNIX account connecting to another RAC node and copy/ run commands as the local “oracle” UNIX account.

17. Is the SSH, RSH needed for normal RAC operations?
No. SSH or RSH are needed only for RAC, patch set installation and clustered database creation.

18. Do we have to have Oracle RDBMS on all nodes?
Each node of a cluster that is being used for a clustered database will typically have the RDBMS and RAC software loaded on it, but not actual data files (these need to be available via shared disk).

19. What are the restrictions on the SID with a RAC database? Is it limited to 5 characters?
The SID prefix in 10g Release 1 and prior versions was restricted to five characters by install/ config tools so that an ORACLE_SID of up to max of 5+3=8 characters can be supported in a RAC environment. The SID prefix is relaxed up to 8 characters in 10g Release 2, see bug 4024251 for more information.

20. Does Real Application Clusters support heterogeneous platforms?
The Real Application Clusters do not support heterogeneous platforms in the same cluster.

21. Are there any issues for the interconnect when sharing the same switch as the public network by using VLAN to separate the network?
RAC and Clusterware deployment best practices suggests that the interconnect (private connection) be deployed on a stand-alone, physically separate, dedicated switch. On big network the connections could be instable.

22. What is the Load Balancing Advisory?
To assist in the balancing of application workload across designated resources, Oracle Database 10g Release 2 provides the Load Balancing Advisory. This Advisory monitors the current workload activity across the cluster and for each instance where a service is active; it provides a percentage value of how much of the total workload should be sent to this instance as well as service quality flag.

23. How many nodes are supported in a RAC Database?
With 10g Release 2, we support 100 nodes in a cluster using Oracle Clusterware, and 100 instances in a RAC database. Currently DBCA has a bug where it will not go beyond 63 instances. There is also a documentation bug for the max-instances parameter. With 10g Release 1 the Maximum is 63.

24. What is the Cluster Verification Utiltiy (cluvfy)?
The Cluster Verification Utility (CVU) is a validation tool that you can use to check all the important components that need to be verified at different stages of deployment in a RAC environment. RAC Interview Question

25. What versions of the database can I use the cluster verification utility (cluvfy) with?
The cluster verification utility is release with Oracle Database 10g Release 2 but can also be used with Oracle Database 10g Release 1. RAC Interview Question

26. If I am using Vendor Clusterware such as Veritas, IBM, Sun or HP, do I still need Oracle Clusterware to run Oracle RAC 10g?
Yes. When certified, you can use Vendor Clusterware however you must still install and use Oracle Clusterware for RAC. Best Practice is to leave Oracle Clusterware to manage RAC. For details see Metalink Note 332257.1 and for Veritas SFRAC see 397460.1.

27. Is RAC on VMW are supported?
Yes.

28. What is hangcheck timer used for ?
The hangcheck timer checks regularly the health of the system. If the system hangs or stop the node will be restarted automatically.

There are 2 key parameters for this module: RAC Interview Question

-> hangcheck-tick: this parameter defines the period of time between checks of system health. The default value is 60 seconds; Oracle recommends setting it to 30seconds.

-> hangcheck-margin: this defines the maximum hang delay that should be tolerated before hangcheck-timer resets the RAC node.

29. Is the hangcheck timer still needed with Oracle RAC 10g?
Yes.

30. What files can I put on Linux OCFS2?
For optimal performance, you should only put the following files on Linux OCFS2:
– Datafiles
– Control Files
– Redo Logs
– Archive Logs
– Shared Configuration File (OCR)
– Voting File
– SPFILE

31. Is it possible to use ASM for the OCR and voting disk?
No, the OCR and voting disk must be on raw or CFS (cluster file system).

32. Can I change the name of my cluster after I have created it when I am using Oracle Clusterware?
No, you must properly uninstall Oracle Clusterware and then re-install.

33. What the O2CB is?
The O2CB is the OCFS2 cluster stack. OCFS2 includes some services. These services must be started before using OCFS2 (mount/ format the file systems).

34. What the OCR file is used for?
OCR is a file that manages the cluster and RAC configuration.

35. What the Voting Disk file is used for?
The voting disk is nothing but a file that contains and manages information of all the node memberships.

36. What is the recommended method to make backups of a RAC environment?
RMAN to make backups of the database, dd to backup your voting disk and hard copies of the OCR file.

37. What command would you use to check the availability of the RAC system?
crs_stat -t -v (-t -v are optional) RAC Interview Question

38. What is the minimum number of instances you need to have in order to create a RAC?
You can create a RAC with just one server.

39. Name two specific RAC background processes
RAC processes are: LMON, LMDx, LMSn, LKCx and DIAG.

40.Can you have many database versions in the same RAC?
Yes, but Clusterware version must be greater than the greater database version.

44. Which command will we use to manage (stop, start) RAC services in command-line mode?
srvctl

45. How many alert logs exist in a RAC environment?
One for each instance.

46. What are Oracle Clusterware Components
Voting Disk — Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.

Oracle Cluster Registry (OCR) — Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster

47. How do you backup voting disk
#dd if=voting_disk_name of=backup_file_name

48. How do I identify the voting disk location
#crsctl query css votedisk

49. How do I identify the OCR file location
check /var/opt/oracle/ocr.loc or /etc/ocr.loc ( depends upon platform)
or
#ocrcheck

50. How do you backup the OCR
There is an automatic backup mechanism for OCR. The default location is : $ORA_CRS_HOMEcdata”clustername”
To display backups :
#ocrconfig -showbackup
To restore a backup :
#ocrconfig -restore
With Oracle RAC 10g Release 2 or later, you can also use the export command:
#ocrconfig -export -s online, and use -import option to restore the contents back.
With Oracle RAC 11g Release 1, you can do a manaual backup of the OCR with the command:
# ocrconfig -manualbackup RAC Interview Question

51. What is SCAN?
Single Client Access Name (SCAN) is s a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is clients using SCAN do not need to change if you add or remove nodes in the cluster.

52. What are Oracle Clusterware processes for 10g on Unix and Linux
Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.

Cluster Ready Services (crsd) — The crs process manages cluster resources (which could be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application process, and so on) based on the resource’s configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user

Event manager daemon (evmd) —A background process that publishes events that crs creates.

Process Monitor Daemon (OPROCD) —This process monitor the cluster and provide I/O fencing. OPROCD performs its check, stops running, and if the wake up is beyond the expected time, then OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.

RACG (racgmain, racgimon) —Extends clusterware to support Oracle- specific requirements and complex resources. Runs server callout scripts when FAN events occur.

53. What are Oracle database background processes specific to RAC
•LMS—Global Cache Service Process
•LMD—Global Enqueue Service Daemon
•LMON—Global Enqueue Service Monitor
•LCK0—Instance Enqueue Process
To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.

54. How do you troubleshoot node reboot
Please check metalink …
Note 265769.1 Troubleshooting CRS Reboots
Note.559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions.

55. Is ssh required for normal Oracle RAC operation ?
“ssh” are not required for normal Oracle RAC operation. However “ssh” should be enabled for Oracle RAC and patchset installation.

56. What is the purpose of Private Interconnect ?
Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.

57. Why do we have a Virtual IP (VIP) in Oracle RAC?
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don’t really have a good HA solution without using VIPs.

When a node fails, the VIP associated with it is automatically failed over to some other node and new node re-arps the world indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.

58. What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR Report?
This is most likely due to a fault in interconnect network.
Check netstat -s
if you see “fragments dropped” or “packet reassemblies failed” , Work with your system administrator find the fault with network.

59. How many nodes are supported in a RAC Database?
10g Release 2, support 100 nodes in a cluster using Oracle Clusterware, and 100 instances in a RAC database.

60. Srvctl cannot start instance, I get the following error PRKP- 1001 CRS-0215, however sqlplus can start it on both nodes? How do you identify the problem?
Set the environmental variable SRVM_TRACE to true.. And start the instance with srvctl. Now you will get detailed error stack.

61. What is the purpose of the ONS daemon?
The Oracle Notification Service (ONS) daemon is an daemon started by the CRS clusterware as part of the nodeapps. There is one ons daemon started per clustered node.
The Oracle Notification Service daemon receive a subset of published clusterware events via the local evmd and racgimon clusterware daemons and forward those events to application subscribers and to the local listeners.

This in order to facilitate:

a. the FAN or Fast Application Notification feature or allowing applications to respond to database state changes.

b. the 10gR2 Load Balancing Advisory, the feature that permit load balancing accross different rac nodes dependent of the load on the different nodes. The rdbms MMON is creating an advisory for distribution of work every 30seconds and forward it via racgimon and ONS to listeners and applications.

2) What are Oracle CLusterware and Daemon Processes and what they do?
Ans: ocssd, crsd, evmd, oprocd, racgmain, racgimon

3) What are the special background processes for RAC (or) what is difference in stand-alone database & RAC database background processes?
Ans: DIAG, LCKn, LMD, LMSn, LMON

4) What are structural changes in 11g R2 RAC?
Ans: Grid & ASM are on one home, Voting disk & ocrfile can be on the ASM, SCAN, By using srvctl, we can mange diskgroups, home, ons, eons, filesystem, srvpool, server, scan, scan_listener, gns, vip, oc4j,GSD

5) What is Cache Fusion?
Ans: Transferring of data between RAC instances by using private network. Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster. When a block of data is read from datafile by an instance within the cluster and another instance is in need of the same block, it is easy to get the block image from the instance which has the block in its SGA rather than reading from the disk.

6) What is the purpose of Private Interconnect?
Ans: Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the clustered nodes. This communication is based on the TCP protocol. RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP).

7) What are the Clusterware Components?
Ans: Voting Disk – Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.

Oracle Cluster Registry (OCR) – Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster.

The daemon OCSSd manages the configuration info in OCR and maintains the changes to cluster in the registry.
Virtual IP (VIP) – When a node fails, the VIP associated with it is automatically failed over to some other node and new node re-arps the world indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients.
This results in the clients getting errors immediately.
crsd – Cluster Resource Services Daemon
cssd – Cluster Synchronization Services Daemon
evmd – Event Manager Daemon
oprocd / hangcheck_timer – Node hang detector

8) What is OCR file?

Ans: RAC configuration information repository that manages information about the cluster node list and instance-to-node mapping information. The OCR also manages information about Oracle Clusterware resource profiles for customized applications. Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster.The daemon OCSSd manages the configuration info in OCR and maintains the changes to cluster in the registry.

9) What is Voting disk and how many files should be there?
Ans: Voting Disk File is a file on the shared cluster system or a shared raw device file. Oracle Clusterware uses the voting disk to determine which instances are members of a cluster. Voting disk is akin to the quorum disk, which helps to avoid the split-brain syndrome. Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.

10) How to take backup of OCR file?
Ans: #ocrconfig -manualbackup,

#ocrconfig -export file_name.dmp,

#ocrdump -backupfile my_file,

$cp -p -R /u01/app/crs/cdata /u02/crs_backup/ocrbackup/RAC1

11) How to recover OCR file?
Ans:
#ocrconfig -restore backup_file.ocr
#ocrconfig -import file_name.dmp

12) what is Local OCR?
Ans:
/etc/oracle/local.ocr
/var/opt/oracle/local.ocr

13) How to check backup of OCR files?
Ans:
#ocrconfig –showbackup

14) How to take backup of voting file?
Ans: dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0
crsctl backup css votedisk — from 11g R2

15) How do I identify the voting disk location? Or what is the location of Voting disk?
Ans: # crsctl query css votedisk
16) What is the location of OCR file?
check /var/opt/oracle/ocr.loc or /etc/ocr.loc
Ans: # ocrcheck

17) If voting disk or OCR file got corrupte and don’t have backups so how to get them?
Ans: We have to install Clusterware.

18) who will manage OCR files?
Ans: cssd will manage OCR.

19) who will take backup of OCR files?
Ans: crsd will take backup.

20) What is split brain syndrome?
Ans: Will arise when two or more instances attempt to control a cluster database. In a two-node environment, one instance attempts to manage updates simultaneously while the other instance attempts to manage updates.

21) what are various IPs used in RAC? And How may IPs we need in RAC?
Ans: Public IP, Private IP, Virtual IP, SCAN IP

22) whaat is the use of Virtual IP?
Ans: When a node fails, the VIP associated with it is automatically failed over to some other node and new node re-arps the world indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients.This results in the clients getting errors immediately.

Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don’t really have a good HA solution without using VIPs.

23) what is the use of SCAN IP (scan Name) and will it provide load balancing?
Ans:Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g Release 2,
feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is clients using SCAN do not need to change if you add or remove nodes in the cluster.

24) How many Scan Listeners will be running?
Ans: Three SCAN listeners only.

25) What is FAN?
Ans: Applications can use Fast Application Notification (FAN) to enable rapid failure detection, balancing of connection pools after failures, and re-balancing of connection pools when failed components are repaired.
The FAN process uses system events that Oracle publishes when cluster servers become unreachable or if network interfaces fail.

26) What is FCF?
Ans: Fast Connection Failover provides high availability to FAN integrated clients, such as clients that use JDBC, OCI, or ODP.NET. If you configure the client to use fast connection failover, then the client automatically subscribes to FAN events and can react to database UP and DOWN events. In response, Oracle gives the client a connection to an active instance that provides the requested database service.

27) What is TAF and TAF policies?
Ans: Transparent Application Failover (TAF) – A runtime failover for high availability environments,
such as Real Application Clusters and Oracle Real Application Clusters Guard, TAF refers to the failover and re-establishment of application-to-service connections. It enables client applications to automatically reconnect to the database if the connection fails, and optionally resume a SELECT statement that was in progress. This reconnect happens automatically from within the Oracle Call Interface (OCI) library.

28) what are nodeapps?
Ans: VIP, listener, ONS, GSD

29) How to do load balancing in RAC?

Client Side Connect-Time Load Balance:

The client load balancing feature enables clients to randomize connection requests among the listeners.
This is done by client Tnsnames Parameter: LOAD_BALANCE.
The (load_balance=yes) instructs SQLNet to progress through the list of listener addresses in the address_list section of the net service name in a random sequence. When set to OFF, instructs SQLNet to try the addresses sequentially until one succeeds.

Client Side Connect-Time failover

This is done by client Tnsnames Parameter: FAILOVER
The (failover=on) enables clients to connect to another listener if the initial connection to the first listener fails. Without connect-time failover, Oracle Net attempts a connection with only one listener.

Server Side Listener Connection Load Balancing.

With server-side load balancing, the listener directs a connection request to the best instance currently providing the service.
Init parameter remote_listener should be set. When set, each instance registers with the TNS listeners running on all nodes within the cluster.

There are two types of server-side load balancing:

Load Based — Server side load balancing redirects connections by default depending on node load. This id default.
Session Based — Session based load balancing takes into account the number of sessions connected to each node and then distributes the connections to balance the number of sessions across the different nodes.

Transparent Application Failover(TAF):
Transparent Application Failover (TAF) is a feature of the Oracle Call Interface (OCI) driver at client side. It enables the application to automatically reconnect to a database, if the database instance to which the connection is made fails. In this case, the active transactions roll back.
Tnsnames Parameter: FAILOVER_MODE

e.g (failover_mode=(type=select)(method=basic))
Failover Mode Type can be Either SESSION or SELECT.

Session failover will have just the session to failed over to the next available node. With SELECT, the select query will be resumed.
TAF can be configured with just server side service settings by using dbms_service package.

Fast Connection Failover (FCF):

Fast Connection Failover is a feature of Oracle clients that have integrated with FAN HA Events.
Oracle JDBC Implicit Connection Cache, Oracle Call Interface (OCI), and Oracle Data Provider for .Net (ODP.Net) include fast connection failover.

With fast connection failover, when a down event is received, cached connections affected by the down event are immediately marked invalid and cleaned up.

30) What are the uses of services? How to find out the services in cluster?
Ans: Applications should use the services to connect to the Oracle database.Services define rules and characteristics (unique name, workload balancing, failover options, and high availability) to control how users and applications connect to database instances.

31) how to file out the nodes in cluster (or) how to find out the master node?

Ans: # olsnodes — Which ever displayed first, is the master node of the cluster.
select MASTER_NODE from v$ges_resource;

To find out which is the master node, you can see ocssd.log file and search for “master node number”.

32) How to know the public IPs, Private IPs, VIPs in RAC?
Ans:
# olsnodes -n -p -i
node1-pub 1 node1-prv node1-vip
node2-pub 2 node2-prv node2-vip

33) What utility is used to start DB or Instance?
Ans:
srvctl start database –d database_name
srvctl start instance –d database_name –i instance_name

34) How can you shutdown single instance?
Ans:
Change cluster_database=false
srvctl stop instance –d database_name –i instance_name

35) What is HAS (High Availability Service) and the Comnmands?
Ans: HAS includes ASM and database instance and listeners.

crsctl check has
crsctl config has
crsctl disable has
crsctl enable has
crsctl query has releaseversion
crsctl query has softwareversion
crsctl start has
crsctl stop has [-f]

36) How many nodes are supported in a RAC Database?
Ans: 10g Release 2, support 100 nodes in a cluster using Oracle Clusterware, and 100 instances in a RAC database.

37) What is fencing?

Ans: I/O fencing prevents updates by failed instances, and detecting failure and preventing split brain in cluster.When a cluster node fails, the failed node needs to be fenced off from all the shared disk devices or diskgroups. This methodology is called I/O Fencing, sometimes called Disk Fencing or failure fencing.

38) Why Clusterware installed in root?
Oracle Clusterware works closely with the operating system, system administrator access is required for some of the installation tasks. In addition, some of the Oracle Clusterware processes must run as the special operating system user, root.

39) What are the initialization parameters that must have same value for every instance in an Oracle RAC database?
Ans:ACTIVE_INSTANCE_COUNT
ARCHIVE_LAG_TARGET
COMPATIBLE
CLUSTER_DATABASE
CLUSTER_DATABASE_INSTANCE
CONTROL_FILES
DB_BLOCK_SIZE
DB_DOMAIN
DB_FILES
DB_NAME
DB_RECOVERY_FILE_DEST
DB_RECOVERY_FILE_DEST_SIZE
DB_UNIQUE_NAME
INSTANCE_TYPE
PARALLEL_MAX_SERVERS
REMOTE_LOGIN_PASSWORD_FILE
UNDO_MANAGEMENT

Amit's Oracle DBA Blog

Disclaimer

Wednesday, 21 July 2021

RAC Interview Questions

> What methods are available to keep the time synchronized on all nodes in the cluster?

No comments:

Post a Comment

Oracle Exadata

Labels