REAL APPLICATION CLUSTERS:-
Background Processes
Oracle RAC instances use two services, GES (Global Enqueue Service) and GCS (Global Cache Service), that
enable Cache Fusion. Oracle RAC instances are composed of the following background
processes:
GTX0-j: This process provides transparent support for XA
global transactions in a RAC environment. The database automatically tunes the number of
these processes based on the workload of XA global transactions.
LMON (Global Enqueue Service Monitor): This
process monitors global enqueues and resources across the cluster and performs
global enqueue recovery operations.
LMD (Global Enqueue Service Daemon): This
process manages incoming remote resource requests within each instance.
LMS (Global Cache Service Process): This process maintains the status of datafiles and
of each cached block by recording information in the Global Resource Directory
(GRD). It also controls the flow of messages to remote instances,
manages global data block access, and transmits block images between the buffer
caches of different instances. This processing is part of the Cache Fusion
feature.
LCK0 (Instance Enqueue Process): This
process manages non-Cache Fusion resource requests such as library and row
cache requests.
RMSn (Oracle RAC Management Processes): These
processes perform manageability tasks for Oracle RAC, including the creation of
resources related to Oracle RAC when new instances are added to the cluster.
RSMN (Remote Slave Monitor): This process
manages background slave process creation and communication on remote
instances. It is itself a background slave process and performs tasks on
behalf of a coordinating process running in another instance.
Major RAC wait events
In a RAC environment the buffer cache is
global across all instances in the cluster, so processing differs from a single instance.
The most common wait events related to this are gc cr request and gc buffer
busy.
GC CR request: the time it takes to retrieve a data block
from the remote cache.
Reason: RAC traffic using a slow connection, or
inefficient queries (poorly tuned queries increase the number of data
blocks requested by an Oracle session; the more blocks requested, the
more often a block must be read from a remote instance via
the interconnect).
GC BUFFER BUSY: the time the remote instance
spends accessing the requested data block locally.
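One way to get a first look at these waits is to query the system-wide wait event views; a minimal sketch, assuming access to GV$SYSTEM_EVENT:
SELECT inst_id, event, total_waits, time_waited
FROM gv$system_event
WHERE event LIKE 'gc%'
ORDER BY time_waited DESC;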
What are the background processes that exist
in 11gR2 and what is their functionality?
CRSD
The CRS daemon (crsd)
manages cluster resources based on configuration information that is stored in
the Oracle Cluster Registry (OCR) for each resource. This includes start, stop,
monitor, and failover operations. The crsd process generates events when the
status of a resource changes.
CSSD
Cluster Synchronization
Service (CSS): Manages the cluster configuration by controlling which nodes are
members of the cluster and by notifying members when a node joins or leaves the
cluster. If you are using certified third-party clusterware, then CSS
interfaces with your clusterware to manage node membership information. CSS has
three separate processes: the CSS daemon (OCSSD), the CSS Agent (cssdagent),
and the CSS Monitor (cssdmonitor). The cssdagent process monitors the cluster
and provides input/output fencing. This service was formerly provided by the Oracle
Process Monitor daemon (oprocd). A cssdagent failure results in Oracle
Clusterware restarting the node.
Diskmon
Disk Monitor daemon (diskmon): Monitors and performs input/output fencing for Oracle Exadata Storage
Server. As Exadata storage can be added to any Oracle RAC node at any point in
time, the diskmon daemon is always started when OCSSD is started.
Evmd
Event Manager (EVM): Is a
background process that publishes Oracle Clusterware events.
Mdnsd
Multicast Domain Name Service (mDNS):
Allows DNS requests. The mDNS process is a background process on Linux and
UNIX, and a service on Windows.
Gnsd
Oracle Grid Naming Service (GNS): Is a gateway between the cluster mDNS and external DNS servers. The GNS
process performs name resolution within the cluster.
Ons
Oracle Notification Service (ONS): Is a publish-and-subscribe service for communicating Fast Application
Notification (FAN) events.
Oraagent: Extends clusterware to support
Oracle-specific requirements and complex resources. It runs server callout
scripts when FAN events occur. This process was known as RACG in Oracle
Clusterware 11g Release 1 (11.1).
Orarootagent
Oracle root agent (orarootagent):
Is a specialized oraagent process that helps CRSD manage resources owned by
root, such as the network and the Grid virtual IP address.
Oclskd
Cluster kill daemon (oclskd):
Handles instance/node eviction requests that have been escalated to CSS.
Gipcd
Grid IPC daemon (gipcd): Is a helper daemon for the communications
infrastructure.
CTSSD
Cluster Time Synchronization Service daemon (ctssd): Manages
time synchronization between nodes, rather than depending on NTP.
VIP: If a node fails, then the node's VIP
address fails over to another node on which the VIP address can accept TCP
connections but it cannot accept Oracle connections.
VD: Oracle Clusterware uses the VD (voting disk) to
determine which nodes are members of a cluster. All nodes in the
RAC cluster register heartbeat information on the VD, and this determines the
number of active nodes in the RAC cluster. Voting disks are also used for checking the
availability of instances in RAC and for removing unavailable nodes from the
cluster. They help prevent split-brain conditions and keep database
information intact.
Oracle recommends an odd number of voting disks (a minimum of three for normal redundancy).
A node must be able to access more than half of the voting disks to continue to function;
with only two voting disks, a node that can see just one does not have a majority.
The CSS daemon in the
clusterware maintains the heartbeats to the VD.
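To see how many voting disks are configured and where they are stored, the following can be run from any node where the clusterware stack is up; the output lists one line per voting file with its state, File Universal Id, and location:
$ crsctl query css votedisk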
Split Brain Syndrome
Split brain syndrome occurs when the
instance members in a RAC cluster fail to ping/connect to each other via the private
interconnect, but the servers are physically up and running and the database
instance on each of these servers is also running. The individual nodes are
running fine and can conceptually accept user connections and work
independently. Because of the lack of communication, each instance thinks
that the other instance it cannot reach is down and that it needs to
do something about the situation. The problem is that if we leave these instances
running, the same block might be read and updated in the individual instances, and
there would be a data integrity issue, because a block changed by one instance would
not be locked and could be overwritten by another instance.
Node Eviction
In RAC, if any node becomes inactive, or if
other nodes are unable to ping/connect to a node in the RAC cluster, then the node
which first detects that one of the nodes is not accessible will evict that
node from the RAC group. When a node is evicted, Oracle
RAC will usually reboot that node and then perform a cluster reconfiguration to include
the evicted node back. You will see the Oracle error ORA-29740 when there is a
node eviction in RAC.
Reasons for Node Reboot or Node Eviction
Whenever a Database Administrator faces a node
reboot issue, the first things to look at are /var/log/messages and the OS Watcher logs of the database node which
was rebooted.
/var/log/messages will give you an actual picture of the reboot:
the exact time of the restart, the status of resources like swap and RAM, etc.
1. High
Load on Database Server: High load on the system is a common reason for node
evictions. One typical scenario is that, due to high load, the RAM and swap space of the DB
node get exhausted, the system stops working, and it finally reboots.
So, every time you see a node eviction, start the
investigation with /var/log/messages and analyze the OS Watcher logs.
2. Voting
Disk not Reachable: Another reason for a node reboot is that the
clusterware is not able to access a minimum number of the voting files. When
the node aborts for this reason, the node
alert log will show the CRS-1606 error.
There could be two reasons for this issue:
A. The connection to the voting disk is interrupted.
B. Only one voting disk is in use and the version is less
than 11.2.0.3.4, hitting known bug
13869978.
How to Solve a Voting Disk Outage?
There could be many reasons for the voting disk
not being reachable. Here are a few general approaches for a DBA to follow.
1. Use the command "crsctl query css votedisk" on a node where clusterware is up
to get a list of all the voting files.
2. Check that each node can access the
devices underlying each voting file.
3. Check that the permissions on each voting
file/disk have not been changed.
4. Check OS, SAN, and storage logs for any
errors from the time of the incident.
5. Apply the fix for bug 13869978 if only one voting
disk is in use. This is fixed in the 11.2.0.3.4 patch set and above, and in 11.2.0.4
and above.
3.
Missed Network Connection between Nodes: In technical terms this is called a Missed Network
Heartbeat (NHB). Whenever there is a communication gap or no communication
between nodes on the private network (interconnect), due to a network outage or some
other reason, a node aborts itself to avoid a "split brain" situation.
The most common (but not exclusive) cause of a missed NHB is a network problem
communicating over the private interconnect.
Suggestions to troubleshoot a Missed Network
Heartbeat:
1. Check OS statistics from the evicted node
from the time of the eviction. The DBA can use OS Watcher to look at OS stats at the
time of the issue; check oswnetstat and oswprvtnet for network related issues.
2. Validate the interconnect network setup
with the help of the network administrator (see the commands after this list).
3. Check communication over the private
network.
4. Check that the OS network settings are
correct by running the RACcheck tool.
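A simple starting point for item 2 is to confirm which interface the clusterware believes is the private interconnect and to test basic reachability over it; a minimal sketch (the private hostname is a placeholder):
$ oifcfg getif        (lists each interface, its subnet, and whether it is public or cluster_interconnect)
$ ping node2-prv      (substitute the private hostname or IP of the other node)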
4. Database
or ASM Instance Hang: Sometimes a database or ASM instance hang can cause a node
reboot. In these cases the database instance hangs and is terminated afterwards,
which causes either a cluster reboot or a node eviction. The DBA should check the alert logs
of the database and ASM instances for any hang situation which might have caused this
issue.
11gR2
Changes -> Important: in
11gR2, fencing (eviction) does not necessarily mean a reboot.
Until Oracle Clusterware 11.2.0.2, fencing
(eviction) meant "reboot".
With Oracle Clusterware 11.2.0.2, reboots
will be seen less often, because:
- Reboots affect applications that might run on a node but are not protected.
- Customer requirement: prevent a reboot; just stop the cluster - implemented.
With Oracle Clusterware 11.2.0.2, reboots
will be seen less often: instead of fast-rebooting the node, a graceful shutdown of
the cluster stack is attempted.
Different
platforms maintain logs at different locations, as shown in the following
example:
Kernel syslog file (/var/log/syslog/*) -> OS or vendor clusterware events,
including node reboot times
HPUX - /var/adm/syslog/syslog.log
AIX - /bin/errpt -a
Linux - /var/log/messages
Windows - Refer to the .TXT log files under the Application/System log
using Windows Event Viewer
Solaris - /var/adm/messages
Trace File Locations in 11gR2 (Trace File -> Location -> Description)
Most daemon logs are under $GRID_HOME/log/<hostname>/, in the
/cssd, /crsd, /evmd, /ohasd, /mdnsd, /gpnpd, /diskmon and /ctssd subdirectories.
CSS logs -> Oracle Clusterware
cluster synchronization services (CSS). To get more tracing, issue `crsctl set
css trace 2` as root.
CRS logs -> Oracle Clusterware
cluster ready services (CRS) management and HA policy tracing; one log per RAC
host.
EVM logs -> Oracle Clusterware
event management layer tracing; one log per RAC host.
OHAS logs (<CRSHome>/log/<hostname>/ohasd)
-> one log per RAC host.
MDNS logs -> Multicast Domain
Name Service daemon; one log per RAC host.
GPNP logs -> one log per RAC
host.
DISKMON logs -> Disk Monitor
daemon; one log per RAC host.
CTSS logs -> one log per RAC
host; also generates alerts in the GRID alert log
file.
SRVM logs (<CRSHome>/srvm/log) -> OCR tracing; to get more tracing, edit
mesg_logging_level in the srvm/admin/ocrlog.ini file; one log per RAC host.
CRSD oraagent logs (<CRSHome>/log/<hostname>/agent/crsd/oraagent_<crs
admin username>)
-> Oracle agent (oraagent), which was
known as RACG in 11gR1.
CRSD orarootagent logs (<CRSHome>/log/<hostname>/agent/crsd/orarootagent_<crs
admin username>)
-> Oracle root agent (orarootagent), which
manages crsd resources owned by root, such as network, VIP and SCAN VIP.
Clusterware Software Stack
Beginning with Oracle 11gR2, Oracle
redesigned Oracle Clusterware into two software stacks: the High Availability
Services stack and the CRS stack. Each of these
stacks consists of several background processes, and the processes of these two
stacks facilitate Clusterware operation.
ASM and Clusterware: Which One is Started
First?
If you have used Oracle RAC 10g and 11gR1,
you might remember that the Oracle Clusterware stack has to be up
before the ASM instance starts on the node.
Beginning with 11gR2, the OCR and VD can also be stored in ASM.
ASM
is a part of the CRS stack of the Clusterware, and it is started at Level 3 after the
High Availability stack is started and before CRSD is started. During the
startup of the High Availability stack, Oracle Clusterware gets the
clusterware configuration from the OLR and the GPnP profile instead of from the OCR.
Because these two components are stored in $GRID_HOME on the local disk,
the ASM instance and ASM diskgroup are not needed for the startup of the High
Availability stack. Oracle Clusterware also doesn't rely on an ASM instance to
access the VD: the location of the VD file is in the ASM disk
header. We can see the location information
with the following command:
$ kfed read /dev/dm-8 | grep -E
'vfstart|vfend'
ORACLE LOCAL REGISTRY
An additional cluster configuration file was
introduced with Oracle 11.2, the so-called Oracle Local Registry (OLR).
Each node has its own copy of the file in the Grid Infrastructure software
home. The OLR stores important security contexts used by the Oracle High
Availability Services early in the start sequence of the clusterware. The
information in the OLR and the Grid Plug and Play configuration file is needed
to locate the voting disks. The information stored in the OLR is needed by the
Oracle High Availability Services daemon (OHASD) to start; this includes data
about GPnP wallets, Clusterware configuration, and version information. If the voting disks
are stored in ASM, the discovery string in the GPnP profile will be used by the
cluster synchronization daemon to look them up. Later in the Clusterware boot
sequence, the ASM instance will be started by the cssd process to access the
OCR files; however, their location is stored in the /etc/ocr.loc file. Of course, if the voting files and OCR are on
a shared cluster file system, then an ASM instance is not needed and won't be
started unless a different resource depends on ASM. In contrast to the OLR, the OCR is used
extensively by CRSD. The OLR is maintained by the same command-line utilities
as the OCR, with the appended -local option.
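For example, the standard OCR utilities accept the -local flag to operate on the OLR; a minimal sketch (run as root):
# ocrcheck -local
# ocrconfig -local -showbackup
# ocrconfig -local -manualbackup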
Clusterware Startup Sequence
Level
0: The OS automatically
starts Clusterware through the OS’s init process. The init process spawns only
one init.ohasd, which in turn starts the
OHASD process. This is configured in the /etc/inittab file:
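On Linux, the respawn entry typically looks like the following (runlevels and paths may vary by platform and release):
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null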
Once OHASD is started on Level 0, OHASD is
responsible for starting the rest of the Clusterware and the resources
that Clusterware manages directly or
indirectly through Levels 1-4.
Level 1: OHASD
directly spawns four agent processes:
• cssdmonitor: CSS Monitor
• OHASD orarootagent: High Availability
Service stack Oracle root agent
• OHASD oraagent: High Availability Service
stack Oracle agent
• cssdagent: CSS Agent
Level 2: On
this level, OHASD oraagent spawns five processes:
• mDNSD: mDNS daemon process
• GIPCD: Grid Interprocess Communication
• GPnPD: GPnP profile daemon
• EVMD: Event Monitor daemon
• ASM: Resource for monitoring ASM instances
Then, OHASD orarootagent spawns the
following processes:
• CRSD: CRS daemon
• CTSSD: CTSS daemon
• Diskmon: Disk Monitor daemon (Exadata
Storage Server storage)
• ACFS: (ASM Cluster File System) Drivers
Next, the cssdagent starts the CSSD (CSS
daemon) process.
Level 3: The
CRSD spawns two CRSD agents: CRSD orarootagent and CRSD oraagent.
Level 4: On
this level, the CRSD orarootagent is responsible for starting the following
resources:
• Network resource: for the public network
• SCAN VIPs
• Node VIPs: VIPs for each node
• ACFS Registry
• GNS VIP: VIP for GNS if you use the GNS
option
Then, the CRSD oraagent is responsible
for starting the rest of the resources as follows:
• ASM Resource: ASM Instance(s) resource
• Diskgroup: Used for managing/monitoring
ASM diskgroups.
• DB Resource: Used for monitoring and
managing the DB and instances
• SCAN listener: Listener for SCAN listening
on SCAN VIP
• SCAN VIP: Single Client Access Name VIP
• Listener: Node listener listening on the
Node VIP
• Services: Database services
• ONS
• eONS: Enhanced ONS
• GSD: For 9i backward compatibility
• GNS (optional): performs name resolution
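Once the stack is up, the resources started at each level can be inspected with crsctl; a minimal sketch (both commands are run from the Grid Infrastructure home):
$ crsctl stat res -t -init    (lower-stack resources started by OHASD, e.g. ora.cssd, ora.crsd, ora.ctssd, ora.asm)
$ crsctl stat res -t          (CRSD-managed resources such as databases, listeners, VIPs and SCAN)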
How does
connection load balancing work using SCAN?
For clients connecting using Oracle SQL*Net
11g Release 2, three IP addresses will be received by the client by resolving
the SCAN name through DNS. The client will then go through the list it receives
from the DNS and try connecting through one of the IPs received. If the client
receives an error, it will try the other addresses before returning an error to
the user or application. This is similar to how client connection failover
works in previous releases when an address list is provided in the client
connection string. When a SCAN listener receives a connection request, the SCAN
listener will check for the least loaded instance providing the requested
service. It will then redirect the connection request to the local listener on
the node where the least loaded instance is running. Subsequently, the client
will be given the address of the local listener. The local listener will
finally create the connection to the database instance.
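On the client side, only the SCAN name is needed in the connect descriptor, and the server-side SCAN setup can be checked with srvctl; a sketch with placeholder names (rac-scan, racdb_srv):
RACDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = racdb_srv))
  )
$ srvctl config scan
$ srvctl config scan_listener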
Oracle Database 11g R2 RAC - The top 5
changes
With the release of Oracle Database 11g R2, a
fundamental change was made that impacts a RAC based system. The cause of these
changes is the introduction of the Oracle Grid Infrastructure product, which
essentially combines
ASM and Oracle Clusterware into the same
Oracle home and product. ASM was previously incorporated into the Oracle
Database home and binaries, and Oracle Clusterware was a stand-alone product
installed with its own home and binaries.
With this new configuration, five main
changes have been incorporated into Oracle Database 11g RAC. These changes
include installing and working with Grid Infrastructure, which includes
setting up and configuring ASM; Single
Client Access Name (SCAN); RAC One Node; Automatic Workload Balancing;
and the
ASM Cluster File System (ACFS).
Non-RAC To RAC Conversion
Pre-conditions:
->All the server setup should be ready.
->Shared disk should be available.
->CRS software should be installed and
running.
Pre-Migration Steps:
->Make sure that CRS services are
running.
->Install a new Oracle Home on both
servers; because CRS is running, the new Oracle Home will have RAC binaries.
If a single-instance binary is installed, then
convert it to RAC binaries.
To activate the RAC binary:
->cd $ORACLE_HOME/rdbms/lib
->make -f ins_rdbms.mk rac_on
->relink all (to relink the binaries)
Migration Steps
->Start up the DB as a single instance.
->Add new redo log groups for the 2nd
instance and enable the 2nd redo thread: ALTER DATABASE ENABLE THREAD
2;
->Add a new undo tablespace for the 2nd
instance.
->Create a new parameter file for RAC for
both instances on both nodes:
CLUSTER_DATABASE=TRUE
CLUSTER_DATABASE_INSTANCES=2
RAC1.THREAD=1, RAC2.THREAD=2
RAC1.UNDO_TABLESPACE=UNDOTBS1
RAC2.UNDO_TABLESPACE=UNDOTBS2
RAC1.INSTANCE_NUMBER=1, RAC2.INSTANCE_NUMBER=2
*.remote_listener='rac-scan:1521'
RAC1.log_archive_format='rac1_%t_%s_%r.dbf'
RAC2.log_archive_format='rac2_%t_%s_%r.dbf'
RAC1.LOCAL_LISTENER=VIP1, RAC2.LOCAL_LISTENER=VIP2
->Create an SPFILE from the pfile on a shared
location, and on both nodes replace the pfile with one that contains only the SPFILE
location, so that it points to the shared SPFILE.
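The last step might look like the following sketch (the diskgroup, DB name, and file paths are placeholders; adjust them to your environment):
CREATE SPFILE='+DATA/RAC/spfileRAC.ora' FROM PFILE='/u01/app/oracle/product/11.2.0/db_1/dbs/initRAC1.ora';
Each node's $ORACLE_HOME/dbs/initRAC<n>.ora would then contain only:
SPFILE='+DATA/RAC/spfileRAC.ora'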
Post Migration Steps
->Shut down the single instance and
start up the RAC instances one by one.
->Execute the catclust.sql script.
->Register the DB and instances with the
cluster:
srvctl add database -d <db_name> -o
$ORACLE_HOME
srvctl add instance -d <db_name> -i
<instance_name> -n <node_name>
To prevent Oracle Clusterware from restarting
your Oracle RAC database when you restart your system, or to avoid restarting
failed instances:
srvctl modify database -db <db_unique_name> -policy [AUTOMATIC | MANUAL | NORESTART]
(other attributes, such as -oraclehome and -dbname, can be modified with the same command)
What UNIX parameters will you set during
Oracle installation?
Kernel parameters such as shmmax, shmmni, shmall, and sem, among others.
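On Linux these are typically set in /etc/sysctl.conf; the values below are illustrative only (shmmax/shmall depend on the amount of RAM, and the authoritative values are in the installation guide for your release):
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 6815744
net.ipv4.ip_local_port_range = 9000 65500
Apply the changes with:
# sysctl -p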
Difference between CPU & PSU patches?
CPU - Critical Patch Update - includes only
Security related patches.
PSU - Patch Set Update - includes CPU + other
patches deemed important enough to be released prior to a minor (or major)
version release.
What is the use of root.sh &
orainstRoot.sh?
orainstRoot.sh changes the ownership & permissions of the
oraInventory.
root.sh creates the oratab file in the /etc directory and,
in RAC, starts the clusterware stack.
What is the OCR file?
The OCR is the RAC configuration information repository
that manages information about the cluster node list and instance-to-node
mapping information.
The OCR also manages information about
Oracle Clusterware resource profiles for customized applications. It maintains
cluster configuration information as well as configuration information about
any cluster database within the cluster. The OCR must reside on shared storage that
is accessible by all of the nodes in your cluster. The CRSD daemon manages the
configuration information in the OCR and maintains changes to the cluster configuration in the registry.
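The OCR location and integrity can be verified with ocrcheck; a minimal sketch (run as root so that the logical corruption check is included):
# ocrcheck
The output reports the OCR version, total and used space, the location of each OCR device/file, and the result of the integrity check.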
What is the use of the virtual IP?
When a node fails, the VIP associated with
it is automatically failed over to another node, and the new node re-ARPs to the
world indicating the new MAC address for the IP. Subsequent packets sent to the
VIP go to the new node, which sends error RST packets back to the clients. This
results in the clients getting errors immediately.
Without using VIPs or FAN, clients connected
to a node that died will often wait for a TCP timeout period (which can be up
to 10 minutes) before getting an error. As a result, you don't really have a good
HA solution without using VIPs.
What is the major difference between 10g and
11g RAC?
Well, there is not much difference between
10g and 11gR1 RAC, but there is a significant difference in 11gR2.
Prior to 11gR2, the following were managed by Oracle CRS:
databases,
instances, applications, node monitoring, event services, high availability.
From 11gR2 onwards it is a complete HA stack managing and providing
the following resources, like other cluster software such as VCS:
databases, instances, applications, cluster
management,
node management, event services, high
availability,
network management (provides DNS/GNS/mDNSd
services on behalf of other traditional services), SCAN (Single Client
Access Name) naming method, HAIP, storage management (with the help of ASM and the new
ACFS filesystem), time synchronization (rather than depending on traditional NTP),
and removal of the OS-dependent hang checker, managed with its own additional monitor
processes.
Oracle 11g
R1 RAC
->Oracle
11g RAC parallel upgrades - Oracle 11g has rolling upgrade features whereby the
RAC database can be upgraded without any downtime.
->Hot
patching - zero-downtime patch application.
->Oracle
RAC load balancing advisor - Starting from 10g R2 we have the RAC load balancing
advisor utility. The 11g RAC load balancing advisor is only available with clients
who use .NET, ODBC, or the Oracle Call Interface (OCI).
->ADDM
for RAC - Oracle has incorporated RAC into the Automatic Database Diagnostic
Monitor, for cross-node advisories. The addmrpt.sql script gives a report for a
single instance and will not report on all instances in RAC; this is known as
instance ADDM. But using the new package DBMS_ADDM,
we can generate a report for all instances of RAC; this is known as database ADDM.
->Optimized
RAC cache fusion protocols - moves on from the general cache fusion protocols
in 10g to deal with specific scenarios where the protocols could be further
optimized.
->Oracle
11g RAC Grid provisioning - The Oracle Grid Control provisioning pack allows us
to "blow out" a RAC node without the time-consuming install, using a
pre-installed "footprint".
Oracle 11g
R2 RAC
->We
can store everything on the ASM. We can store OCR & voting files also on
the ASM.
->ASMCA
->Single
Client Access Name (SCAN)
->Clusterware
components: crfmond, crflogd, GIPCD.
->AWR
is consolidated for the database.
->11g
Release 2 Real Application Cluster (RAC) has server pooling technologies so
it’s easier to provision and manage database grids. This update is geared
toward dynamically adjusting servers as corporations manage the ebb and flow
between data requirements for datawarehousing and applications.
->By
default, LOAD_BALANCE is ON.
->GSD
(Global Service Daemon), gsdctl introduced.
->GPnP
profile.
->Cluster
information in an XML profile.
->Oracle
RAC OneNode is a new option that makes it easier to consolidate databases that
aren’t mission critical, but need redundancy.
->raconeinit
- to convert database to RacOneNode.
->raconefix
- to fix RacOneNode database in case of failure.
->racone2rac
- to convert RacOneNode back to RAC.
->Oracle
Restart - the feature of Oracle Grid Infrastructure's High Availability
Services (HAS) to manage associated listeners, ASM instances and Oracle
instances.
->Oracle
Omotion - Oracle 11g release2 RAC introduces new feature called Oracle Omotion,
an online migration utility. This Omotion utility will relocate the instance
from one node to another, whenever instance failure happens.
Omotion
utility uses Database Area Network (DAN) to move Oracle instances. Database
Area Network (DAN) technology helps seamless database relocation without losing
transactions.
->Cluster
Time Synchronization Service (CTSS) is a new feature in Oracle 11g R2 RAC,
which is used to synchronize time across the nodes of the cluster. CTSS can act as a
replacement for the NTP protocol.
->Grid
Naming Service (GNS) is a new service introduced in Oracle RAC 11g R2. With
GNS, Oracle Clusterware (CRS) can manage Dynamic Host Configuration Protocol
(DHCP) and DNS services for the dynamic node registration and configuration.
->Cluster
interconnect: Used for data blocks, locks, messages, and SCN numbers.
->Oracle
Local Registry (OLR) - From Oracle 11gR2, the "Oracle Local Registry
(OLR)" is something new as part of Oracle Clusterware. The OLR is the node's local
repository, similar to the OCR (but local), and is managed by OHASD. It contains
data for the local node only and is not shared among other nodes.
->Multicasting
is introduced in 11gR2 for private interconnect traffic.
->I/O
fencing prevents updates by failed instances by detecting failure and
preventing split brain in the cluster. When a cluster node fails, the failed node
needs to be fenced off from all the shared disk devices or diskgroups. This
methodology is called I/O fencing, sometimes called disk fencing or failure
fencing.
->Re-bootless
node fencing (restart) - instead of fast-rebooting the node, a graceful
shutdown of the stack is attempted.
->Clusterware
log directories: acfs*
->HAIP
(IC VIP).
->Redundant
interconnects: NIC bonding, HAIP.
->RAC
background processes: DBRM – Database Resource Manager, PING – Response time
agent.
->Virtual
Oracle 11g RAC cluster - Oracle 11g RAC supports virtualization.
What are Oracle Cluster Components?
Cluster Interconnect (HAIP),Shared Storage
(OCR/Voting Disk),Clusterware software.
What are Oracle RAC Components?
VIP, Node apps etc.
What is GNS?
Grid Naming Service is an alternative service
to DNS, which acts as a subdomain in your DNS but is managed by Oracle. With
GNS, the connection is routed to the cluster IP and managed internally.
ADD/REMOVE/REPLACE/MOVE OCR Device
Note: You must be logged in as the root user,
because root owns the OCR files. The "ocrconfig -replace" command can
only be issued when CRS is running; otherwise "PROT-1: Failed to initialize
ocrconfig" will occur.
Please ensure CRS is running on ALL cluster
nodes during this operation; otherwise the change will not be reflected on the node where CRS is
down, and CRS will have problems starting on that node. The "ocrconfig
-repair" option will be required to fix the ocr.loc file on the node where CRS is
down.
To
add an OCRMIRROR device
# ocrconfig -add +OCRVOTE2
To
remove an OCR device
# ocrconfig -delete +OCRVOTE1
* Once an OCR device is removed, the ocrmirror
device automatically becomes the OCR device.
* It is not allowed to remove the OCR device if
only 1 OCR device is defined; the command will return PROT-16.
To replace or move the location of an OCR
device
Note: 1. An ocrmirror must be in place before
trying to replace the OCR device. ocrconfig will fail with PROT-16 if
no ocrmirror exists.
2. If an OCR device is replaced with a
device of a different size, the size of the new device will not be reflected
until the clusterware is restarted.
(At least 2 OCRs must exist for the replace command to
work):
# ocrconfig -replace +CRS -replacement
+OCRVOTE
To
backup OCR
#ocrconfig -showbackup
# ocrconfig -manualbackup
How to restore OCR from automatic backup?
->Stop Oracle Clusterware on each node AS
ROOT:
crsctl stop crs
->Identify the most recent backup:
ocrconfig -showbackup
->Restore the backup file:
ocrconfig -restore
$ORA_CRS_HOME/cdata/crs/backup00.ocr
->Now verify the restore operation:
cluvfy comp ocr -n all
->Now restart Oracle Clusterware on each
node:
crsctl start crs
To restore an OCR when clusterware is down
When OCR is not accessible, CRSD process
will not start, hence the clusterware stack will not start completely. A
restore of OCR device access and good OCR content is required.
To view the automatic OCR backup:
# ocrconfig -showbackup
To restore the OCR backup:
# ocrconfig -restore <path/filename of
OCR backup>
To move the voting disk on ASM from one DG to
another DG due to a redundancy change or
disk location change
->Create the new diskgroup (e.g. +ASM_DG32) as desired.
->Shut down clusterware on all the nodes
before making any modification to the VD:
crsctl stop crs
crsctl start crs -excl   (# start in exclusive
mode)
crsctl add css votedisk +ASM_DG32
crsctl replace votedisk +ASM_DG32
crsctl query css votedisk
ADD/DELETE/MOVE Voting Disk
For 11.2+, it is no longer required to back
up the VD. The VD data is automatically
backed up in OCR as part of any configuration change. The VD files are backed
up automatically by Oracle Clusterware if the contents of the files have
changed in the following ways:
->Configuration parameters, for example
misscount, have been added or modified
->After performing voting disk add or
delete operations
The voting disk contents are restored from a
backup automatically when a new voting disk is added or replaced.
1. To
add a VD
a. When VD is on cluster file system:
$ crsctl add css votedisk <cluster_fs/filename>
b. When VD is on ASM diskgroup, no
add option available.
The number of votedisk is determined by the
diskgroup redundancy. If more copies of votedisks are desired, one can move
votedisk to a diskgroup with higher
redundancy.
2. To
delete a VD
a. When votedisk is on cluster file system:
$ crsctl delete css votedisk
<cluster_fs/filename>
$ crsctl delete css votedisk
<vdiskGUID>
How to restore ASM based OCR after complete
loss of the CRS diskgroup on Linux/Unix
->Locate the latest automatic OCR backup
$ ls -lrt $CRS_HOME/cdata/rac_cluster1/
->Make sure the Grid Infrastructure is
shutdown on all nodes
# $CRS_HOME/bin/crsctl stop crs -f
->Start the CRS stack in exclusive mode
# $CRS_HOME/bin/crsctl start crs -excl
-nocrs
IMPORTANT:
A new option '-nocrs' has been introduced
with 11.2.0.2, which prevents the start of the ora.crsd resource. It is vital to use
this option, because otherwise the failure to start the ora.crsd resource would tear down
ora.cluster_interconnect.haip.
->Create the CRS diskgroup
->Restore the latest OCR backup
# cd $CRS_HOME/cdata/rac_cluster1/
# $CRS_HOME/bin/ocrconfig -restore
backup00.ocr
->Recreate the Voting file
# $CRS_HOME/bin/crsctl replace votedisk +CRS
->Recreate the SPFILE for ASM
->Shutdown CRS
Since CRS is running in exclusive mode, it
needs to be shut down to allow CRS to run on all nodes again.
# $CRS_HOME/bin/crsctl stop crs -f
->Start CRS
As the root user submit the CRS startup on
all cluster nodes:
# $CRS_HOME/bin/crsctl start crs
->Verify CRS
# $CRS_HOME/bin/crsctl check cluster -all
# $CRS_HOME/bin/crsctl status resource -t
LMS processes taking up a lot of memory on the
server
GCS
processing
Latency of the GCS can be caused by CPU
starvation, memory starvation, IPC latencies, or LMS configuration.
In case of LMS congestion, that is, when LMS
cannot dequeue messages fast enough, look at GCS_SERVER_PROCESSES.
In 10g, GCS_SERVER_PROCESSES is 2 by default.
In 11g, for one CPU there is one GCS server process. For 2-8 CPUs, there will
be 2 GCS server processes. For more than 8 CPUs, the number of GCS server
processes will be equal to the number of CPUs divided by 4. If the result
includes a fraction, ignore the fraction. For example, if you had 10 CPUs, then
10/4 would mean 2 GCS server processes.
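The current setting can be checked, and changed for the next restart, from SQL*Plus; a minimal sketch (GCS_SERVER_PROCESSES is a static parameter, so the change takes effect only after the instances are restarted):
SHOW PARAMETER gcs_server_processes
ALTER SYSTEM SET gcs_server_processes=4 SCOPE=SPFILE SID='*';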
WHAT IS THE GPNP PROFILE?
The GPnP profile is a small XML file located
in $GRID_HOME/gpnp/<hostname>/profiles/peer
under the name profile.xml. Each node maintains a local copy of the GPnP
profile, and it is maintained by the GPnP daemon (GPnPD).
To start clusterware, the voting disk needs to
be accessed. If the voting disk is on ASM, this information (that the voting disk is on
ASM) is read from the GPnP profile (<orcl:CSS-Profile id="css"
DiscoveryString="+asm" LeaseDuration="400"/>). The voting disk is read using
the kfed utility even if ASM is not
up. Next, the clusterware checks whether all the nodes have the updated GPnP profile,
and the node joins the cluster based on the GPnP configuration.
The order of searching for the ASM SPFILE is:
1. GPnP profile
2. $ORACLE_HOME/dbs/spfile<sid>.ora
3. $ORACLE_HOME/dbs/init<sid>.ora
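The profile itself can be dumped with the gpnptool utility shipped in the Grid home; a minimal sketch (run as the Grid Infrastructure owner):
$ $GRID_HOME/bin/gpnptool get
The output is the profile XML, including the CSS discovery string and the ASM discovery string/SPFILE location.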
What is Cache Fusion?
Cache Fusion is the transfer of data between RAC instances
using the private network. It is the remote memory mapping of Oracle
buffers, shared between the caches of participating nodes in the cluster. When a
block of data is read from a datafile by an instance within the cluster and
another instance needs the same block, it is faster to get the block
image from the instance which has the block in its SGA than to read it from
disk.
What is Global Cache Service monitoring?
Global Cache Services (GCS) Monitoring
The use of the GCS relative to the number of
buffer cache reads (logical reads) can be estimated by dividing the sum of
GCS requests (global cache gets + global cache converts + global cache cr
blocks received + global cache current blocks received) by the number of
logical reads (consistent gets + db block gets) for a given statistics
collection interval.
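The inputs for that ratio come from the system statistics views; a minimal sketch using gv$sysstat (statistic names vary by release; from 10gR2 onwards the global cache statistics are prefixed with 'gc', as shown here):
SELECT inst_id, name, value
FROM gv$sysstat
WHERE name IN ('gc cr blocks received', 'gc current blocks received',
               'consistent gets', 'db block gets');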
A global cache service request is made in
Oracle when a user attempts to access a buffer cache to read or modify a data
block and the block is not in the local cache.
A remote cache read, a disk read, or a change of
access privileges is the inevitable result. These are all logical-read related;
logical reads form a superset of the global cache service operations.
What is fencing?
I/O fencing prevents updates by failed
instances by detecting failure and preventing split brain in the cluster.
When a cluster node fails, the failed node
needs to be fenced off from all the shared disk devices or diskgroups.
This methodology is called I/O fencing,
sometimes called disk fencing or failure fencing.
FAN (Fast Application Notification)
FAN is a notification mechanism that RAC
uses to notify other processes about configuration and service level
information, such as service status
changes (for example, UP or DOWN events). Applications can respond to FAN events and
take immediate action. FAN UP and DOWN events can apply to instances, services,
and nodes.
FAN also publishes load balancing advisory
(LBA) events. Applications are in a position to take full advantage of the LBA
FAN events to facilitate a smooth transition of connections to healthier nodes
in a cluster.
ONS (Oracle Notification Services)
ONS allows users to send SMS messages,
e-mails, voice notifications, and fax messages in an easy-to-access
manner.Oracle Clusterware uses ONS to send notifications about the state of the
database instances to mid-tier applications that use this information for
load-balancing and for fast failure detection. ONS is a daemon process that communicates
with other ONS daemons on other nodes which inform each other of the current
state of the database components on the database server.
The ONS configuration file is located in the
$ORACLE_HOME/opmn/conf directory
Transparent Application Failover
TAF (Transparent Application Failover) moves a session to a backup
connection if the session fails. With
Oracle 10g Release 2, you can define the TAF policy on the service using the
dbms_service package. It only works with OCI clients. It only moves the
session and, if the parameter is set, it will fail over the SELECT statement. For
insert, update or delete transactions, the application must be TAF-aware and roll
back the transaction. You should enable FCF on your OCI client when you use
TAF; it will make the failover faster.
Note: TAF will not work with the JDBC thin driver.
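For a client-side TAF policy, the failover mode goes into the connect descriptor; a sketch with placeholder names (rac-scan, racdb_srv) and illustrative retry settings. From 10gR2 onwards, defining the TAF policy on the service with dbms_service/srvctl, as noted above, is the preferred approach:
RACDB_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = racdb_srv)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 30)(DELAY = 5))
    )
  )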
What are the various IPs used in RAC? Or, how many
IPs do we need in RAC?
Public IP, Private IP, Virtual IP, SCAN IP
How many SCAN listeners will be running?
Three SCAN listeners, by default.
What is FCF?
Fast Connection Failover provides high
availability to FAN integrated clients, such as clients that use JDBC, OCI, or
ODP.NET. If you configure the client to use fast connection failover, then the
client automatically subscribes to FAN events and can react to database UP and
DOWN events.
In response, Oracle gives the client a
connection to an active instance that provides the requested database service.
What are nodeapps?
VIP, listener, ONS, GSD
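Their status and configuration can be checked with srvctl; a minimal sketch:
$ srvctl status nodeapps
$ srvctl config nodeapps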
How to find out the nodes in the cluster, or how
to find out the master node?
# olsnodes    -- whichever node is displayed
first is the master node of the cluster.
select MASTER_NODE from V$GES_RESOURCE;
To find out which is the master node, you
can also look in the ocssd.log file and search for "master node number".
Or grep the crsd log file:
# /u1/app/../crsd>grep MASTER crsd.log |
tail -1
(or)
cssd> grep -i "master node" ocssd.log | tail -1
How to know the public IPs, private IPs, and VIPs
in RAC?
# olsnodes -n -p -i
node1-pub  1  node1-prv  node1-vip
node2-pub  2  node2-prv  node2-vip
How to monitor block transfers over the interconnect
between nodes in RAC
The v$cache_transfer and v$file_cache_transfer views are used
to examine RAC statistics.
The types of blocks that use the cluster
interconnect in a RAC environment are monitored with the v$cache_transfer
series of views:
v$cache_transfer: This view shows the types and classes of
blocks that Oracle transfers over the cluster interconnect on a per-object
basis.
The forced_reads
and forced_writes
columns can be used to determine the types of objects the RAC instances are
sharing.
Values in the forced_writes column show how
often a certain block type is transferred out of a local buffer cache because
the current version was requested by another instance.
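A sketch of such a query follows; the NAME and KIND columns are an assumption on my part and may differ by release, so adjust the column list to your version of v$cache_transfer:
SELECT name, kind, forced_reads, forced_writes
FROM v$cache_transfer
ORDER BY forced_writes DESC;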
Upgrade Clusterware, ADD/REMOVE Node
->Install the patchset software only;
the installation is out of place.
->Apply PSU patches if required.
->Run rootupgrade.sh to upgrade the
clusterware; it will detect the running clusterware and upgrade it.
->AddNode.sh to add a node
->DeleteNode.sh to remove a node