REAL APPLICATION CLUSTERS:-
Background Processes
Oracle RAC instances use two services, GES (Global Enqueue Service) and GCS (Global Cache Service), that
enable Cache Fusion. Oracle RAC instances are composed of the following background
processes:
GTX0-j: This process provides transparent support for XA
global transactions in a RAC environment. The database automatically tunes the number of
these processes based on the workload of XA global transactions.
LMON (Global Enqueue Service Monitor): This
process monitors global enqueues and resources across the cluster and performs
global enqueue recovery operations.
LMD (Global Enqueue Service Daemon): This
process manages incoming remote resource requests within each instance.
LMS (Global Cache Service Process): This process maintains the status of datafiles and
of each cached block by recording information in the Global Resource Directory
(GRD). It also controls the flow of messages to remote instances,
manages global data block access, and transmits block images between the buffer
caches of different instances. This processing is part of the Cache Fusion
feature.
LCK0 (Instance Enqueue Process): This
process manages non-Cache Fusion resource requests such as library and row
cache requests.
RMSn (Oracle RAC Management Processes): These
processes perform manageability tasks for Oracle RAC, including the creation of
resources related to Oracle RAC when new instances are added to the cluster.
RSMN (Remote Slave Monitor): This process
manages background slave process creation and communication on remote
instances. It is itself a background slave process and performs tasks on
behalf of a coordinating process running in another instance.
Major RAC wait events
In a RAC environment the buffer cache is
global across all instances in the cluster, so processing differs from a single instance.
The most common wait events related to this are gc cr request and gc buffer
busy.
GC CR request: the time it takes to retrieve a data block
from the remote cache.
Reason: RAC traffic using a slow connection, or
inefficient queries (poorly tuned queries increase the number of data
blocks requested by an Oracle session; the more blocks requested, the
more often a block must be read from a remote instance via
the interconnect).
GC BUFFER BUSY: the time the remote instance
spends accessing the requested data block locally.
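One way to get a first look at these waits is to query the system-wide wait event views; a minimal sketch, assuming access to GV$SYSTEM_EVENT:
SELECT inst_id, event, total_waits, time_waited
FROM gv$system_event
WHERE event LIKE 'gc%'
ORDER BY time_waited DESC;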
What are the background processes that exist
in 11gR2 and what is their functionality?
CRSD
The CRS daemon (crsd)
manages cluster resources based on configuration information that is stored in
the Oracle Cluster Registry (OCR) for each resource. This includes start, stop,
monitor, and failover operations. The crsd process generates events when the
status of a resource changes.
CSSD
Cluster Synchronization
Service (CSS): Manages the cluster configuration by controlling which nodes are
members of the cluster and by notifying members when a node joins or leaves the
cluster. If you are using certified third-party clusterware, then CSS
interfaces with your clusterware to manage node membership information. CSS has
three separate processes: the CSS daemon (OCSSD), the CSS Agent (cssdagent),
and the CSS Monitor (cssdmonitor). The cssdagent process monitors the cluster
and provides input/output fencing. This service was formerly provided by the Oracle
Process Monitor daemon (oprocd). A cssdagent failure results in Oracle
Clusterware restarting the node.
Diskmon
Disk Monitor daemon (diskmon): Monitors and performs input/output fencing for Oracle Exadata Storage
Server. As Exadata storage can be added to any Oracle RAC node at any point in
time, the diskmon daemon is always started when OCSSD is started.
Evmd
Event Manager (EVM): Is a
background process that publishes Oracle Clusterware events.
Mdnsd
Multicast Domain Name Service (mDNS):
Allows DNS requests. The mDNS process is a background process on Linux and
UNIX, and a service on Windows.
Gnsd
Oracle Grid Naming Service (GNS): Is a gateway between the cluster mDNS and external DNS servers. The GNS
process performs name resolution within the cluster.
Ons
Oracle Notification Service (ONS): Is a publish-and-subscribe service for communicating Fast Application
Notification (FAN) events.
Oraagent: Extends clusterware to support
Oracle-specific requirements and complex resources. It runs server callout
scripts when FAN events occur. This process was known as RACG in Oracle
Clusterware 11g Release 1 (11.1).
Orarootagent
Oracle root agent (orarootagent):
Is a specialized oraagent process that helps CRSD manage resources owned by
root, such as the network and the Grid virtual IP address.
Oclskd
Cluster kill daemon (oclskd):
Handles instance/node eviction requests that have been escalated to CSS.
Gipcd
Grid IPC daemon (gipcd): Is a helper daemon for the communications
infrastructure.
CTSSD
Cluster Time Synchronization Service daemon (ctssd): Manages
time synchronization between nodes, rather than depending on NTP.
VIP: If a node fails, then the node's VIP
address fails over to another node on which the VIP address can accept TCP
connections but it cannot accept Oracle connections.
VD: Oracle Clusterware uses the VD (voting disk) to
determine which nodes are members of a cluster. All nodes in the
RAC cluster register heartbeat information on the VD, and this determines the
number of active nodes in the RAC cluster. Voting disks are also used for checking the
availability of instances in RAC and for removing unavailable nodes from the
cluster. They help prevent split-brain conditions and keep database
information intact.
Oracle recommends an odd number of voting disks (a minimum of three for normal redundancy).
A node must be able to access more than half of the voting disks to continue to function;
with only two voting disks, a node that can see just one does not have a majority.
The CSS daemon in the
clusterware maintains the heartbeats to the VD.
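To see how many voting disks are configured and where they are stored, the following can be run from any node where the clusterware stack is up; the output lists one line per voting file with its state, File Universal Id, and location:
$ crsctl query css votedisk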
Split Brain Syndrome
Split brain syndrome occurs when the
instance members in a RAC cluster fail to ping/connect to each other via the private
interconnect, but the servers are physically up and running and the database
instance on each of these servers is also running. The individual nodes are
running fine and can conceptually accept user connections and work
independently. Because of the lack of communication, each instance thinks
that the other instance it cannot reach is down and that it needs to
do something about the situation. The problem is that if we leave these instances
running, the same block might be read and updated in the individual instances, and
there would be a data integrity issue, because a block changed by one instance would
not be locked and could be overwritten by another instance.
Node Eviction
In RAC, if any node becomes inactive, or if
other nodes are unable to ping/connect to a node in the RAC cluster, then the node
which first detects that one of the nodes is not accessible will evict that
node from the RAC group. When a node is evicted, Oracle
RAC will usually reboot that node and then perform a cluster reconfiguration to include
the evicted node back. You will see the Oracle error ORA-29740 when there is a
node eviction in RAC.
Reasons for Node Reboot or Node Eviction
Whenever a Database Administrator faces a node
reboot issue, the first things to look at are /var/log/messages and the OS Watcher logs of the database node which
was rebooted.
/var/log/messages will give you an actual picture of the reboot:
the exact time of the restart, the status of resources like swap and RAM, etc.
1. High
Load on Database Server: High load on the system is a common reason for node
evictions. One typical scenario is that, due to high load, the RAM and swap space of the DB
node get exhausted, the system stops working, and it finally reboots.
So, every time you see a node eviction, start the
investigation with /var/log/messages and analyze the OS Watcher logs.
2. Voting
Disk not Reachable: Another reason for a node reboot is that the
clusterware is not able to access a minimum number of the voting files. When
the node aborts for this reason, the node
alert log will show the CRS-1606 error.
There could be two reasons for this issue:
A. The connection to the voting disk is interrupted.
B. Only one voting disk is in use and the version is less
than 11.2.0.3.4, hitting known bug
13869978.
How to Solve a Voting Disk Outage?
There could be many reasons for the voting disk
not being reachable. Here are a few general approaches for a DBA to follow.
1. Use the command "crsctl query css votedisk" on a node where clusterware is up
to get a list of all the voting files.
2. Check that each node can access the
devices underlying each voting file.
3. Check that the permissions on each voting
file/disk have not been changed.
4. Check OS, SAN, and storage logs for any
errors from the time of the incident.
5. Apply the fix for bug 13869978 if only one voting
disk is in use. This is fixed in the 11.2.0.3.4 patch set and above, and in 11.2.0.4
and above.
3.
Missed Network Connection between Nodes: In technical terms this is called a Missed Network
Heartbeat (NHB). Whenever there is a communication gap or no communication
between nodes on the private network (interconnect), due to a network outage or some
other reason, a node aborts itself to avoid a "split brain" situation.
The most common (but not exclusive) cause of a missed NHB is a network problem
communicating over the private interconnect.
Suggestions to troubleshoot a Missed Network
Heartbeat:
1. Check OS statistics from the evicted node
from the time of the eviction. The DBA can use OS Watcher to look at OS stats at the
time of the issue; check oswnetstat and oswprvtnet for network related issues.
2. Validate the interconnect network setup
with the help of the network administrator (see the commands after this list).
3. Check communication over the private
network.
4. Check that the OS network settings are
correct by running the RACcheck tool.
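A simple starting point for item 2 is to confirm which interface the clusterware believes is the private interconnect and to test basic reachability over it; a minimal sketch (the private hostname is a placeholder):
$ oifcfg getif        (lists each interface, its subnet, and whether it is public or cluster_interconnect)
$ ping node2-prv      (substitute the private hostname or IP of the other node)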
4. Database
or ASM Instance Hang: Sometimes a database or ASM instance hang can cause a node
reboot. In these cases the database instance hangs and is terminated afterwards,
which causes either a cluster reboot or a node eviction. The DBA should check the alert logs
of the database and ASM instances for any hang situation which might have caused this
issue.
11gR2
Changes -> Important: in
11gR2, fencing (eviction) does not necessarily mean a reboot.
Until Oracle Clusterware 11.2.0.2, fencing
(eviction) meant "reboot".
With Oracle Clusterware 11.2.0.2, reboots
will be seen less often, because:
- Reboots affect applications that might run on a node but are not protected.
- Customer requirement: prevent a reboot; just stop the cluster - implemented.
With Oracle Clusterware 11.2.0.2, reboots
will be seen less often: instead of fast-rebooting the node, a graceful shutdown of
the cluster stack is attempted.
Different
platforms maintain logs at different locations, as shown in the following
example:
Kernel syslog file (/var/log/syslog/*) -> OS or vendor clusterware events,
including node reboot times
HPUX - /var/adm/syslog/syslog.log
AIX - /bin/errpt -a
Linux - /var/log/messages
Windows - Refer to the .TXT log files under the Application/System log
using Windows Event Viewer
Solaris - /var/adm/messages
Trace File Locations in 11gR2 (Trace File -> Location -> Description)
Most daemon logs are under $GRID_HOME/log/<hostname>/, in the
/cssd, /crsd, /evmd, /ohasd, /mdnsd, /gpnpd, /diskmon and /ctssd subdirectories.
CSS logs -> Oracle Clusterware
cluster synchronization services (CSS). To get more tracing, issue `crsctl set
css trace 2` as root.
CRS logs -> Oracle Clusterware
cluster ready services (CRS) management and HA policy tracing; one log per RAC
host.
EVM logs -> Oracle Clusterware
event management layer tracing; one log per RAC host.
OHAS logs (<CRSHome>/log/<hostname>/ohasd)
-> one log per RAC host.
MDNS logs -> Multicast Domain
Name Service daemon; one log per RAC host.
GPNP logs -> one log per RAC
host.
DISKMON logs -> Disk Monitor
daemon; one log per RAC host.
CTSS logs -> one log per RAC
host; also generates alerts in the GRID alert log
file.
SRVM logs (<CRSHome>/srvm/log) -> OCR tracing; to get more tracing, edit
mesg_logging_level in the srvm/admin/ocrlog.ini file; one log per RAC host.
CRSD oraagent logs (<CRSHome>/log/<hostname>/agent/crsd/oraagent_<crs
admin username>)
-> Oracle agent (oraagent), which was
known as RACG in 11gR1.
CRSD orarootagent logs (<CRSHome>/log/<hostname>/agent/crsd/orarootagent_<crs
admin username>)
-> Oracle root agent (orarootagent), which
manages crsd resources owned by root, such as network, VIP and SCAN VIP.
Clusterware Software Stack
Beginning with Oracle 11gR2, Oracle
redesigned Oracle Clusterware into two software stacks: the High Availability
Services stack and the CRS stack. Each of these
stacks consists of several background processes, and the processes of these two
stacks facilitate Clusterware operation.
ASM and Clusterware: Which One is Started
First?
If you have used Oracle RAC 10g and 11gR1,
you might remember that the Oracle Clusterware stack has to be up
before the ASM instance starts on the node.
Beginning with 11gR2, the OCR and VD can also be stored in ASM.
ASM
is a part of the CRS stack of the Clusterware, and it is started at Level 3 after the
High Availability stack is started and before CRSD is started. During the
startup of the High Availability stack, Oracle Clusterware gets the
clusterware configuration from the OLR and the GPnP profile instead of from the OCR.
Because these two components are stored in $GRID_HOME on the local disk,
the ASM instance and ASM diskgroup are not needed for the startup of the High
Availability stack. Oracle Clusterware also doesn't rely on an ASM instance to
access the VD: the location of the VD file is in the ASM disk
header. We can see the location information
with the following command:
$ kfed read /dev/dm-8 | grep -E
'vfstart|vfend'
ORACLE LOCAL REGISTRY
An additional cluster configuration file was
introduced with Oracle 11.2, the so-called Oracle Local Registry (OLR).
Each node has its own copy of the file in the Grid Infrastructure software
home. The OLR stores important security contexts used by the Oracle High
Availability Services early in the start sequence of the clusterware. The
information in the OLR and the Grid Plug and Play configuration file is needed
to locate the voting disks. The information stored in the OLR is needed by the
Oracle High Availability Services daemon (OHASD) to start; this includes data
about GPnP wallets, Clusterware configuration, and version information. If the voting disks
are stored in ASM, the discovery string in the GPnP profile will be used by the
cluster synchronization daemon to look them up. Later in the Clusterware boot
sequence, the ASM instance will be started by the cssd process to access the
OCR files; however, their location is stored in the /etc/ocr.loc file. Of course, if the voting files and OCR are on
a shared cluster file system, then an ASM instance is not needed and won't be
started unless a different resource depends on ASM. In contrast to the OLR, the OCR is used
extensively by CRSD. The OLR is maintained by the same command-line utilities
as the OCR, with the appended -local option.
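For example, the standard OCR utilities accept the -local flag to operate on the OLR; a minimal sketch (run as root):
# ocrcheck -local
# ocrconfig -local -showbackup
# ocrconfig -local -manualbackup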
Clusterware Startup Sequence
Level
0: The OS automatically
starts Clusterware through the OS’s init process. The init process spawns only
one init.ohasd, which in turn starts the
OHASD process. This is configured in the /etc/inittab file:
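On Linux, the respawn entry typically looks like the following (runlevels and paths may vary by platform and release):
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null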
Once OHASD is started on Level 0, OHASD is
responsible for starting the rest of the Clusterware and the resources
that Clusterware manages directly or
indirectly through Levels 1-4.
Level 1: OHASD
directly spawns four agent processes:
• cssdmonitor: CSS Monitor
• OHASD orarootagent: High Availability
Service stack Oracle root agent
• OHASD oraagent: High Availability Service
stack Oracle agent
• cssdagent: CSS Agent
Level 2: On
this level, OHASD oraagent spawns five processes:
• mDNSD: mDNS daemon process
• GIPCD: Grid Interprocess Communication
• GPnPD: GPnP profile daemon
• EVMD: Event Monitor daemon
• ASM: Resource for monitoring ASM instances
Then, OHASD orarootagent spawns the
following processes:
• CRSD: CRS daemon
• CTSSD: CTSS daemon
• Diskmon: Disk Monitor daemon (Exadata
Storage Server storage)
• ACFS: (ASM Cluster File System) Drivers
Next, the cssdagent starts the CSSD (CSS
daemon) process.
Level 3: The
CRSD spawns two CRSD agents: CRSD orarootagent and CRSD oraagent.
Level 4: On
this level, the CRSD orarootagent is responsible for starting the following
resources:
• Network resource: for the public network
• SCAN VIPs
• Node VIPs: VIPs for each node
• ACFS Registry
• GNS VIP: VIP for GNS if you use the GNS
option
Then, the CRSD oraagent is responsible
for starting the rest of the resources as follows:
• ASM Resource: ASM Instance(s) resource
• Diskgroup: Used for managing/monitoring
ASM diskgroups.
• DB Resource: Used for monitoring and
managing the DB and instances
• SCAN listener: Listener for SCAN listening
on SCAN VIP
• SCAN VIP: Single Client Access Name VIP
• Listener: Node listener listening on the
Node VIP
• Services: Database services
• ONS
• eONS: Enhanced ONS
• GSD: For 9i backward compatibility
• GNS (optional): performs name resolution
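Once the stack is up, the resources started at each level can be inspected with crsctl; a minimal sketch (both commands are run from the Grid Infrastructure home):
$ crsctl stat res -t -init    (lower-stack resources started by OHASD, e.g. ora.cssd, ora.crsd, ora.ctssd, ora.asm)
$ crsctl stat res -t          (CRSD-managed resources such as databases, listeners, VIPs and SCAN)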
How does
connection load balancing work using SCAN?
For clients connecting using Oracle SQL*Net
11g Release 2, three IP addresses will be received by the client by resolving
the SCAN name through DNS. The client will then go through the list it receives
from the DNS and try connecting through one of the IPs received. If the client
receives an error, it will try the other addresses before returning an error to
the user or application. This is similar to how client connection failover
works in previous releases when an address list is provided in the client
connection string. When a SCAN listener receives a connection request, the SCAN
listener will check for the least loaded instance providing the requested
service. It will then redirect the connection request to the local listener on
the node where the least loaded instance is running. Subsequently, the client
will be given the address of the local listener. The local listener will
finally create the connection to the database instance.
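On the client side, only the SCAN name is needed in the connect descriptor, and the server-side SCAN setup can be checked with srvctl; a sketch with placeholder names (rac-scan, racdb_srv):
RACDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = racdb_srv))
  )
$ srvctl config scan
$ srvctl config scan_listener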
Oracle Database 11g R2 RAC - The top 5
changes
With the release of Oracle Database 11g R2, a
fundamental change was made that impacts a RAC based system. The cause of these
changes is the introduction of the Oracle Grid Infrastructure product, which
essentially combines
ASM and Oracle Clusterware into the same
Oracle home and product. ASM was previously incorporated into the Oracle
Database home and binaries, and Oracle Clusterware was a stand-alone product
installed with its own home and binaries.
With this new configuration, five main
changes have been incorporated into Oracle Database 11g RAC. These changes
include installing and working with Grid Infrastructure, which includes
setting up and configuring ASM; Single
Client Access Name (SCAN); RAC One Node; Automatic Workload Balancing;
and the
ASM Cluster File System (ACFS).
Non-RAC To RAC Conversion
Pre-conditions:
->All the server setup should be ready.
->Shared disk should be available.
->CRS software should be installed and
running.
Pre-Migration Steps:
->Make sure that CRS services are
running.
->Install a new Oracle Home on both
servers; because CRS is running, the new Oracle Home will have RAC binaries.
If a single-instance binary is installed, then
convert it to RAC binaries.
To activate the RAC binary:
->cd $ORACLE_HOME/rdbms/lib
->make -f ins_rdbms.mk rac_on
->relink all (to relink the binaries)
Migration Steps
->Start up the DB as a single instance.
->Add new redo log groups for the 2nd
instance and enable the 2nd redo thread: ALTER DATABASE ENABLE THREAD
2;
->Add a new undo tablespace for the 2nd
instance.
->Create a new parameter file for RAC for
both instances on both nodes:
CLUSTER_DATABASE=TRUE
CLUSTER_DATABASE_INSTANCES=2
RAC1.THREAD=1, RAC2.THREAD=2
RAC1.UNDO_TABLESPACE=UNDOTBS1
RAC2.UNDO_TABLESPACE=UNDOTBS2
RAC1.INSTANCE_NUMBER=1, RAC2.INSTANCE_NUMBER=2
*.remote_listener='rac-scan:1521'
RAC1.log_archive_format='rac1_%t_%s_%r.dbf'
RAC2.log_archive_format='rac2_%t_%s_%r.dbf'
RAC1.LOCAL_LISTENER=VIP1, RAC2.LOCAL_LISTENER=VIP2
->Create an SPFILE from the pfile on a shared
location, and on both nodes replace the pfile with one that contains only the SPFILE
location, so that it points to the shared SPFILE.
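The last step might look like the following sketch (the diskgroup, DB name, and file paths are placeholders; adjust them to your environment):
CREATE SPFILE='+DATA/RAC/spfileRAC.ora' FROM PFILE='/u01/app/oracle/product/11.2.0/db_1/dbs/initRAC1.ora';
Each node's $ORACLE_HOME/dbs/initRAC<n>.ora would then contain only:
SPFILE='+DATA/RAC/spfileRAC.ora'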
Post Migration Steps
->Shut down the single instance and
start up the RAC instances one by one.
->Execute the catclust.sql script.
->Register the DB and instances with the
cluster:
srvctl add database -d <db_name> -o
$ORACLE_HOME
srvctl add instance -d <db_name> -i
<instance_name> -n <node_name>
To prevent Oracle Clusterware from restarting
your Oracle RAC database when you restart your system, or to avoid restarting
failed instances:
srvctl modify database -db <db_unique_name> -policy [AUTOMATIC | MANUAL | NORESTART]
(other attributes, such as -oraclehome and -dbname, can be modified with the same command)
What UNIX parameters will you set during
Oracle installation?
Kernel parameters such as shmmax, shmmni, shmall, and sem, among others.
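On Linux these are typically set in /etc/sysctl.conf; the values below are illustrative only (shmmax/shmall depend on the amount of RAM, and the authoritative values are in the installation guide for your release):
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 6815744
net.ipv4.ip_local_port_range = 9000 65500
Apply the changes with:
# sysctl -p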
Difference between CPU & PSU patches?
CPU - Critical Patch Update - includes only
Security related patches.
PSU - Patch Set Update - includes CPU + other
patches deemed important enough to be released prior to a minor (or major)
version release.
What is the use of root.sh &
orainstRoot.sh?
orainstRoot.sh changes the ownership & permissions of the
oraInventory.
root.sh creates the oratab file in the /etc directory and,
in RAC, starts the clusterware stack.
What is the OCR file?
The OCR is the RAC configuration information repository
that manages information about the cluster node list and instance-to-node
mapping information.
The OCR also manages information about
Oracle Clusterware resource profiles for customized applications. It maintains
cluster configuration information as well as configuration information about
any cluster database within the cluster. The OCR must reside on shared storage that
is accessible by all of the nodes in your cluster. The CRSD daemon manages the
configuration information in the OCR and maintains changes to the cluster configuration in the registry.
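The OCR location and integrity can be verified with ocrcheck; a minimal sketch (run as root so that the logical corruption check is included):
# ocrcheck
The output reports the OCR version, total and used space, the location of each OCR device/file, and the result of the integrity check.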
What is the use of the virtual IP?
When a node fails, the VIP associated with
it is automatically failed over to another node, and the new node re-ARPs to the
world indicating the new MAC address for the IP. Subsequent packets sent to the
VIP go to the new node, which sends error RST packets back to the clients. This
results in the clients getting errors immediately.
Without using VIPs or FAN, clients connected
to a node that died will often wait for a TCP timeout period (which can be up
to 10 minutes) before getting an error. As a result, you don't really have a good
HA solution without using VIPs.
What is the major difference between 10g and
11g RAC?
Well, there is not much difference between
10g and 11gR1 RAC, but there is a significant difference in 11gR2.
Prior to 11gR2, the following were managed by Oracle CRS:
databases,
instances, applications, node monitoring, event services, high availability.
From 11gR2 onwards it is a complete HA stack managing and providing
the following resources, like other cluster software such as VCS:
databases, instances, applications, cluster
management,
node management, event services, high
availability,
network management (provides DNS/GNS/mDNSd
services on behalf of other traditional services), SCAN (Single Client
Access Name) naming method, HAIP, storage management (with the help of ASM and the new
ACFS filesystem), time synchronization (rather than depending on traditional NTP),
and removal of the OS-dependent hang checker, managed with its own additional monitor
processes.
Oracle 11g
R1 RAC
->Oracle
11g RAC parallel upgrades - Oracle 11g has rolling upgrade features whereby the
RAC database can be upgraded without any downtime.
->Hot
patching - zero-downtime patch application.
->Oracle
RAC load balancing advisor - Starting from 10g R2 we have the RAC load balancing
advisor utility. The 11g RAC load balancing advisor is only available with clients
who use .NET, ODBC, or the Oracle Call Interface (OCI).
->ADDM
for RAC - Oracle has incorporated RAC into the Automatic Database Diagnostic
Monitor, for cross-node advisories. The addmrpt.sql script gives a report for a
single instance and will not report on all instances in RAC; this is known as
instance ADDM. But using the new package DBMS_ADDM,
we can generate a report for all instances of RAC; this is known as database ADDM.
->Optimized
RAC cache fusion protocols - moves on from the general cache fusion protocols
in 10g to deal with specific scenarios where the protocols could be further
optimized.
->Oracle
11g RAC Grid provisioning - The Oracle Grid Control provisioning pack allows us
to "blow out" a RAC node without the time-consuming install, using a
pre-installed "footprint".
Oracle 11g
R2 RAC
->We
can store everything on the ASM. We can store OCR & voting files also on
the ASM.
->ASMCA
->Single
Client Access Name (SCAN)
->Clusterware
components: crfmond, crflogd, GIPCD.
->AWR
is consolidated for the database.
->11g
Release 2 Real Application Cluster (RAC) has server pooling technologies so
it’s easier to provision and manage database grids. This update is geared
toward dynamically adjusting servers as corporations manage the ebb and flow
between data requirements for datawarehousing and applications.
->By
default, LOAD_BALANCE is ON.
->GSD
(Global Service Daemon), gsdctl introduced.
->GPnP
profile.
->Cluster
information in an XML profile.
->Oracle
RAC OneNode is a new option that makes it easier to consolidate databases that
aren’t mission critical, but need redundancy.
->raconeinit
- to convert database to RacOneNode.
->raconefix
- to fix RacOneNode database in case of failure.
->racone2rac
- to convert RacOneNode back to RAC.
->Oracle
Restart - the feature of Oracle Grid Infrastructure's High Availability
Services (HAS) to manage associated listeners, ASM instances and Oracle
instances.
->Oracle
Omotion - Oracle 11g release2 RAC introduces new feature called Oracle Omotion,
an online migration utility. This Omotion utility will relocate the instance
from one node to another, whenever instance failure happens.
Omotion
utility uses Database Area Network (DAN) to move Oracle instances. Database
Area Network (DAN) technology helps seamless database relocation without losing
transactions.
->Cluster
Time Synchronization Service (CTSS) is a new feature in Oracle 11g R2 RAC,
which is used to synchronize time across the nodes of the cluster. CTSS can act as a
replacement for the NTP protocol.
->Grid
Naming Service (GNS) is a new service introduced in Oracle RAC 11g R2. With
GNS, Oracle Clusterware (CRS) can manage Dynamic Host Configuration Protocol
(DHCP) and DNS services for the dynamic node registration and configuration.
->Cluster
interconnect: Used for data blocks, locks, messages, and SCN numbers.
->Oracle
Local Registry (OLR) - From Oracle 11gR2, the "Oracle Local Registry
(OLR)" is something new as part of Oracle Clusterware. The OLR is the node's local
repository, similar to the OCR (but local), and is managed by OHASD. It contains
data for the local node only and is not shared among other nodes.
->Multicasting
is introduced in 11gR2 for private interconnect traffic.
->I/O
fencing prevents updates by failed instances by detecting failure and
preventing split brain in the cluster. When a cluster node fails, the failed node
needs to be fenced off from all the shared disk devices or diskgroups. This
methodology is called I/O fencing, sometimes called disk fencing or failure
fencing.
->Re-bootless
node fencing (restart) - instead of fast-rebooting the node, a graceful
shutdown of the stack is attempted.
->Clusterware
log directories: acfs*
->HAIP
(IC VIP).
->Redundant
interconnects: NIC bonding, HAIP.
->RAC
background processes: DBRM – Database Resource Manager, PING – Response time
agent.
->Virtual
Oracle 11g RAC cluster - Oracle 11g RAC supports virtualization.
What are Oracle Cluster Components?
Cluster Interconnect (HAIP),Shared Storage
(OCR/Voting Disk),Clusterware software.
What are Oracle RAC Components?
VIP, Node apps etc.
What is GNS?
Grid Naming Service is an alternative service
to DNS, which acts as a subdomain in your DNS but is managed by Oracle. With
GNS, the connection is routed to the cluster IP and managed internally.
ADD/REMOVE/REPLACE/MOVE OCR Device
Note: You must be logged in as the root user,
because root owns the OCR files. The "ocrconfig -replace" command can
only be issued when CRS is running; otherwise "PROT-1: Failed to initialize
ocrconfig" will occur.
Please ensure CRS is running on ALL cluster
nodes during this operation; otherwise the change will not be reflected on the node where CRS is
down, and CRS will have problems starting on that node. The "ocrconfig
-repair" option will be required to fix the ocr.loc file on the node where CRS is
down.
To
add an OCRMIRROR device
# ocrconfig -add +OCRVOTE2
To
remove an OCR device
# ocrconfig -delete +OCRVOTE1
* Once an OCR device is removed, the ocrmirror
device automatically becomes the OCR device.
* It is not allowed to remove the OCR device if
only 1 OCR device is defined; the command will return PROT-16.
To replace or move the location of an OCR
device
Note: 1. An ocrmirror must be in place before
trying to replace the OCR device. ocrconfig will fail with PROT-16 if
no ocrmirror exists.
2. If an OCR device is replaced with a
device of a different size, the size of the new device will not be reflected
until the clusterware is restarted.
(At least 2 OCRs must exist for the replace command to
work):
# ocrconfig -replace +CRS -replacement
+OCRVOTE
To
backup OCR
#ocrconfig -showbackup
# ocrconfig -manualbackup
How to restore OCR from automatic backup?
->Stop Oracle Clusterware on each node AS
ROOT:
crsctl stop crs
->Identify the most recent backup:
ocrconfig -showbackup
->Restore the backup file:
ocrconfig -restore
$ORA_CRS_HOME/cdata/crs/backup00.ocr
->Now verify the restore operation:
cluvfy comp ocr -n all
->Now restart Oracle Clusterware on each
node:
crsctl start crs
To restore an OCR when clusterware is down
When OCR is not accessible, CRSD process
will not start, hence the clusterware stack will not start completely. A
restore of OCR device access and good OCR content is required.
To view the automatic OCR backup:
# ocrconfig -showbackup
To restore the OCR backup:
# ocrconfig -restore <path/filename of
OCR backup>
To move the voting disk on ASM from one DG to
another DG due to a redundancy change or
disk location change
->Create the new diskgroup (e.g. +ASM_DG32) as desired.
->Shut down clusterware on all the nodes
before making any modification to the VD:
crsctl stop crs
crsctl start crs -excl   (# start in exclusive
mode)
crsctl add css votedisk +ASM_DG32
crsctl replace votedisk +ASM_DG32
crsctl query css votedisk
ADD/DELETE/MOVE Voting Disk
For 11.2+, it is no longer required to back
up the VD. The VD data is automatically
backed up in OCR as part of any configuration change. The VD files are backed
up automatically by Oracle Clusterware if the contents of the files have
changed in the following ways:
->Configuration parameters, for example
misscount, have been added or modified
->After performing voting disk add or
delete operations
The voting disk contents are restored from a
backup automatically when a new voting disk is added or replaced.
1. To
add a VD
a. When VD is on cluster file system:
$ crsctl add css votedisk <cluster_fs/filename>
b. When VD is on ASM diskgroup, no
add option available.
The number of votedisk is determined by the
diskgroup redundancy. If more copies of votedisks are desired, one can move
votedisk to a diskgroup with higher
redundancy.
2. To
delete a VD
a. When votedisk is on cluster file system:
$ crsctl delete css votedisk
<cluster_fs/filename>
$ crsctl delete css votedisk
<vdiskGUID>
How to restore ASM based OCR after complete
loss of the CRS diskgroup on Linux/Unix
->Locate the latest automatic OCR backup
$ ls -lrt $CRS_HOME/cdata/rac_cluster1/
->Make sure the Grid Infrastructure is
shutdown on all nodes
# $CRS_HOME/bin/crsctl stop crs -f
->Start the CRS stack in exclusive mode
# $CRS_HOME/bin/crsctl start crs -excl
-nocrs
IMPORTANT:
A new option '-nocrs' has been introduced
with 11.2.0.2, which prevents the start of the ora.crsd resource. It is vital to use
this option, because otherwise the failure to start the ora.crsd resource would tear down
ora.cluster_interconnect.haip.
->Create the CRS diskgroup
->Restore the latest OCR backup
# cd $CRS_HOME/cdata/rac_cluster1/
# $CRS_HOME/bin/ocrconfig -restore
backup00.ocr
->Recreate the Voting file
# $CRS_HOME/bin/crsctl replace votedisk +CRS
->Recreate the SPFILE for ASM
->Shutdown CRS
Since CRS is running in exclusive mode, it
needs to be shut down to allow CRS to run on all nodes again.
# $CRS_HOME/bin/crsctl stop crs -f
->Start CRS
As the root user submit the CRS startup on
all cluster nodes:
# $CRS_HOME/bin/crsctl start crs
->Verify CRS
# $CRS_HOME/bin/crsctl check cluster -all
# $CRS_HOME/bin/crsctl status resource -t
LMS processes taking up a lot of memory on the
server
GCS
processing
Latency of the GCS can be caused by CPU
starvation, memory starvation, IPC latencies, or LMS configuration.
In case of LMS congestion, that is, when LMS
cannot dequeue messages fast enough, look at GCS_SERVER_PROCESSES.
In 10g, GCS_SERVER_PROCESSES is 2 by default.
In 11g, for one CPU there is one GCS server process. For 2-8 CPUs, there will
be 2 GCS server processes. For more than 8 CPUs, the number of GCS server
processes will be equal to the number of CPUs divided by 4. If the result
includes a fraction, ignore the fraction. For example, if you had 10 CPUs, then
10/4 would mean 2 GCS server processes.
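The current setting can be checked, and changed for the next restart, from SQL*Plus; a minimal sketch (GCS_SERVER_PROCESSES is a static parameter, so the change takes effect only after the instances are restarted):
SHOW PARAMETER gcs_server_processes
ALTER SYSTEM SET gcs_server_processes=4 SCOPE=SPFILE SID='*';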
WHAT IS THE GPNP PROFILE?
The GPnP profile is a small XML file located
in $GRID_HOME/gpnp/<hostname>/profiles/peer
under the name profile.xml. Each node maintains a local copy of the GPnP
profile, and it is maintained by the GPnP daemon (GPnPD).
To start clusterware, the voting disk needs to
be accessed. If the voting disk is on ASM, this information (that the voting disk is on
ASM) is read from the GPnP profile (<orcl:CSS-Profile id="css"
DiscoveryString="+asm" LeaseDuration="400"/>). The voting disk is read using
the kfed utility even if ASM is not
up. Next, the clusterware checks whether all the nodes have the updated GPnP profile,
and the node joins the cluster based on the GPnP configuration.
The order of searching for the ASM SPFILE is:
1. GPnP profile
2. $ORACLE_HOME/dbs/spfile<sid>.ora
3. $ORACLE_HOME/dbs/init<sid>.ora
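The profile itself can be dumped with the gpnptool utility shipped in the Grid home; a minimal sketch (run as the Grid Infrastructure owner):
$ $GRID_HOME/bin/gpnptool get
The output is the profile XML, including the CSS discovery string and the ASM discovery string/SPFILE location.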
What is Cache Fusion?
Cache Fusion is the transfer of data between RAC instances
using the private network. It is the remote memory mapping of Oracle
buffers, shared between the caches of participating nodes in the cluster. When a
block of data is read from a datafile by an instance within the cluster and
another instance needs the same block, it is faster to get the block
image from the instance which has the block in its SGA than to read it from
disk.
What is Global Cache Service monitoring?
Global Cache Services (GCS) Monitoring
The use of the GCS relative to the number of
buffer cache reads (logical reads) can be estimated by dividing the sum of
GCS requests (global cache gets + global cache converts + global cache cr
blocks received + global cache current blocks received) by the number of
logical reads (consistent gets + db block gets) for a given statistics
collection interval.
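The inputs for that ratio come from the system statistics views; a minimal sketch using gv$sysstat (statistic names vary by release; from 10gR2 onwards the global cache statistics are prefixed with 'gc', as shown here):
SELECT inst_id, name, value
FROM gv$sysstat
WHERE name IN ('gc cr blocks received', 'gc current blocks received',
               'consistent gets', 'db block gets');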
A global cache service request is made in
Oracle when a user attempts to access a buffer cache to read or modify a data
block and the block is not in the local cache.
A remote cache read, a disk read, or a change of
access privileges is the inevitable result. These are all logical-read related;
logical reads form a superset of the global cache service operations.
What is fencing?
I/O fencing prevents updates by failed
instances by detecting failure and preventing split brain in the cluster.
When a cluster node fails, the failed node
needs to be fenced off from all the shared disk devices or diskgroups.
This methodology is called I/O fencing,
sometimes called disk fencing or failure fencing.
FAN (Fast Application Notification)
FAN is a notification mechanism that RAC
uses to notify other processes about configuration and service level
information, such as service status
changes (for example, UP or DOWN events). Applications can respond to FAN events and
take immediate action. FAN UP and DOWN events can apply to instances, services,
and nodes.
FAN also publishes load balancing advisory
(LBA) events. Applications are in a position to take full advantage of the LBA
FAN events to facilitate a smooth transition of connections to healthier nodes
in a cluster.
ONS (Oracle Notification Services)
ONS allows users to send SMS messages,
e-mails, voice notifications, and fax messages in an easy-to-access
manner.Oracle Clusterware uses ONS to send notifications about the state of the
database instances to mid-tier applications that use this information for
load-balancing and for fast failure detection. ONS is a daemon process that communicates
with other ONS daemons on other nodes which inform each other of the current
state of the database components on the database server.
The ONS configuration file is located in the
$ORACLE_HOME/opmn/conf directory
Transparent Application Failover
TAF (Transparent Application Failover) moves a session to a backup
connection if the session fails. With
Oracle 10g Release 2, you can define the TAF policy on the service using the
dbms_service package. It only works with OCI clients. It only moves the
session and, if the parameter is set, it will fail over the SELECT statement. For
insert, update or delete transactions, the application must be TAF-aware and roll
back the transaction. You should enable FCF on your OCI client when you use
TAF; it will make the failover faster.
Note: TAF will not work with the JDBC thin driver.
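For a client-side TAF policy, the failover mode goes into the connect descriptor; a sketch with placeholder names (rac-scan, racdb_srv) and illustrative retry settings. From 10gR2 onwards, defining the TAF policy on the service with dbms_service/srvctl, as noted above, is the preferred approach:
RACDB_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = racdb_srv)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 30)(DELAY = 5))
    )
  )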
What are the various IPs used in RAC? Or, how many
IPs do we need in RAC?
Public IP, Private IP, Virtual IP, SCAN IP
How many SCAN listeners will be running?
Three SCAN listeners, by default.
What is FCF?
Fast Connection Failover provides high
availability to FAN integrated clients, such as clients that use JDBC, OCI, or
ODP.NET. If you configure the client to use fast connection failover, then the
client automatically subscribes to FAN events and can react to database UP and
DOWN events.
In response, Oracle gives the client a
connection to an active instance that provides the requested database service.
What are nodeapps?
VIP, listener, ONS, GSD
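Their status and configuration can be checked with srvctl; a minimal sketch:
$ srvctl status nodeapps
$ srvctl config nodeapps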
How to find out the nodes in the cluster, or how
to find out the master node?
# olsnodes    -- whichever node is displayed
first is the master node of the cluster.
select MASTER_NODE from V$GES_RESOURCE;
To find out which is the master node, you
can also look in the ocssd.log file and search for "master node number".
Or grep the crsd log file:
# /u1/app/../crsd>grep MASTER crsd.log |
tail -1
(or)
cssd> grep -i "master node" ocssd.log | tail -1
How to know the public IPs, private IPs, and VIPs
in RAC?
# olsnodes -n -p -i
node1-pub  1  node1-prv  node1-vip
node2-pub  2  node2-prv  node2-vip
How to monitor block transfers over the interconnect
between nodes in RAC
The v$cache_transfer and v$file_cache_transfer views are used
to examine RAC statistics.
The types of blocks that use the cluster
interconnect in a RAC environment are monitored with the v$cache_transfer
series of views:
v$cache_transfer: This view shows the types and classes of
blocks that Oracle transfers over the cluster interconnect on a per-object
basis.
The forced_reads
and forced_writes
columns can be used to determine the types of objects the RAC instances are
sharing.
Values in the forced_writes column show how
often a certain block type is transferred out of a local buffer cache because
the current version was requested by another instance.
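A sketch of such a query follows; the NAME and KIND columns are an assumption on my part and may differ by release, so adjust the column list to your version of v$cache_transfer:
SELECT name, kind, forced_reads, forced_writes
FROM v$cache_transfer
ORDER BY forced_writes DESC;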
Upgrade Clusterware, ADD/REMOVE Node
->Install the patchset software only;
the installation is out of place.
->Apply PSU patches if required.
->Run rootupgrade.sh to upgrade the
clusterware; it will detect the running clusterware and upgrade it.
->AddNode.sh to add a node
->DeleteNode.sh to remove a node