
Tuesday, 13 July 2021

Oracle RAC Architecture

 

What Is Oracle RAC, and What Is the Architecture of Real Application Clusters?



To provide high availability for Oracle databases, Real Application Clusters (RAC) software was introduced in 2001 together with Oracle 9i.

With Oracle RAC, you can run a database as a cluster across multiple nodes to guard against server failure. A single database is thus served by multiple instances running on multiple servers.

The features of Real Application Cluster are as follows.

 

  • High Availability: A production database can deliver service uptime of up to 99.99%.
  • Server (Machine) Redundancy: When a database runs on more than one server as a cluster and a node fails or powers off unexpectedly, the database continues to run on the remaining node or nodes without downtime.
  • Load Balancing: You can create database services and run different services on different nodes.
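To put the 99.99% figure in concrete terms, the short Python sketch below converts an availability target into the downtime it permits per year. The function name `allowed_downtime_minutes` and the 365-day year are illustrative assumptions, not part of any Oracle tool.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime (in minutes) permitted by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, a 99.99% target leaves a little under an hour of downtime per year.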

Databases run on servers, and servers can fail for reasons such as the following.

  • Motherboard failures
  • Disk failures
  • Server failures
  • Memory (RAM) failures
  • CPU failures
  • Power supply failures

 

If a server suffers faults like these and applications at critical institutions such as banks, insurers, public agencies, and hospitals go down because of the database, customers will not accept such excuses.


To prevent this kind of outage, we set up an Oracle database cluster, that is, Real Application Clusters.


The overall architecture of Oracle Real Application Cluster is as follows.





The table below describes the differences between a standard Oracle database (single instance) and a RAC environment.

| Component | Single Instance Environment | RAC Environment |
|---|---|---|
| SGA | An instance has its own SGA | Each instance has its own SGA |
| Background processes | The instance has its own set of background processes | Each instance has its own set of background processes |
| Datafiles | Accessed by only one instance | Shared by all instances (shared storage) |
| Control Files | Accessed by only one instance | Shared by all instances (shared storage) |
| Online Redo Logfile | Dedicated for write/read to only one instance | Only one instance can write, but other instances can read during recovery and archiving. If an instance is shut down, log switches by other instances can force the idle instance's redo logs to be archived |
| Archived Redo Logfile | Dedicated to the instance | Private to the instance, but other instances need access to all required archive logs during media recovery |
| Flash Recovery Log | Accessed by only one instance | Shared by all instances (shared storage) |
| Alert Log and Trace Files | Dedicated to the instance | Private to each instance; other instances never read or write to those files |
| ORACLE_HOME | Multiple instances on the same server accessing different databases can use the same executable files | Same as single instance; in addition, it can be placed on a shared file system, allowing a common ORACLE_HOME for all instances in a RAC environment |


Here are the major components of Oracle RAC:

  •    A shared disk system
  •    Oracle Clusterware
  •    Cluster interconnects
  •    Oracle kernel components



There are several components of the Oracle Clusterware Architecture:

  1. Network architecture: public, interconnect, and storage (if applicable)
  2. Disk storage, which covers the binaries, OCR, OLR, and voting disks
  3. Clusterware stack

Oracle Clusterware Concepts:

Oracle Clusterware is the heart of an Oracle RAC database. A DBA needs a solid grasp of Clusterware concepts to manage Oracle RAC databases. The main concepts of Oracle Clusterware are as follows.

Oracle Clusterware has two main components:

  1. Voting Disk
  2. Oracle Cluster Registry

Voting Disk:

The voting disk determines which nodes are members of a cluster. You can configure voting disks on Oracle ASM or on other shared storage.
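As a rough illustration of how voting disks decide membership, the sketch below applies a simple-majority rule: a node stays in the cluster only if it can access more than half of the configured voting disks. The function names and node names are hypothetical, and the real CSS logic also involves network heartbeats and timeouts that this sketch leaves out.

```python
from typing import Dict, List


def has_voting_majority(accessible_disks: int, total_disks: int) -> bool:
    """True if the node can access strictly more than half of the voting disks."""
    if total_disks <= 0:
        raise ValueError("total_disks must be positive")
    return 2 * accessible_disks > total_disks


def surviving_nodes(access_map: Dict[str, int], total_disks: int) -> List[str]:
    """Return, in name order, the nodes that still hold a voting-disk majority."""
    return sorted(
        node
        for node, accessible in access_map.items()
        if has_voting_majority(accessible, total_disks)
    )


if __name__ == "__main__":
    # Hypothetical three-disk cluster: rac02 has lost access to two disks.
    access = {"rac01": 3, "rac02": 1}
    print(surviving_nodes(access, total_disks=3))
```

The strict majority is why an odd number of voting disks is the usual choice: with four disks, losing two already costs a node its majority, so the fourth disk adds no extra tolerance over three.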

Oracle Cluster Registry:

Oracle Clusterware uses OCR to store and manage information about the components that Clusterware controls (for example, RAC databases, listeners, SCAN listeners, VIPs, and services). OCR stores this information as a series of key-value pairs in a tree structure.

You can update OCR using supported utilities such as OEM (Oracle Enterprise Manager), CRSCTL (Clusterware Control Utility), SRVCTL (Server Control Utility), OCRCONFIG (OCR Configuration Utility), and DBCA (Database Configuration Assistant).
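The idea of key-value pairs arranged in a tree can be sketched with nested dictionaries in Python. The key names and values below are hypothetical placeholders chosen for illustration; they are not the real OCR layout, and OCR itself should only be changed through the supported utilities.

```python
from typing import Any, Dict, Iterator, Tuple

# Hypothetical registry contents; real OCR key names and values differ.
registry: Dict[str, Any] = {
    "DATABASE": {
        "orcl": {"instances": "orcl1,orcl2", "state": "ONLINE"},
    },
    "LISTENER": {
        "scan1": {"port": "1521"},
    },
}


def get_value(tree: Dict[str, Any], path: str) -> Any:
    """Follow a dotted key path (e.g. 'LISTENER.scan1.port') down the tree."""
    node: Any = tree
    for part in path.split("."):
        node = node[part]  # raises KeyError if the path does not exist
    return node


def walk(tree: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (full_key, value) for every leaf, visiting keys in sorted order."""
    for key in sorted(tree):
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(tree[key], dict):
            yield from walk(tree[key], full_key)
        else:
            yield full_key, tree[key]


if __name__ == "__main__":
    print(get_value(registry, "LISTENER.scan1.port"))
    for full_key, value in walk(registry):
        print(full_key, "=", value)
```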

Now let's look at the Oracle Clusterware software components, which are organized into two stacks.



Oracle Clusterware consists of two separate stacks: an upper stack anchored by the Cluster Ready Services (CRS) daemon (crsd) and a lower stack anchored by the Oracle High Availability Services daemon (ohasd).

  1. The Cluster Ready Services Stack (Called Upper Stack) – Anchored by CRS Daemon (crsd)
  2. Oracle High Availability Services Stack (Called Lower Stack) – Anchored by Oracle High Availability Services Daemon (ohasd)

1) The Cluster Ready Services Stack consists of:

  • Cluster Ready Services (CRS): The primary program for managing high availability operations in a cluster.
  • The CRS daemon (crsd) manages cluster resources based on the configuration information that is stored in OCR for each resource. This includes start, stop, monitor, and failover operations. The crsd process generates events when the status of a resource changes. When you have Oracle RAC installed, the crsd process monitors the Oracle database instance, listener, and so on, and automatically restarts these components when a failure occurs.

  • Cluster Synchronization Services (CSS): Manages the cluster configuration by controlling which nodes are members of the cluster and by notifying members when a node joins or leaves the cluster. If you are using certified third-party clusterware, then CSS processes interface with your clusterware to manage node membership information.
  • The cssdagent process monitors the cluster and provides I/O fencing. This service was formerly provided by the Oracle Process Monitor Daemon (oprocd), also known as OraFenceService on Windows. A cssdagent failure results in Oracle Clusterware restarting the node.

  • Oracle ASM: Provides disk management for Oracle Clusterware.
  • Cluster Time Synchronization Service (CTSS): Provides time management in a cluster for Oracle Clusterware.
  • Event Management (EVM): A background process that publishes events that Oracle Clusterware creates.
  • Oracle Notification Service (ONS): A publish and subscribe service for communicating Fast Application Notification (FAN) events.
  • Oracle Root Agent (orarootagent): A specialized oraagent process that helps crsd manage resources owned by root, such as the network, and the Grid virtual IP address.
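The monitor-and-restart behavior described for crsd above can be pictured with a small Python sketch. It is only a conceptual model under stated assumptions: the resource names, the `ONLINE`/`OFFLINE` states as plain strings, and the `restart` callback are hypothetical, and the real daemon reads resource definitions from OCR and acts through agents.

```python
from typing import Callable, Dict, List


def monitor_once(resources: Dict[str, str], restart: Callable[[str], bool]) -> List[str]:
    """Check every resource once and try to restart any that is OFFLINE.

    Returns a list of event messages, one per resource whose status was acted on.
    """
    events: List[str] = []
    for name in sorted(resources):
        if resources[name] == "OFFLINE":
            if restart(name):
                resources[name] = "ONLINE"
                events.append(f"{name}: restarted")
            else:
                events.append(f"{name}: restart failed")
    return events


if __name__ == "__main__":
    # Hypothetical resource states for illustration only.
    state = {"ora.orcl.db": "OFFLINE", "ora.LISTENER.lsnr": "ONLINE"}
    for event in monitor_once(state, restart=lambda name: True):
        print(event)
```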

2) The Oracle High Availability Services Stack consists of:

  • Grid Plug and Play (GPNPD): GPNPD provides access to the Grid Plug and Play profile and coordinates updates to the profile among the nodes of the cluster to ensure that all of the nodes have the most recent profile.
  • Grid Interprocess Communication (GIPC): A helper daemon for the communications infrastructure. Currently has no functionality; to be activated in a later release.
  • Multicast Domain Name Service (mDNS): Used by Grid Plug and Play to locate profiles in the cluster, and by GNS to perform name resolution. The mDNS process is a background process on Linux and UNIX, and a service on Windows.
  • Oracle Grid Naming Service (GNS): Handles requests sent by external DNS servers, performing name resolution for names defined by the cluster.


Background Processes in Oracle RAC




The Global Cache Service (GCS) and Global Enqueue Service (GES) processes and the Global Resource Directory (GRD) collaborate to enable Cache Fusion. The Oracle RAC processes and their identifiers are as follows:

1. ACMS: Atomic Controlfile to Memory Service (ACMS)

In an Oracle RAC environment, the ACMS per-instance process is an agent that contributes to ensuring a distributed SGA memory update is either globally committed on success or globally aborted if a failure occurs.

2. GTX0-j: Global Transaction Process

The GTX0-j process provides transparent support for XA global transactions in an Oracle RAC environment. The database autotunes the number of these processes based on the workload of XA global transactions.

 

3. RMSn: Oracle RAC Management Processes (RMSn)

The RMSn processes perform manageability tasks for Oracle RAC. Tasks accomplished by an RMSn process include creation of resources related to Oracle RAC when new instances are added to the clusters.

4. RSMN: Remote Slave Monitor

The RSMN process manages background slave process creation and communication on remote instances. These background slave processes perform tasks on behalf of a coordinating process running in another instance.


5. Lock Monitor Process (LMON)
  • Maintains GCS memory structures.
  • Handles the abnormal termination of processes and instances.
  • Handles the reconfiguration of locks and resources when an instance joins or leaves the cluster (LMON generates trace files during reconfiguration).
  • Is responsible for executing dynamic lock remastering every 10 minutes (only in 10g R2 and later versions).
  • Manages the global locks and resources.
  • Monitors all instances in the cluster, primarily for dictionary cache locks, library cache locks, and deadlocks on deadlock-sensitive enqueues and resources.
  • Provides cluster group services.
  • Is also called the Global Enqueue Service Monitor.
6. Lock Manager Server (LMS)
  • LMS is one of the most active background processes.
  • It can consume a significant amount of CPU time (in 10g R2, ensure that the LMS process does not encounter CPU starvation).
  • Its primary job is to transport blocks across the nodes for Cache Fusion requests.
  • For a consistent-read request, the LMS process rolls back the block, creates a consistent-read image of the block, and then ships this block across the high-speed interconnect (HSI) to the requesting process on the remote node.
  • LMS must also check constantly with the LMD background process (the GES process) to get the lock requests placed by the LMD process.
  • Each node normally has two or more LMS processes.
  • The GCS_SERVER_PROCESSES parameter in the init.ora file specifies the number of LMS processes.
  • The value of this parameter is set based on the number of CPUs (MIN(CPU_COUNT/2,2)).
  • In 10g R2, on a single-CPU instance, only one LMS process is started.
  • Increase the parameter value if global cache activity is very high.
  • LMS processes are also called the GCS (Global Cache Service) processes.
Internal View: X$KJMSDP
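The sizing rule quoted above for GCS_SERVER_PROCESSES can be written out as a small Python function. This takes the MIN(CPU_COUNT/2,2) rule of thumb at face value and adds two assumptions of its own: integer division, and a floor of one process so that a single-CPU host gets one LMS process, as the list states for 10g R2. Actual defaults differ between Oracle releases, so treat it as a reading aid rather than a statement of what any given version does.

```python
def default_lms_processes(cpu_count: int) -> int:
    """Rule-of-thumb LMS count: MIN(CPU_COUNT/2, 2), never less than one.

    Assumptions (not from Oracle documentation): integer division and a
    minimum of one process.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return max(1, min(cpu_count // 2, 2))


if __name__ == "__main__":
    for cpus in (1, 2, 4, 16):
        print(f"CPU_COUNT={cpus}: {default_lms_processes(cpus)} LMS process(es)")
```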


7. Lock Manager Daemon Process (LMDn)
  • Performs global lock deadlock detection.
  • Monitors for lock conversion timeouts.
  • Is sometimes referred to as the GES (Global Enqueue Service) daemon, since its job is to manage global enqueue and global resource access.
  • Handles deadlock detection and remote enqueue requests.
  • Remote resource requests are requests that originate from another instance.

Internal View: X$KJMDDP

8. LCKn (Lock Process)
  • Manages instance resource requests and cross-instance calls for shared resources.
  • During instance recovery, it builds a list of invalid lock elements and validates lock elements.
9. DIAG (Diagnostic Daemon)
  • A new background process in Oracle 10g (part of the new enhanced diagnosability framework).
  • Regularly monitors the health of the instance.
  • Checks for instance hangs and deadlocks.
  • Captures vital diagnostic data for instance and process failures.


AWM (Automatic Workload Management):
AWM provides optimal performance for users and applications, including the highest availability for database connections, rapid failure recovery, and optimal workload balancing across the active configuration. Oracle RAC includes many features that can enhance automatic workload management:
a. connection load balancing
b. fast connection failover
c. the load balancing advisory
d. runtime connection load balancing
You can take advantage of automatic workload management by using Oracle Database services in noncluster Oracle databases, especially those that use Oracle Data Guard or Oracle Streams.

Automatic workload management includes the following components:
  • High Availability Framework:  The Oracle RAC high availability framework enables Oracle Database to always maintain components in a running state. Oracle high availability implies that Oracle Clusterware monitors and restarts critical components if they stop, unless you override the restart processing. Oracle Clusterware and Oracle RAC also provide alerts to clients when configurations change, enabling clients to immediately react to the changes, enabling application developers to hide outages and reconfigurations from end users. The scope of Oracle high availability spans from the restarting of stopped Oracle Database processes in an Oracle database instance to failing over the processing of an entire instance to other available instances.
  • Single Client Access Name (SCAN): A single network name and IP addresses defined either in your DNS or GNS that all clients should use to access the Oracle RAC database. With SCAN, you are no longer required to modify your clients when changes occur to the cluster configuration. SCAN also allows clients to use an Easy Connect string to provide load balancing and failover connections to the Oracle RAC database.
Note: SCAN is required regardless of whether you use GNS. If you use GNS, then Oracle automatically creates the SCAN. If you do not use GNS, then you must define the SCAN in DNS.
  • Load Balancing Advisory: This is the ability of the database to provide information to applications about the current service levels being provided by the database and its instances. Applications can take advantage of this information to direct connection requests to the instance that provides the application request with the best service quality to complete the application’s processing. Oracle Database has integrated its Java Database Connectivity (JDBC) and Oracle Data Provider for .NET (ODP.NET) connection pools to work with the load balancing information. Applications can use the integrated connection pools without programmatic changes.
  • Services: Services enable you to group database workloads and route the work to the optimal instances that are assigned to process the service. Furthermore, you can use services to define the resources that Oracle Database assigns to process workloads and to monitor workload resources. Applications that you assign to services transparently acquire the defined automatic workload management characteristics, including high availability and load balancing rules. Many Oracle Database features are integrated with services, such as Resource Manager, which enables you to restrict the resources that a service can use within an instance. Some database features are also integrated with Oracle Streams, Advanced Queuing (to achieve queue location transparency), and Oracle Scheduler (to map services to specific job classes). In Oracle RAC databases, the service performance rules that you configure control the amount of work that Oracle Database allocates to each available instance for that service. As you extend your database by adding nodes, applications, components of applications, and so on, you can add more services.
  • Server Pools: Server pools enable the CRS Administrator to create a policy which defines how Oracle Clusterware allocates resources. An Oracle RAC policy-managed database runs in a server pool. Oracle Clusterware attempts to keep the required number of servers in the server pool and, therefore, the required number of instances of the Oracle RAC database. A server can be in only one server pool at any time. However, a database can run in multiple server pools. Cluster-managed services run in a server pool where they are defined as either UNIFORM (active on all instances in the server pool) or SINGLETON (active on only one instance in the server pool).
  • Connection Load Balancing: Connection load balancing occurs when the connection is created. Connections for a given service are balanced across all of the running instances that offer the service. You should define how you want connections to be balanced in the service definition. However, you must still configure Oracle Net Services. When you enable the load balancing advisory, the listener uses the load balancing advisory for connection load balancing.
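To make the connection load balancing idea concrete, here is a deliberately simplified Python sketch that sends each new connection for a service to the instance currently holding the fewest connections. The least-connections policy, the function name `pick_instance`, and the instance and service names are all assumptions for illustration; the real listener bases its decision on the configured load balancing goal and, when enabled, on the load balancing advisory.

```python
from typing import Dict, List


def pick_instance(
    service: str,
    offered_by: Dict[str, List[str]],
    connections: Dict[str, int],
) -> str:
    """Choose the running instance with the fewest connections for a service."""
    candidates = offered_by.get(service, [])
    if not candidates:
        raise LookupError(f"no running instance offers service {service!r}")
    # Ties are broken by instance name to keep the choice deterministic.
    return min(candidates, key=lambda inst: (connections.get(inst, 0), inst))


if __name__ == "__main__":
    # Hypothetical services and current connection counts.
    offered_by = {"oltp": ["orcl1", "orcl2"], "batch": ["orcl2"]}
    connections = {"orcl1": 12, "orcl2": 7}
    for _ in range(3):
        target = pick_instance("oltp", offered_by, connections)
        connections[target] += 1
        print("oltp ->", target)
```

Because the choice is made only when the connection is created, a long-lived session stays on its instance even if the load later shifts, which is the gap that runtime connection load balancing in the connection pools is meant to cover.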

Administrative Tools for RAC:
Oracle enables you to administer a cluster database as a single system image through Oracle Enterprise Manager, SQL*Plus, or through Oracle RAC command-line interfaces such as Server Control Utility (SRVCTL):
  • Oracle Enterprise Manager: Oracle Enterprise Manager has both the Database Control and Grid Control GUI interfaces for managing both noncluster database and Oracle RAC database environments. Oracle recommends that you use Oracle Enterprise Manager to perform administrative tasks whenever feasible.
  • Server Control Utility (SRVCTL): SRVCTL is a command-line interface that you can use to manage an Oracle RAC database from a single point. You can use SRVCTL to start and stop the database and instances and to delete or move instances and services. You can also use SRVCTL to manage configuration information, Oracle Real Application Clusters One Node (Oracle RAC One Node), Oracle Clusterware, and Oracle ASM.
  • SQL*Plus: SQL*Plus commands operate on the current instance. The current instance can be either the local default instance on which you initiated your SQL*Plus session, or a remote instance to which you connect with Oracle Net Services.
  • Cluster Verification Utility (CVU): CVU is a command-line tool that you can use to verify a range of cluster and Oracle RAC components, such as shared storage devices, networking configurations, system requirements, and Oracle Clusterware, in addition to operating system groups and users. You can use CVU for preinstallation checks and for postinstallation checks of your cluster environment. CVU is especially useful during preinstallation and during installation of Oracle Clusterware and Oracle RAC components. Oracle Universal Installer runs CVU after installing Oracle Clusterware and Oracle Database to verify your environment. Install and use CVU before you install Oracle RAC to ensure that your configuration meets the minimum Oracle RAC installation requirements. Also, use CVU for verifying the completion of ongoing administrative tasks, such as node addition and node deletion.
  • DBCA: DBCA is the recommended method for creating and initially configuring Oracle RAC, Oracle RAC One Node, and Oracle noncluster databases.
  • NETCA: Configures the network for your Oracle RAC environment.


Oracle Clusterware provides the infrastructure necessary to run Oracle Real Application Clusters (Oracle RAC). It also manages resources such as virtual IP (VIP) addresses, databases, listeners, and services.

Managing Clusterware Services:

You can manage Clusterware using the following utilities:

  • OEM
  • CVU (Cluster Verification Utility)
  • SRVCTL (Server Control)
  • CRSCTL (Clusterware Control)
  • OIFCFG (Oracle Interface Configuration Tool), for network-related setup
  • ocrcheck and ocrdump
  • CHM (Cluster Health Monitor)

 

Managing OCR & Voting Disks:

A few important commands:

  • crsctl query crs activeversion
  • ocrcheck
  • crsctl check crs
  • ocrconfig -showbackup
  • ocrdump -backupfile <backup file name and location>
  • olsnodes
  • crsctl start crs -excl -nocrs (starts the stack in exclusive mode without starting the other components of CRS)
  • ocrconfig -restore <backup file with location>
  • cluvfy comp ocr -n all -verbose
  • crsctl start has
  • crsctl stop crs
  • crsctl stop crs -f

 

Oracle Cluster Registry:

  • ocrcheck -local
  • ocrconfig -local -manualbackup
  • crsctl query css votedisk
  • crsctl delete css votedisk <path and name>
  • crsctl add css votedisk <path>
