Friday, 23 July 2021

Background Processes Specific to Oracle RAC

An Oracle RAC database has the same processes and memory structures as a single-instance Oracle database, plus additional processes and memory structures that are specific to Oracle RAC. The global cache service (GCS) and global enqueue service (GES) processes and the global resource directory (GRD) collaborate to enable cache fusion. The Oracle RAC processes and their identifiers are as follows:

ACMS: Atomic Control File to Memory Service
GTX[0-j]: Global Transaction Process
LMON: Global Enqueue Service Monitor (Lock Monitor)
LMD: Global Enqueue Service Daemon
LMS: Global Cache Service Process
LCK0: Instance Enqueue Process
LMHB: Global Cache/Enqueue Service Heartbeat Monitor
PING: Interconnect Latency Measurement Process
RCBG: Result Cache Background Process
RMSn: Oracle RAC Management Processes
RSMN: Remote Slave Monitor



[Figure: how GCS works with GES to maintain the GRD]



RAC Background Processes

Each node has its own background processes and memory structures. On top of the usual ones, there are additional processes that manage the shared resources; these additional processes maintain cache coherency across the nodes.

Cache coherency is the technique of keeping multiple copies of a buffer consistent between different Oracle instances on different nodes. Global cache management ensures that access to a master copy of a data block in one buffer cache is coordinated with the copy of the block in another buffer cache.

The sequence of operations is as follows:

  1. When instance A needs a block of data to modify, it reads the block from disk, but before reading it must inform the GCS (DLM). GCS keeps track of the lock status of the data block by keeping an exclusive lock on it on behalf of instance A.
  2. Now instance B wants to modify that same data block. It too must inform GCS, which then requests instance A to release the lock. GCS thus ensures that instance B gets the latest version of the data block (including instance A's modifications) and then exclusively locks it on instance B's behalf.
  3. At any one point in time, only one instance has the current copy of the block, which preserves the integrity of the block.
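
The three steps above can be sketched as a small simulation. This is only a toy model, not how Oracle implements GCS: the class name GlobalCacheService, its methods, and the dictionary standing in for the disk are all made up for illustration.

```python
class GlobalCacheService:
    """Toy stand-in for GCS: tracks which instance holds the exclusive lock on each block."""

    def __init__(self):
        self.holder = {}    # block id -> name of the instance holding the exclusive lock
        self.current = {}   # block id -> current contents of the block

    def request_exclusive(self, instance, block_id, read_from_disk):
        if self.holder.get(block_id) is None:
            # Step 1: nobody holds the block yet, so it is read from disk.
            self.current[block_id] = read_from_disk(block_id)
        # Step 2: any previous holder gives up the lock, and the requester gets
        # the current copy, including the previous holder's modifications.
        self.holder[block_id] = instance
        return self.current[block_id]

    def write(self, instance, block_id, data):
        # Step 3: only the single current holder may change the block.
        if self.holder.get(block_id) != instance:
            raise PermissionError(f"{instance} does not hold block {block_id}")
        self.current[block_id] = data


gcs = GlobalCacheService()
disk = {"block-1": "v0"}  # hypothetical on-disk contents

seen_by_a = gcs.request_exclusive("A", "block-1", disk.__getitem__)
gcs.write("A", "block-1", "v1 (changed by A)")
seen_by_b = gcs.request_exclusive("B", "block-1", disk.__getitem__)
```

Here instance B receives "v1 (changed by A)" rather than the stale "v0" on disk, and a later write attempt by instance A is rejected because B now holds the lock.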

GCS maintains data coherency and coordination by keeping track of the lock status of every block that can be read or written by any node in the RAC.

GCS works as an in-memory database that contains information about current locks on blocks and instances waiting to acquire locks. This is known as Parallel Cache Management (PCM).

The Global Resource Manager (GRM) helps to coordinate and communicate the lock requests from Oracle processes between instances in the RAC. 

Each instance has a buffer cache in its SGA. To ensure that each RAC instance obtains the block that it needs to satisfy a query or transaction, RAC uses two services, the GCS and the GES, which maintain records of the lock status of each data file and each cached block using the GRD.

So what is a resource? It is an identifiable entity: it has a name or a reference, and it can be an area in memory, a disk file or an abstract entity.

A resource can be owned or locked in various states (exclusive or shared). Any shared resource is lockable and if it is not shared no access conflict will occur.

A global resource is a resource that is visible to all the nodes within the cluster. 

Data buffer cache blocks are the most obvious and most heavily used global resource; transaction enqueues and database data structures are other examples.

GCS handles data buffer cache blocks and GES handles all the non-data-block resources.

All caches in the SGA are either global or local: the dictionary and buffer caches are global, while the large pool and Java pool are local.

Cache fusion is used to read a block from another instance's buffer cache instead of getting it from disk. It moves current copies of data blocks between instances (which is why you need a fast private network), and GCS manages the block transfers between the instances.
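
The read path that cache fusion gives you can be sketched roughly as below. It is a simplification under stated assumptions: the function read_block and the plain dictionaries standing in for buffer caches and disk are invented here, and a real instance would ask GCS which instance holds the block rather than scanning the other caches.

```python
def read_block(block_id, local_cache, remote_caches, disk):
    """Return (data, source) for a block, preferring memory over disk."""
    if block_id in local_cache:
        return local_cache[block_id], "local cache"
    for name, cache in remote_caches.items():
        if block_id in cache:
            # Cache fusion: ship the block over the interconnect instead of reading disk.
            local_cache[block_id] = cache[block_id]
            return local_cache[block_id], f"cache of {name}"
    # Nobody has the block in memory, so fall back to a disk read.
    local_cache[block_id] = disk[block_id]
    return local_cache[block_id], "disk"


cache_a = {}
cache_b = {"blk7": "row data (current)"}
disk = {"blk7": "row data (stale)", "blk9": "other data"}

first = read_block("blk7", cache_a, {"B": cache_b}, disk)   # served from instance B
second = read_block("blk7", cache_a, {"B": cache_b}, disk)  # now cached locally
third = read_block("blk9", cache_a, {"B": cache_b}, disk)   # only on disk
```

The first read returns the current copy held by instance B even though the copy on disk is older, which is the point of moving blocks between caches.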

Finally we get to the processes

Oracle RAC Daemons and Processes
LMSn
Lock Manager Server process - GCS
1) This is the cache fusion part and the most active process.
2) The LMS process maintains records of the data file statuses and each cached block by recording information in the GRD. 
3) The primary job of the LMS process is to transport blocks across the nodes.
4) For example, if a node requests a consistent read of a block, the LMS process builds a consistent-read image of the block on the node that holds it, with the help of undo segments, and then transports the image through the network to the node that requested it.
5) The LMS process also controls the flow of messages to remote instances and manages global data block access and transmits block images between the buffer caches of different instances.
6) It handles the consistent copies of blocks that are transferred between instances. 
7) It receives requests from LMD to perform lock requests. 
8) It rolls back any uncommitted transactions. 
9) There can be up to ten LMS processes running, and more can be started dynamically if demand requires it.
10) They manage lock manager service requests for GCS resources and send them to a service queue to be handled by the LMSn process. 
11) It also handles global deadlock detection and monitors for lock conversion timeouts.
12) For a performance gain you can increase this process's priority to make sure CPU starvation does not occur.
13) You can see the statistics of this daemon by looking at the view X$KJMSDP.
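
Item 4 above (building a consistent-read image with the help of undo) can be illustrated with a small sketch. The UndoRecord layout and the function consistent_read_image are assumptions made for this example and do not reflect Oracle's actual undo format: the idea is only that changes newer than the query's SCN are reversed on a copy of the block before the copy is shipped.

```python
from dataclasses import dataclass


@dataclass
class UndoRecord:
    scn: int        # SCN of the change this record can reverse
    row: str        # row that was changed
    old_value: str  # value the row held before the change


def consistent_read_image(current_rows, undo_records, query_scn):
    """Return a copy of the block as it looked at query_scn."""
    image = dict(current_rows)  # never touch the current block itself
    # Apply undo newest-first so the oldest applicable old_value wins.
    for rec in sorted(undo_records, key=lambda r: r.scn, reverse=True):
        if rec.scn > query_scn:
            image[rec.row] = rec.old_value
    return image


# r1 went a -> b at SCN 110 and b -> c at SCN 120; r2 went x -> y at SCN 130.
current = {"r1": "c", "r2": "y"}
undo = [
    UndoRecord(scn=110, row="r1", old_value="a"),
    UndoRecord(scn=120, row="r1", old_value="b"),
    UndoRecord(scn=130, row="r2", old_value="x"),
]
cr = consistent_read_image(current, undo, query_scn=115)
# cr == {"r1": "b", "r2": "x"}
```

A query that started at SCN 115 sees the change made at SCN 110 but not the later ones, while the current block stays untouched for the instance that is modifying it.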

LMON
Lock Monitor Process - GES

1) LMON is responsible for monitoring all instances in a cluster for the detection of failed instances.
2) Once a failed instance is detected, it facilitates the recovery of global locks held by that instance.
3) It is also responsible for reconfiguration of locks and other resources when instances leave or are added to the cluster.
4) This dynamic reconfiguration is done in real-time.
5) This process manages the GES and maintains consistency of the GCS memory structures in case of process death.
6) It is also responsible for cluster reconfiguration and lock reconfiguration (node joining or leaving).
7) It checks for instance deaths and listens for local messaging.
8) LMON monitors the entire cluster to manage the global enqueues and the resources, and performs global enqueue recovery operations.
9) LMON manages instance and process failures and the associated recovery for the Global Cache Service (GCS) and Global Enqueue Service (GES).
10) In particular, LMON handles the part of recovery associated with global resources. 
11) LMON provided services are also known as cluster group services (CGS). 
12) Lock monitor manages global locks and resources.
13) It handles the redistribution of instance locks whenever instances are started or shut down.
14) Lock monitor also recovers instance lock information prior to the instance recovery process.
15) Lock monitor co-ordinates with the Process Monitor (PMON) to recover dead processes that hold instance locks.
16) LMON takes care of registering the database instance with the node monitoring part of the cluster (CSSD).
17) LMON detects the instance transitions and performs reconfiguration of GES and GCS resources.
18) A detailed log file is created that tracks any reconfigurations that have happened.
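
The reconfiguration described in the list above (redistributing locks when an instance leaves) can be sketched as follows. This is a hypothetical model: the functions pick_master and reconfigure, the CRC-based choice of master, and the resource names are all invented for illustration and are not Oracle's remastering algorithm. The sketch also only covers an instance leaving; it does not rebalance when an instance joins.

```python
import zlib


def pick_master(resource, instances):
    """Pick a master instance for a resource from a deterministic hash of its name."""
    ordered = sorted(instances)
    return ordered[zlib.crc32(resource.encode()) % len(ordered)]


def reconfigure(masters, live_instances):
    """Return a new resource -> master map after a membership change."""
    new_masters = {}
    for resource, master in masters.items():
        if master in live_instances:
            new_masters[resource] = master  # surviving master keeps the resource
        else:
            # The master is gone: hand the resource to one of the survivors.
            new_masters[resource] = pick_master(resource, live_instances)
    return new_masters


instances = {"inst1", "inst2", "inst3"}
resources = ("blk-10", "blk-11", "blk-12", "TX-7", "TM-3")
masters = {r: pick_master(r, instances) for r in resources}

# inst2 fails: only the resources it mastered are moved.
after = reconfigure(masters, instances - {"inst2"})
```

Keeping surviving masters in place limits the work done during reconfiguration to the resources that were actually affected by the failure.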

LMD
Lock Manager Daemon - GES

1) The LMD process acts as a broker for the LMS process by sending requests for resources to a queue that is handled by the LMS process.
2) These requests are placed by the global cache service in order to keep the block buffers consistent across all the instances.
3) The other responsibility of LMD is the detection and resolution of global deadlocks, along with the monitoring of lock timeouts in the global environment.
4) This manages the enqueue manager service requests for the GCS.
5) It also handles deadlock detection and remote resource requests from other instances.
6) You can see the statistics of this daemon by looking at the view X$KJMDDP.
7) The LMD process manages incoming remote resource requests within each instance.
8) The LMD is the lock agent process that manages enqueue manager service requests for Global Cache Service enqueues to control access to global enqueues and resources. 
9) This process manages incoming remote resource requests within each instance.
10) The LMD process also handles deadlock detection and remote enqueue requests. 
11) Remote resource requests are the requests originating from another instance.
12) LMDn processes manage instance locks that are used to share resources between instances.
13) LMDn processes also handle deadlock detection and remote lock requests.
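
Global deadlock detection, mentioned several times in the list above, comes down to finding a cycle in a wait-for graph that spans instances. The sketch below shows that idea with a depth-first search; the function find_deadlock and the session names are assumptions for illustration, not the algorithm LMD actually uses.

```python
def find_deadlock(waits_for):
    """Return a list of sessions that form a wait cycle, or None if there is none.

    waits_for maps each waiting session to the set of sessions blocking it.
    """
    done = set()  # sessions already proven not to lead into a cycle

    def visit(node, path):
        if node in path:
            return path[path.index(node):]  # walked back onto our own path: a cycle
        if node in done:
            return None
        path.append(node)
        for blocker in sorted(waits_for.get(node, ())):
            cycle = visit(blocker, path)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for start in sorted(waits_for):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


# Sessions on two instances wait on each other; a third merely queues behind them.
waits = {
    "inst1:sid12": {"inst2:sid40"},
    "inst2:sid40": {"inst1:sid12"},
    "inst3:sid7": {"inst1:sid12"},
}
deadlock = find_deadlock(waits)
# deadlock == ["inst1:sid12", "inst2:sid40"]
```

Note that inst3:sid7 is blocked but is not part of the cycle, so resolving the deadlock only requires cancelling a request from one of the two sessions that are reported.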

LCK0
Lock Process - GES
1) The LCK process is similar to the LMD process, but it handles requests for all global resources excluding requests for database block buffers.
2) Manages instance resource requests and cross-instance call operations for shared resources.
3) It builds a list of invalid lock elements and validates lock elements during recovery.
4) The LCK0 process manages noncache fusion resource requests such as library and row cache requests.

DIAG
Diagnostic Daemon
1) It regularly monitors the health of the instance.
2) Checks for instance hangs and deadlocks.
3) It captures diagnostic data for instance and process failures. 
4) This is a lightweight process that uses the DIAG framework to monitor the health of the cluster.
5) It captures information for later diagnosis in the event of failures. 
6) It will perform any necessary recovery if an operational hang is detected.




Instance Enqueue Process (LCK0): The LCK0 process manages noncache fusion resource requests such as library and row cache requests.

Global Cache/Enqueue Service Heartbeat Monitor (LMHB): LMHB monitors LMON, LMD, and LMSn processes to ensure they are running normally without blocking or spinning.

Interconnect Latency Measurement Process (PING): Every few seconds, the process in one instance sends messages to each instance. The message is received by PING on the target instance. The time for the round trip is measured and collected.

Result Cache Background Process (RCBG): This process is used for handling invalidation and other messages generated by server processes attached to other instances in Oracle RAC.

Oracle RAC Management Processes (RMSn): The RMSn processes perform manageability tasks for Oracle RAC. Tasks accomplished by an RMSn process include creation of resources related to Oracle RAC when new instances are added to the clusters.

Remote Slave Monitor (RSMN): The RSMN process manages background slave process creation and communication on remote instances. These background slave processes perform tasks on behalf of a coordinating process running in another instance.
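
The round-trip measurement described for the PING process above can be sketched in a few lines. Everything here is assumed for illustration: the function measure_round_trips, the send_and_wait callback, and the fake transport that just sleeps stand in for real interconnect messaging.

```python
import time


def measure_round_trips(targets, send_and_wait, clock=time.perf_counter):
    """Return {target: round-trip seconds} for one probe per target."""
    latencies = {}
    for target in targets:
        started = clock()
        send_and_wait(target)  # blocks until the target's reply arrives
        latencies[target] = clock() - started
    return latencies


def fake_send_and_wait(target):
    # Stand-in for a message to the PING process on the target instance.
    time.sleep(0.001)


latencies = measure_round_trips(["inst2", "inst3"], fake_send_and_wait)
```

Passing the clock in as a parameter makes the measurement easy to check with a fake clock, and repeating the call every few seconds would give the kind of collected latency history the description mentions.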
