Amit's Oracle DBA Blog: Cache Fusion

Concept of cache fusion

Cache Fusion is essentially about fusing the buffer caches of multiple instances into one single cache. For example, if a RAC database has three instances that use the same datafiles, and each instance has its own buffer cache in its own SGA, then Cache Fusion makes the database behave as if it had a single instance whose total buffer cache is the sum of the buffer caches of all three instances. The figure below shows what I mean.

This behaviour is possible because of the high-speed interconnect that exists in the cluster between the instances. Each instance is connected to the other instances through this interconnect, which makes it possible to share memory between two or more servers. Previously only the datafiles could be shared; now, because of the interconnect, even the cache memory can be shared.

But how does this help? Suppose one instance holds a data block and is updating it, and another instance needs the same block. The block can then be copied from the holding instance's buffer cache to the requesting instance's buffer cache over the high-speed interconnect. This interconnect is a private connection used only by the instances, for sending data blocks and other messages between them; external users cannot use it. It is this interconnect that makes multiple servers behave like a cluster: the servers are bound together by it.

Moving further, we now know how the cluster is formed, what the backbone of a cluster is and what exactly we call “cache fusion”. Next we will see how Cache Fusion works, but before that we need to discuss a few topics that are very important for understanding it.

We will discuss the following topics before discussing Cache Fusion:

Cache Coherency
Multi-Version consistency model
Resource Co-ordination – Synchronization
Global Cache Service (GCS)
Global Enqueue Service
Global Resource Directory
GCS resource modes and roles
Past Images
Block access modes and buffer states

1) Cache Coherency:
In a single instance database, whenever a user queries data, he gets a consistent view of it. For example, suppose one user has read a block of data and changed some rows in the buffer cache. If another user wants to read data from the same block, Oracle makes a copy of that block in the buffer cache and applies the undo information held in the undo tablespace to obtain a consistent view of the data. This consistent copy is then presented to the user who wants to read the data. This is called maintaining the consistency of data.
Now consider a multi-instance system such as RAC, where a data block might not be present in the local instance; a user might be updating the block in some other instance. If the data blocks are already available in the local instance, they are immediately available to the user. If they are present in some other instance within the cluster, they are transferred into the local buffer cache.
Maintaining the consistency of data blocks in the buffer cache of multiple instances is called “Cache Coherency”.

2) Multi-Version consistency model:
The multi-version consistency model distinguishes between the current version of a data block and one or more read consistent versions of that block. The current block is the one that contains all the changes, committed as well as uncommitted. For example, a user fires a DML statement on a data block that is not present in any instance. The block is read from disk into the buffer cache, where the value is changed. The user then commits and fires another DML statement on the same block. That block is now dirty and contains committed as well as uncommitted changes.
Suppose this block is then requested for reading by another user. Oracle makes a copy, applies undo information to it to build a Consistent Read (CR) copy of the block, and ships that copy to the requesting instance. Thus we have multiple versions of the same data block, each of them consistent with respect to the user who requested it.
During the course of operation there can be many more versions of the same data block, each of them consistent with respect to some point in time.
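The idea of building a CR copy can be sketched in Python under heavily simplified, hypothetical assumptions: a block is a dict mapping a row to a value, each undo record remembers the old value of one change plus the commit SCN of its transaction (None if still uncommitted), and a reader asks for the block as of a query SCN. This is not Oracle's actual undo format; `UndoRecord` and `build_cr_copy` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UndoRecord:
    row: int
    old_value: int
    commit_scn: Optional[int]  # None = transaction not yet committed


def build_cr_copy(current: dict, undo: list, query_scn: int) -> dict:
    """Return a read-consistent copy of `current` as of `query_scn`.

    `undo` is ordered oldest change first. We walk it newest first and roll
    back every change that is uncommitted or committed after query_scn.
    """
    cr = dict(current)  # the current block itself is never modified
    for rec in reversed(undo):
        if rec.commit_scn is None or rec.commit_scn > query_scn:
            cr[rec.row] = rec.old_value
    return cr
```

For example, if the current value of row 10 is 40, the change from 30 to 24 committed at SCN 100 and the change from 24 to 40 is still uncommitted, then a query at SCN 150 sees 24 and a query at SCN 50 sees 30, while the current block keeps 40.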

3) Resource Co-ordination – Synchronization:
In a multi-instance system such as RAC, where the same resources (for example, data blocks) are used concurrently, effective synchronization is required to maintain consistency. Within the shared cache, the coordination of concurrent tasks is called synchronization. The synchronization provided by Oracle RAC gives cluster-wide concurrency control over resources and in turn ensures the integrity of shared data. Although synchronization happens within the cache, there is a cost involved in doing it. At a low level, synchronization is just a data copy or data transfer operation.
According to Oracle studies, accessing a block in the local cache is much faster than accessing it from another instance's cache within the cluster. A local access is an in-memory copy, whereas a block in another instance's cache has to be transferred over the high-speed interconnect, which is slower than an in-memory copy. Slowest of all is reading the block from disk, which is much slower than either of the other two.
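The cost ordering above can be illustrated with a toy lookup that tries the cheapest source first. It is a hypothetical sketch: caches are plain dicts keyed by a made-up block id, and the requester simply probes the other instances, whereas real RAC asks the resource master (through the GRD) where the current copy lives instead of scanning every instance.

```python
def fetch_block(block_id: str, local_cache: dict[str, str],
                remote_caches: dict[int, dict[str, str]],
                disk: dict[str, str]) -> tuple[str, str]:
    """Return (source, data), trying the cheapest source first."""
    if block_id in local_cache:                   # in-memory copy: fastest
        return "local", local_cache[block_id]
    for inst, cache in remote_caches.items():     # ship over the interconnect
        if block_id in cache:
            local_cache[block_id] = cache[block_id]
            return f"interconnect from instance {inst}", cache[block_id]
    local_cache[block_id] = disk[block_id]        # slowest: physical read
    return "disk", local_cache[block_id]
```

The first request for a block held by another instance is served over the interconnect; a repeat request is then served from the local cache, and only a block cached nowhere costs a disk read.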

4) Global Cache Service:
Global Cache Service (GCS) is the main component of Oracle Cache Fusion technology. It is implemented by the LMSn background processes.

The maximum number of LMS processes per instance depends on the Oracle version. The main function of GCS is to track the status and location of data blocks, where the status of a data block means its mode and role. GCS is the main mechanism by which cache coherency among the multiple caches is maintained. GCS is also responsible for transferring blocks between the instances.
The Global Cache Service background processes (LMSn) manage requests for data access between the nodes of the cluster.
Each block is assigned to a specific instance using the same hash algorithm that is used for global resources. The instance managing the block is known as the resource master. When an instance requires access to a specific block, a request is sent to an LMS process on the resource master requesting access to the block. The LMS process can build a read-consistent image of the block and return it to the requesting instance, or it can forward the request to the instance currently holding the block. The LMS processes coordinate block updates, allowing only one instance at a time to make changes to a block and ensuring that those changes are made to the most recent version of the block.
The LMS process on the resource master is responsible for maintaining a record of the current status of the block, including whether it has been updated.
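The hash-based assignment of blocks to resource masters can be sketched as follows. The hash used here (`zlib.crc32` over a file#/block# pair) is only an assumption for illustration; Oracle's internal hash function and its dynamic remastering are not modelled, and `resource_master`, `route_request` and the `holders` map are hypothetical names.

```python
import zlib
from typing import Optional


def resource_master(file_no: int, block_no: int, n_instances: int) -> int:
    """Map a block to the instance (numbered 1..n) that masters its resource.

    Every instance computes the same answer, so no lookup is needed to find
    the master. zlib.crc32 is a stand-in for Oracle's internal hash.
    """
    return zlib.crc32(f"{file_no}.{block_no}".encode()) % n_instances + 1


def route_request(block: tuple[int, int], holders: dict[tuple[int, int], int],
                  n_instances: int) -> tuple[int, Optional[int]]:
    """Return (master, source): the instance whose LMS handles the request and
    the instance that should ship the block, or None if it must come from disk."""
    master = resource_master(block[0], block[1], n_instances)
    # `holders` plays the role of the master's record of the current holder
    return master, holders.get(block)


if __name__ == "__main__":
    holders = {(4, 123): 2}
    print(route_request((4, 123), holders, 3))  # block currently held by instance 2
    print(route_request((4, 124), holders, 3))  # not cached anywhere: read from disk
```

In a real cluster each master only keeps records for the blocks it masters; a single `holders` dict is used here to keep the sketch short.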
In Oracle 9.0.1 and Oracle 9.2 there can be up to 10 LMSn background processes (LMS0 to LMS9) per instance; in Oracle 10.1 there can be up to 20 LMSn background processes (LMS0 to LMS9, LMSa to LMSj) per instance; in Oracle 10.2 there can be up to 36 LMSn background processes (LMS0 to LMS9, LMSa to LMSz) per instance.
The number of required LMSn processes varies depending on the amount of messaging between the nodes in the cluster.

5) Global Enqueue Service:
Global Enqueue Service (GES) tracks the status of all Oracle enqueuing mechanisms. This covers all non-Cache Fusion inter-instance operations. GES performs concurrency control on dictionary cache locks, library cache locks and transactions, for resources that are accessed by more than one instance.
Enqueue services are also present in a single instance database, where, for example, they are responsible for locking rows in a table using different lock modes.

LMON:
In a single-instance database, access to database resources is controlled using enqueues that ensure that only one session has access to a resource at a time and that other sessions wait on a first in, first out (FIFO) queue until the resource becomes free. In a single-instance database, all locks are local to the instance.
In a RAC database there are global resources, including locks and enqueues that need to be visible to all instances. For example, the database mount lock that is used to control which instances can concurrently mount the database is a global enqueue, as are library cache locks, which are used to signal changes in object definitions that might invalidate objects currently in the library cache.
The Global Enqueue Service Monitor (LMON) background process is responsible for managing global enqueues and resources. It also manages the Global Enqueue Service Daemon (LMD) processes and their associated memory areas. LMON is similar to PMON in that it also manages instance and process expirations and performs recovery processing on global enqueues.
In Oracle 10.1 and below there is only one lock monitor background process.

LMDn:
The current status of each global enqueue is maintained in a memory structure in the SGA of one of the instances.
For each global resource, three lists of locks are held, indicating which instances are granted, converting, and waiting for the lock.
The LMD background process is responsible for managing requests for global enqueues and updating the status of the enqueues as requests are granted.
Each global resource is assigned to a specific instance using a hash algorithm. When an instance requests a lock, the LMD process of the local instance sends a request to the LMD process of the remote instance managing the resource. If the resource is available, then the remote LMD process updates the enqueue status and notifies the local LMD process.
If the enqueue is currently in use by another instance, the remote LMD process will queue the request until the resource becomes available. It will then update the enqueue status and inform the local LMD process that the lock is available.
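The granted, converting and waiting lists can be sketched for a single global resource as below. This is a hypothetical simplification: only shared (S) and exclusive (X) modes are modelled, instances are plain integers, and converting requests are served before new waiters, which in turn are served strictly first in, first out. Oracle's real lock modes and queueing rules are richer than this.

```python
from collections import deque


def compatible(held: str, requested: str) -> bool:
    """Only shared (S) locks can coexist; anything involving X conflicts."""
    return held == "S" and requested == "S"


class GlobalEnqueue:
    """Hypothetical model of one global resource and its three lock lists."""

    def __init__(self) -> None:
        self.granted: dict[int, str] = {}                  # instance -> mode
        self.converting: deque[tuple[int, str]] = deque()  # (instance, new mode)
        self.waiting: deque[tuple[int, str]] = deque()     # (instance, mode), FIFO

    def _fits(self, inst: int, mode: str) -> bool:
        return all(compatible(held, mode)
                   for other, held in self.granted.items() if other != inst)

    def request(self, inst: int, mode: str) -> bool:
        """Return True if the lock is granted at once, False if it is queued."""
        if inst in self.granted:                 # converting a lock already held
            if self._fits(inst, mode):
                self.granted[inst] = mode
                return True
            self.converting.append((inst, mode))
            return False
        if not self.converting and not self.waiting and self._fits(inst, mode):
            self.granted[inst] = mode
            return True
        self.waiting.append((inst, mode))
        return False

    def release(self, inst: int) -> None:
        self.granted.pop(inst, None)
        # Serve conversions first, then new requests in strict FIFO order
        while self.converting and self._fits(*self.converting[0]):
            inst_c, mode = self.converting.popleft()
            self.granted[inst_c] = mode
        while (not self.converting and self.waiting
               and self._fits(*self.waiting[0])):
            inst_w, mode = self.waiting.popleft()
            self.granted[inst_w] = mode
```

For example, instances 1 and 2 can both hold the lock in S mode, while an X request from instance 3 sits on the waiting list until both have released it.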
The LMD processes also detect and resolve deadlocks that may occur if two or more instances attempt to access two or more enqueues concurrently.
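One standard way to detect such a deadlock is to look for a cycle in a wait-for graph, where an edge from A to B means that A is waiting for a lock that B holds. The sketch below is that generic technique in Python; it is not a description of how LMD actually implements deadlock detection, and the node names are hypothetical.

```python
from typing import Optional


def find_deadlock(wait_for: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle in the wait-for graph, or None if there is none.

    wait_for[a] lists the holders that waiter `a` is blocked behind.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {node: WHITE for node in wait_for}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in wait_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:             # back edge: nxt is already on the path
                return path[path.index(nxt):]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(wait_for):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Once a cycle is found, one member of it has to be chosen as the victim so that the others can proceed.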
In Oracle 10.1 and below there is only one lock monitor daemon background process named LMD0.

LCK0:
The instance enqueue background process (LCK0) is part of GES. It manages requests for resources other than data blocks—for example, library and row cache objects. LCK processes handle all resource transfers not requiring Cache Fusion. It also handles cross-instance call operations. In Oracle 9.0.1 there could be up to ten LCK processes (LCK0 to LCK9). In Oracle 9.2 and Oracle 10.1 and 10.2 there is only one LCK process (LCK0).

DIAG:
The DIAG background process captures diagnostic information when either a process or the entire instance fails. This information is written to a subdirectory within the directory specified by the BACKGROUND_DUMP_DEST initialization parameter. The files generated by this process can be forwarded to Oracle Support for further analysis.
There is one DIAG background process per instance. It should not be disabled or removed. In the event that the DIAG background process itself fails, it can be automatically restarted by other background processes.

6) Global Resource Directory:
GES and GCS together maintain the Global Resource Directory (GRD). The GRD is like an in-memory database that contains details about all the blocks present in the caches. It knows the location of the latest version of a block, the mode of the block, the role of the block and so on. Whenever a user asks for a data block, GCS gets all the information it needs from the GRD. The GRD is a distributed resource, meaning that each instance maintains part of it. This distributed nature of the GRD is key to the fault tolerance of RAC. The GRD is stored in the SGA. Typically, the GRD contains the following information, among other things:

a) Data Block Address – the address of the data block.
b) Location of most current version of data block.
c) Modes of data block.
d) Roles of data block.
e) SCN number of data block.
f) Image of data block – Could be current image or past image.
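A GRD entry holding the items listed above might be modelled like this. The field names, the `(file#, block#)` stand-in for the data block address and the `holders_in` helper are hypothetical; the real GRD is an internal SGA structure spread across the instances, not a Python object.

```python
from dataclasses import dataclass, field


@dataclass
class GRDEntry:
    """Hypothetical shape of one GRD entry, one field per item a) to f)."""
    dba: tuple[int, int]          # a) data block address as (file#, block#)
    current_holder: int           # b) instance with the most current version
    modes: dict[int, str] = field(default_factory=dict)        # c) instance -> N/S/X
    role: str = "L"               # d) "L" (local) or "G" (global)
    scn: int = 0                  # e) SCN of the current version
    past_images: dict[int, int] = field(default_factory=dict)  # f) instance -> PI count

    def holders_in(self, mode: str) -> list[int]:
        """Instances currently holding the block in the given mode."""
        return sorted(i for i, m in self.modes.items() if m == mode)
```

With such an entry, answering "who has the latest copy?" is a field lookup on `current_holder` rather than a search of every instance's cache.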

7) GCS resource modes and roles:

The mode of a data block is decided by whether the resource holder intends to modify the data or only read it. The modes are as follows:

a) Null (N) Mode: Null mode is the least restrictive mode. It indicates no access rights and acts as a placeholder.
b) Shared (S) Mode: Shared mode indicates that the database block is being read and not modified. Other sessions can also read the data block.
c) Exclusive (X) Mode: Exclusive mode indicates exclusive access to the block. No other resource holder can write to this data block, although it can still obtain a consistent read of it.
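The three modes can be summarised as a compatibility check: a mode can be granted to one instance only if it is compatible with the modes that other instances already hold. This is a minimal sketch assuming only the N, S and X modes described above. A CR copy needs neither S nor X (section 9 associates the CR buffer state with NULL mode), which is why an X holder does not prevent consistent reads.

```python
# Pairs of GCS modes that two different instances may hold at the same time
COMPATIBLE = {
    ("N", "N"), ("N", "S"), ("N", "X"),
    ("S", "N"), ("S", "S"),
    ("X", "N"),
}


def can_grant(requested: str, held_elsewhere: list[str]) -> bool:
    """True if `requested` can coexist with every mode held by other instances."""
    return all((held, requested) in COMPATIBLE for held in held_elsewhere)
```

For instance, `can_grant("X", ["S"])` is False, which is why another instance's S lock must first be downgraded before an update can proceed, as in Stage 3 of the example below.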

GCS resources also have roles. The following roles exist:

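a) Local: When a data block is first read into an instance from disk, it has a local role. At that point only one copy of the block exists in the caches; no other instance's cache has a copy of it. The role stays local as long as the block has not been dirtied and then shipped to another instance (see Stage 2 of the example below).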
b) Global: A global role indicates that the block has been modified and copies of it exist in more than one instance of the cluster. For example, a user connected to one instance requests a data block, which is read from disk into that instance with a local role. If the block is modified there and another instance then requests the same block, the block is copied to the requesting instance and the role becomes global.

This role and mode information is maintained in GRD (Global Resource Directory) by GCS (Global Cache Service).

8) Past Images:
The Past Image concept was introduced in Oracle 9i to maintain data integrity. In an Oracle database, a block is typically not written to disk immediately after it is dirtied, in order to avoid excessive I/O. When such a dirty block is requested by another instance for write or read purposes, an image of the block is created in the owning instance and the block is then shipped to the requesting instance. This image copy of the block is called a Past Image (PI). In the event of a failure, Oracle can reconstruct the block by reading the PIs. It is also possible to have more than one PI of a block, depending on how many times the block was requested while it was dirty.

A past image of a block is different from a CR (consistent read) image: a PI preserves a dirty version of the block until the current version is written to disk, whereas a CR image is built by applying undo data.
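The life cycle of past images can be sketched as a small bookkeeping class. It is a hypothetical model: SCNs are plain integers, `ship_to` represents shipping the current version to an instance that wants to modify it, and writing the current version to disk discards all PIs because every PI is older than the version just written.

```python
from dataclasses import dataclass, field


@dataclass
class BlockCopies:
    current_holder: int                  # instance holding the current version
    current_scn: int                     # SCN of the current version
    dirty: bool = False                  # current version not yet on disk
    past_images: dict[int, list[int]] = field(default_factory=dict)  # instance -> PI SCNs

    def modify(self, scn: int) -> None:
        self.current_scn = scn
        self.dirty = True

    def ship_to(self, requester: int) -> None:
        # A dirty block leaves a PI behind in the instance that ships it
        if self.dirty:
            self.past_images.setdefault(self.current_holder, []).append(self.current_scn)
        self.current_holder = requester

    def write_current_to_disk(self) -> None:
        # Every PI is older than the version now on disk, so none is needed for recovery
        self.past_images.clear()
        self.dirty = False
```

Shipping a dirty block twice leaves one PI in each of the two previous holders; a single write of the current version then releases both.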

9) Block access modes and buffer states:
An additional concurrency control concept is the buffer state which is the state of a buffer in the local cache of an instance. The buffer state of a block relates to the access mode of the block. For example, if a buffer state is exclusive current (XCUR), an instance owns the resource in exclusive mode.
To see a buffer’s state, query the “status” column of the V$BH dynamic performance view. The V$BH view provides information about the block access modes and their buffer state names as follows:
a) With a block access mode of NULL, the buffer state name is CR: the instance can perform a consistent read of the block, that is, the instance holds an older version of the data.
b) With a block access mode of S, the buffer state name is SCUR: the instance has shared access to the block and can only perform reads.
c) With a block access mode of X, the buffer state name is XCUR: the instance has exclusive access to the block and can modify it.
d) With a block access mode of NULL, the buffer state name is PI: the instance has made changes to the block but retains copies of it as past images to record its state before the changes.

Only the SCUR and PI buffer states are Real Application Clusters-specific. There can be only one copy of any one block buffered in the XCUR state in the cluster database at any time. To perform modifications on a block, a process must assign an XCUR buffer state to the buffer containing the data block.
For example, if another instance requests read access to the most current version of the same block, then Oracle changes the access mode from exclusive to shared, sends a current read version of the block to the requesting instance, and keeps a PI buffer if the buffer contained a dirty block.

At this point, the first instance has the current block and the requesting instance also has the current block in shared mode. Therefore, the role of the resource becomes global. There can be multiple shared current (SCUR) versions of this block cached throughout the cluster database at any time.
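The rules above can be expressed as a consistency check over a snapshot of buffers from all instances, for example rows gathered from V$BH on each instance. The tuple layout, the block identifiers and `check_buffer_states` are hypothetical. The check encodes only two rules from the text: at most one XCUR copy of a block in the cluster, and no SCUR copy while another instance holds the block in XCUR (X and S modes are not compatible). State names are compared case-insensitively.

```python
from collections import defaultdict

# Block access mode associated with each buffer state, per the list above
STATE_TO_MODE = {"CR": "N", "SCUR": "S", "XCUR": "X", "PI": "N"}


def check_buffer_states(buffers: list[tuple[int, str, str]]) -> list[str]:
    """buffers holds (instance, block_id, state) rows; return rule violations."""
    modes = defaultdict(list)  # block_id -> [(instance, mode), ...]
    for inst, block, state in buffers:
        mode = STATE_TO_MODE.get(state.upper())
        if mode is not None:   # ignore states not covered here
            modes[block].append((inst, mode))
    problems = []
    for block, held in sorted(modes.items()):
        x_holders = [i for i, m in held if m == "X"]
        s_holders = [i for i, m in held if m == "S"]
        if len(x_holders) > 1:
            problems.append(f"{block}: XCUR in instances {x_holders}")
        if x_holders and s_holders:
            problems.append(f"{block}: SCUR in {s_holders} alongside XCUR in {x_holders}")
    return problems
```

A snapshot taken right after the X-to-S downgrade described above (one instance in SCUR with a PI, the requester in SCUR) passes both checks.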

Block transfer using Cache Fusion:
Let’s consider a very detailed example of how the block transfer happens between different instances.
To explain this example, I am assuming a three-node RAC system. Another assumption is that every DML statement is followed by a commit, so if I say that a user executed an update, it means the user executed an update followed by a commit. However, there is no checkpoint until the end.

Stage 1) In stage 1, a data block is requested by user C, who is connected to instance 3, so the block is read into the buffer cache of instance 3.
select sales_rank from salesman where salesid = 10;

Assume this returns a value of 30. The block is read for the first time and is not present in any other instance, so the role of the block is LOCAL and the block is read in SHARED mode. There are also NO PAST IMAGES. So we describe this stage as instance 3 having SL0 mode (SHARED, LOCAL, 0 PAST IMAGES).

Stage 2) In stage 2, user B issues the same select statement against the salesman table. Instance 2 needs the same block, so the block is shipped from instance 3 to instance 2 over the Cache Fusion interconnect. There is no disk read at this time. Both instances are in SHARED mode (S) and the role is LOCAL (L). If you look carefully, even though the block is present in more than one instance, we still say that the role is local, because the block has not yet been dirtied. Had the block been dirty when the other instance requested it, the role would have changed to global.

Stage 3) In stage 3, user B decides to update the row and commit at instance 2. The new sales rank is 24. At this stage, instance 2 acquires an EXCLUSIVE lock for updating the data, and the SHARED lock held by instance 3 is downgraded to a NULL lock.
update salesman set sales_rank = 24 where salesid = 10; commit;

So instance 2 has mode XL0 (EXCLUSIVE, LOCAL, 0 PAST IMAGES) and instance 3 has a NULL lock, which is just a placeholder. The role of the block is still LOCAL because the block has been dirtied only on instance 2 and no other instance has a dirty copy of it. If another instance now tries to update the same block, the role will change to global.

Stage 4) In stage 4, user A decides to update the same row (and hence the same block) in instance 1, setting the sales rank to 40. It finds that the block is dirty in instance 2. Therefore the data block is shipped from instance 2 to instance 1; however, a PAST IMAGE of the data block is created on instance 2, and the lock mode on instance 2 is downgraded to NULL with a GLOBAL role. Instance 2 now has NG1 (NULL lock, GLOBAL role, 1 PAST IMAGE). At this time instance 1 has an EXCLUSIVE lock with a GLOBAL role (XG0).

Stage 5) User C executes a select statement from instance 3 on the same row. The data block on instance 1 is the most recent copy (the GRD, Global Resource Directory, knows which instance holds the latest copy of the block), so it is shipped to instance 3. As a result, the lock on instance 1 is converted to SHARED GLOBAL with 1 PAST IMAGE (SG1). The lock changes to SHARED and not NULL because instance 3 asked for a shared lock (to read data), not an exclusive lock (to update data). Had instance 3 asked for an exclusive lock, instance 1 would have been left with a NULL lock.

Instance 3 now holds SG0 (SHARED, GLOBAL, 0 PAST IMAGES).

Stage 6) User B issues the same select statement against the salesman table on instance 2. Instance 2 requests a consistent copy of the buffer from another instance, which happens to be instance 1, the holder of the most recent copy.
Therefore instance 1 ships the block to instance 2, where it is acquired in SHARED mode while the past image from stage 4 is retained. So instance 2's mode becomes SG1 (SHARED, GLOBAL, 1 PAST IMAGE).

Stage 7) User C on instance 3 updates the same row. Therefore instance 3 requires an EXCLUSIVE lock, and instances 1 and 2 are downgraded to NULL locks with a GLOBAL role and 1 PAST IMAGE each (NG1). Instance 3 has an EXCLUSIVE lock, GLOBAL role and no PAST IMAGES (XG0).

Stage 8) A checkpoint is initiated and a “Write to Disk” takes place at instance 3. As a result, the previous past images are discarded (they are no longer required for recovery), and instance 3 holds the block with an EXCLUSIVE lock, LOCAL role and no PAST IMAGES (XL0).

From then on, if any instance wants to read or write the same block, a copy will again be shipped from instance 3.
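The eight stages can be replayed with a small state machine that reports each instance's state in the same mode/role/PI notation (for example SL0 or NG1). It is a hypothetical sketch under the assumptions of the example: a single block, a read always takes S mode, an update always takes X mode, and a PI is kept only when an X holder gives up a dirty block. `CacheFusionBlock` and its methods are illustrative names, not an Oracle interface.

```python
class CacheFusionBlock:
    """Hypothetical model of one block's GCS state in a RAC cluster."""

    def __init__(self) -> None:
        self.mode: dict[int, str] = {}  # instance -> "N", "S" or "X"
        self.pis: dict[int, int] = {}   # instance -> number of past images
        self.role = "L"                 # "L" local, "G" global
        self.dirty = False              # current version differs from disk

    def _release_x(self, requester: int, new_mode: str) -> None:
        """Downgrade any other instance holding X; a dirty block leaves a PI."""
        for inst, m in self.mode.items():
            if inst != requester and m == "X":
                if self.dirty:
                    self.pis[inst] = self.pis.get(inst, 0) + 1
                    self.role = "G"     # block now dirty in more than one instance
                self.mode[inst] = new_mode

    def read(self, inst: int) -> None:
        if self.mode.get(inst) == "X":
            return                      # the X holder can already read
        self._release_x(inst, "S")      # a reader only needs the holder down to S
        self.mode[inst] = "S"

    def update(self, inst: int) -> None:
        self._release_x(inst, "N")
        for other, m in self.mode.items():
            if other != inst and m == "S":
                self.mode[other] = "N"  # shared copies become placeholders
        self.mode[inst] = "X"
        self.dirty = True

    def checkpoint(self) -> None:
        """The X holder writes the block to disk; PIs are no longer needed."""
        self.pis.clear()
        self.dirty = False
        self.role = "L"

    def state(self, inst: int) -> str:
        return f"{self.mode[inst]}{self.role}{self.pis.get(inst, 0)}"


if __name__ == "__main__":
    block = CacheFusionBlock()
    steps = [("read", 3), ("read", 2), ("update", 2), ("update", 1),
             ("read", 3), ("read", 2), ("update", 3), ("checkpoint", None)]
    for stage, (op, inst) in enumerate(steps, start=1):
        if op == "checkpoint":
            block.checkpoint()
        else:
            getattr(block, op)(inst)
        states = {i: block.state(i) for i in sorted(block.mode)}
        print(f"Stage {stage}: {states}")
```

Running the steps reproduces the states described in stages 1 to 8, ending with instance 3 in XL0 after the checkpoint.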


Tuesday, 13 July 2021

Cache Fusion - RAC Oracle
