Disclaimer

Sunday, 5 September 2021

Oracle wait event - RAC

Oracle provides visibility and instrumentation to these resource related  matrix via the Oracle wait interface (OWI). 

OWI is driven by the three primary views—GV$SYSTEM_EVENT, GV$SESSION_EVENT, and GV$SESSION_WAIT.

These views provide wait times at the system level or at the individual session level. To isolate waits, a specific set of modules or sessions grouped into services within application wait times could be queried using GV$SERVICE_EVENT.

Wait events are classified into wait classes; this helps in grouping wait events and helps directing tuning efforts to the various areas of the environment. 

The following query (Table 17-2) illustrates the number of wait events grouped under the various wait classes.


SELECT WAIT_CLASS,  COUNT(*) FROM  V$SYSTEM_EVENTGROUP BY WAIT_CLASS; 



Almost all wait events have direct or indirect impact in a RAC environment; however, for our purposes, we discuss few of the wait events that belong to the Cluster class.

Oracle RAC events have some intelligence built into their structure, meaning there is a built-in format to the structure. 

Figure 17-3 illustrates the format of the wait event.




Figure 17-3 illustrates four sections of the wait event. Section 1 is a basic identifier, which shows if the wait event is RAC related. 

RAC related wait events start with an identifier of gc. 

The second section differentiates what kind buffer the wait event is related to: current or consistent read (cr). 

The third section provides an indication on the type of message the wait event will cover: 

if the wait event is block activity related or privilege and access related such as grants. 

The fourth section of the event name indicates/reflects on the type of operation, such as network-related hops (two way and three way), busy or congested. 

As discussed earlier, wait event information can be obtained by querying three primary views: GV$SYSTEM_EVENT, GV$SESSION_EVENT, and GV$SESSION_WAIT. 

The V$SESSION_WAIT event has a few important columns, described in Table 17-3.




The following query will extract the preceding information for the current wait events:

SELECT inst_id, sid, event, wait_class,

       p1,

       p2,

       p3,

       seconds_in_wait

FROM   gv$session_wait WHERE  event LIKE 'gc%'

 

   INST_ID   SID EVENT                    WAIT_CLASS             P1         P2         P3 SWAIT

---------- ----- ------------------------ -------------- ---------- ---------- ---------- -----

         2    37 gc current request       Cluster                 7        306   33554433     0

         2   240 gcs log flush sync       Other                  30          0        138     0

         2   269 gc buffer busy acquire   Cluster                 7        306          1     0

         2   280 gc buffer busy release   Cluster                 7        159          1     0

         1    27 gc buffer busy acquire   Cluster                 7        159          1     0

         1   244 gcs log flush sync       Other                  30          0        138     0

         1   265 gc current request       Cluster                 7        159   33554433     0

 

From the preceding output, using the data in P1 and P2, the related object information could be obtained using the following query:

SELECT segment_name FROM   dba_extents WHERE  file_id = &file   AND  &block BETWEEN block_id AND block_id + blocks - 1    AND ROWNUM = 1;

Enter value for file: 7

Enter value for block: 159

old   3: where file_id = &FILE and &BLOCK between block_id and block_id + blocks - 1 and rownum = 1

new   3: where file_id = 7 and 159 between block_id and block_id + blocks - 1 and rownum = 1

SEGMENT_NAME
------------------------
WAREHOUSE
 

Current and CR (consistent read) waits are based on current and CR blocks in the buffer of various instances. What is the difference between CR and current waits?


Consistent Read vs Current

The very first time a block is read into a buffer of any participating instance, it's termed a current block, no matter what the purpose of the user accessing the block may be: meaning it could be a SELECT operation or a DML operation.

 

The first access is always termed a current operation. Subsequently, a CR block is when the block is transferred from one instance to another instance because a session on that instance requested the block.

When a block is required by a process for read purposes, it accesses the block in shared mode.

When the block is to be modified, the processes would require a grant from the GCS to access this block in exclusive mode.

The frequency of state/grant changes to the block can be obtained by querying STATE column from GV$BH view.

 

The RAC-related state changes are the following:

XCUR—exclusive current

SCUR—shared current

CR—consistent read

READ—reading from disk

WRITE—write clone mode

PI—past image


When blocks are required by more than one process on the same instance, Oracle will clone the block.

The number of times a block can be cloned is defined by the parameter _DB_BLOCK_MAX_CR_DBA and defaults to six, meaning only six cloned copies of the same block of the same DBA (data block address) can exist in the local buffer of an instance.

 The CR blocks are treated like any other data block and use the Touch Count Algorithm (TCA). Under TCA, the block read is placed at midpoint (insertion point) in the buffer cache and will have to gain creditability when sessions access or touch the block to climb up the stack to reach the hot buffer area.

If the block is not touched by other sessions, it will move down the stack and finally get flushed out when new blocks need the buffer.


Similarly, when blocks are required by more than one other instance,

Oracle will ship an image of the CR block  (if it has not already done so) to the requesting instance. 

As we discussed in Chapter 2, blocks are shipped from one instance to another (illustrated in Figure 17-4); and the details regarding which instance contains the block are maintained in the GRD. 

The number of instances a block can exist is determined by the parameter _FAIRNESS_THRESHOLD  parameter and defaults to four, meaning only four images of the 

same block of a particular DBA can exist in a RAC cluster (irrespective of the number of instances) at any given point in time.




Once the holder reaches the threshold defined by the parameter _FAIRNESS_THRESHOLD, it stops making more copies, flushes the redo to the disk, and downgrades the locks


Cluster Waits

In this section, we discuss a few of the common cluster-related wait events.

gc cr/current block two-way/three-way:

One primary function of a cluster environment is to share blocks between instances, minimizing access to the physical storage. 

When a block is required by a session and it's not found in the local cache, it has to be requested either from another instance (current holder) or the disk. 

The time the session has to wait for the block is recorded in one of the transfer wait events depending on where the block was currently being held and how many hops the request passed through to complete the operation. 

For example, a two-way event indicates that there was a two-way shipment to transfer the block and that the requester sends a message to the master and the master ships the block back to the requestor.

Similarly, a three-way event indicates that there were three hops before the block was received by the requestor.

 

In the RAC architecture, to get a block, the GCS may usually have to perform two hops if the block requested cannot be found on the local instance: two hops because if the master is located on the instance where the demand for the blocks concerning a specific object is the highest, and hence most likely, the block is going to be present where the master is located. 

When the demand for the blocks concerning the object increases on another instance, the master will dynamically move/relocate to this new instance. 

Only when blocks are not found on the master does the GCS need to direct the transfer from another instance or from disk (three hops). 

Irrespective of the number of instances in the cluster, the GCS process will have to make a maximum of three hops. That is the reason why the RAC's architecture provides maximum scalability irrespective of the number of instances in the cluster. 

Now, when the three-way wait values are significantly high, it could be an indication that the block was never found on the master or the master did not relocate to another instance where the demand is high.

High wait times for these events would be an indication of poor network transfer speed and bandwidth. 

Very high wait numbers would indicate further investigation should be made into the configuration of the private interconnect configuration and the socket send and receive buffer sizes. 

The run queue length and CPU utilization of the servers should be looked at. 

From the application level, hot spots of the database should be looked into, and the SQL should be tuned to minimize full scans and high block movement across the interconnect by reducing LIO.

 

 

gc cr/current block congested

This wait indicates that the request was made to the DLM but ends up waiting, and the foreground processes have to retry the request. 

Under these circumstances, the “gc cr/current block congested” wait counter is incremented.

Normally this indicates that the GCS process (LMSn background) is not able to keep up with the requests. 

LMSn is a single threaded synchronous process and follows the FIFO algorithm to complete block requests. 

This means that when multiple requests are received, the GES will place the requests in a queue and send the request to the LMSn process when it has completed a previous operation. In such situations, considerations should be given to increase the number of LMSn processes using the parameter GCS_SERVER_PROCESSES

Performance implications of setting a high value.

LMSn processing delays due to frequent interrupts by other high priority processes and LMS being overloaded, making it unable to keep up with requests, could be indications for high wait times. Almost always this is the result of scheduling delays and high queue lengths experienced by the node at the O/S level.

RAC has several other types of wait events. 

For a better understanding, in Figure 17-5 the global cache related wait events have been grouped by the resource area they impact.

 





Figure 17-5. Wait events grouped

 

gc buffer busy

gc buffer busy waits are normally global-level issues of buffer busy waits noticed on single-instance Oracle databases. 

Buffer busy waits are caused by frequent requests for buffer space, and Oracle is unable to acquire such buffer space and goes into wait mode. 

Poorly written SQL statements that perform high LIO are the reasons for high wait times.

In a RAC environment, multiple waiters maybe queuing for the same block that has a global operation pending for both the requestor and holder of the block. 

The wait times could also be caused by high network latencies and slow service times of the servers. 

A user trying to pin a buffer that is globally busy will have to wait until the block has arrived and the waiter at the head of the queue has released the buffer or the remote user has given up the buffer.

Whereas in the earlier releases of Oracle, buffer busy waits in a RAC environment were grouped under this wait event, starting with Oracle database 11g Release 2, the “gc buffer busy” waits are categorized into two new wait categories:

• gc buffer busy acquire

• gc buffer busy release


The classification here is based on if the request was made from the local instance of the requestor, in which case the session will wait on “gc buffer busy acquire.” 

If the request was made from the remote instance, the current instance will wait on “gc buffer busy release” event.

  

gc remaster

This wait indicates the delay encountered when remastering the resource master from one instance to another instance in the cluster. 

RAC architecture allows for dynamic remastering from a less busy instance to an instance where the demand for the object is the highest. 

This movement is called resource affinity.

Apart from remastering based on demand, remastering also happens when an instance leaves (failure) or joins the clustered configuration. 

During instance failure, remastering may happen more than once, first to place the master dynamically on one of the surviving nodes and second, once the cluster has reached a stable state to reassess the situation and remaster is again based on demand. 

 

 

 

wait for master SCN

Each instance in the cluster will generate its own SCN and subsequently, using the propagation method, will resynchronize to the highest SCN in the cluster.

This wait indicates the number of times the foreground processes waited for SCNs to be acknowledged from other instances in the cluster.

Before Oracle database 10g Release 2, the method of SCN propagation was driven by the parameter MAX_COMMIT_PROPAGATION_DELAY. Setting this to a value higher than zero uses the Lamport algorithm. 

Now this parameter is deprecated and is maintained for backward compatibility only and defaults to zero. This functionality is now driven by the underscore (hidden) parameter _IMMEDIATE_COMMIT_PROPAGATION and has a Boolean value of TRUE or FALSE.

When the value of the parameter is set to TRUE (default) Oracle uses the “Block on Commit” (BOC) algorithm for messaging. Although the method of propagation remains similar to the Lamport algorithm, in the case of BOC, the global high water mark for the SCNs sent and received is maintained, thereby reducing messaging traffic for global SCN synchronization and in turn improving overall performance.

 

 

gc cr/current request

This event indicates the time waited when a session is looking for a CR version of a block and cannot find it in its local cache and so has made a request to a remote instance for the block. 

However, the transferred block has not yet arrived at the requesting instance. 

The event ends when the requesting session gets the block or permission to read the block from disk. 

This is normally a placeholder event, and the times here are cumulative of wait times for all other wait events that had an impact on the operations. 

Placeholder wait events are transient (illustrated in Figure 17-5) and are part of every view that captures active wait information.


gc current/cr block busy

This wait indicates that a current or CR block was requested and received but was not sent immediately by LMSn because some special condition that delayed the sending was found. 

Wait times for these events are dominated by block flush times (log file sync) and the defer time for the current blocks. 

Increased wait times are indications of high concurrency and block contention and are caused by high I/O and overall system load.

Querying GV$INSTANCE_CACHE_TRANSFER view would help determine instances involved in high current and CR block activity. 

Reducing log file sync times by monitoring the LGWR I/O and DBWR latencies would also help in determining some of the performance bottlenecks that are causes for high wait times.

At the database level, tuning SQL statements by using sparse blocks, sequence number ranges, or by hash partitioned indexes would help improve wait times.

 

 

gc current/cr grant

This wait indicates that the block access grant was received. However, the block was not present on the local instance of the requestor. 

The wait time only indicates the time to provide access from the remote instance or read the block from disk.

High wait times would indicate network transfer speed and bandwidth, high context switching at the O/S level of the instances involved in the transfer, and overall load on the servers. 

Tuning SQL statements to reduce high buffer activity due to high LIO would help reduce wait times for this event.


gc current grant busy

This wait indicates that a current block was requested and a grant message received. The busy hint implies that the request was blocked because others were ahead of it or it could not be handled immediately. 

This normally occurs when the current block is shared by several instances, and the requestor is waiting for exclusive access to this block.

All holders of the block would have to release this block before exclusive access (S to X conversion) can be granted to the requestor.

High wait times on this event would indicate tuning SQL statements and database objects involved in frequent block activity (hot blocks) should be monitored. Almost always environments where multiple instances are inserting rows (highly insert intensive batch loads) could cause index block splits and could be the primary reason for high wait times for this event.

 

 

gc current/cr multiblock request

Oracle is attempting to read multiple blocks into one message to save on CPU. This read process may include either grants or blocks. These requests are usually caused by a full table scan or index full scan. 

If the request involves “current” blocks, the space management layer would setup multiblock requests when viewing blocks with contiguous data block addresses.

Multiblock requests normally don't cause any additional performance issues unless there are transmission issues over the interconnect and blocks are lost. In that case, there would be performance degradation, followed by gc cr failure and cr request retry activities.

 

 

gc current/cr lost blocks

Data is transferred over the interconnect; however, the data does not reach the target server, but the data was lost during transmission. 

When blocks are lost or corrupt over the interconnect, the GCS process will retransmit the block to the requestor; this would mean the requestor would wait for a longer duration until such time as the block is received.

 

Occasionally, a few blocks being lost (“hiccups”) over the interconnect may not affect the overall performance.

It's when blocks are consistently being lost on a continuous basis that diagnosis should look at getting to the root cause of the issue. 

Blocks lost over the interconnect could be due to one or more of many reasons and should be addressed to avoid poor response times for the users. The following query will help check for lost blocks:

Script: MVRACPDnTap_lostblocks.sql

SELECT A1.inst_id,
       A3.instance_name,
       A3.host_name,
       A3.startup_time,
       ROUND(SYSDATE - startup_time)                         "Days",
       ROUND(( SYSDATE - startup_time ) * 24)                "Hours",
       A1.value                                              "BLKL",
       A1.value / ( ROUND(( SYSDATE - startup_time ) * 24) ) "Perhr",
       A2.value                                              "BLKC"
FROM   gv$sysstat A1,
       gv$sysstat A2,
       gv$instance A3
WHERE  A1.name = 'gc blocks lost'
       AND A2.name = 'gc blocks corrupt'
       AND A1.inst_id = A2.inst_id
       AND A1.inst_id = A3.instance_number
ORDER  BY 1;

 

Ins Instance                         Instance           Uptime  Uptime Blocks Blks Lost  Blocks

 ID Name      Host Name              Startup Time       (Days) (Hours)   Lost  Per Hour Corrupt

--- --------- --------------------- ------------------- ------ ------- ------ --------- -------

  1 SSKYDB_1  ssky4l4p3.localdomain  28-MAR-14 11:56:42      1      35    839        24       0

  2 SSKYDB_2  ssky4l4p2.localdomain  28-MAR-14 12:06:49      1      35     36         1       0

  3 SSKYDB_3  ssky4l4p4.localdomain  28-MAR-14 12:26:43      1      35    101         3       0

  4 SSKYDB_4  ssky4l4p1.localdomain  28-MAR-14 12:15:59      1      35     98         3       0

 

 

 

 

There are several areas that may be the reason for blocks being lost over the interconnect:

• Faulty or poorly configured cables/cards/switches

• Poorly sized UDP receive buffer sizes and/or UDP buffer socket overflows netstat -s reports packet

• Poor interconnect performance and high CPU utilization: reassembly failures

• Network packet corruption resulting from UDP checksum errors and/or send/receive transmission errors

• Mismatched MTU sizes in the communication path

• Interconnect LAN non-dedicated

• Lack of server/switch adjacency

• Outdated network driver or NIC firmware

• Proprietary interconnect link transport and network protocol

• Misconfigured bonding/link aggregation

Misconfigured jumbo frames

• NIC force full duplex and duplex mode mismatch

• Flow control mismatch in the interconnect communication path

• Packet drop at the O/S, NIC, or switch layer

• NIC driver/firmware configuration

• NIC send and receive queue lengths

• High CPU usage and scheduling latencies

 

 

gc current retry:

A current block was requested; however, the requested block was not received due to a failure status either because the block was corrupted or the block was lost during transmission over the interconnect. 

Reasons for the failure to receive the block could be one or many of the reasons listed previously under the “gc current/cr blocks lost” event.


gc cr failure:

This is similar to the gc current retry wait event, but in this case, “cr” block was requested. 

However, a failure status was received either because the block was corrupted or the block was lost during transmission over the interconnect.

Reasons for the failure to receive the block could be one or many of the reasons listed previously under the “gc current/cr blocks lost” event.

 

 

Mutex Waits:

Mutex waits refers to waits resources associated with the management of cursor objects in the shared pool during parsing. Mutex is a type of latch and was introduced with Oracle Database 10g Release 2. 

Similar to latches and other type of waits, these are excessive waits causing contention and performance problems in Oracle database systems.

Mutex contention is normally a CPU-using resource; and high mutex-related contention can cause excessive CPU usage and impact user response times. 

Mutexes are local to the instance in a RAC environment. 

Mutex waits are characterized by sessions waiting for one or more of the following events:

• cursor: mutex X—Cursor is being parsed and is trying to get the cursor mutex in exclusive mode.

• cursor: mutex S—Cursor is parsed and is trying to get the cursor mutex in share mode.

• cursor: pin S—Cursor is being parsed and trying to get the cursor pin in shared mode.

There are currently no concurrent X holders, but the cursor could not acquire the mutex immediately. The wait time indicates the sessions have to increment the mutex reference count and requires performing an exclusive atomic update to the mutex structure.

• cursor: pin X—Cursor is being parsed and trying to get the cursor pin in exclusive mode.

• cursor: pin S wait on X—Cursor is being parsed and has the cursor pin and is trying to get the mutex in shared mode; however, another session is currently holding the same mutex in exclusive mode. The session continues to wait for the current holder to release the mutex

 

 

The following query will help determine the current number of gets and sleeps for the various mutex wait events just described:

 

SELECT to_char(SYSDATE, 'HH:MI:SS') time,
       kglnahsh                     hash,
       SUM(sleeps)                  sleeps,
       location,
       mutex_type,
       substr(kglnaobj, 1, 40)      object
FROM   x$kglob,
       v$mutex_sleep_history
WHERE  kglnahsh = mutex_identifier
GROUP  BY kglnaobj,
          kglnahsh,
          location,
          mutex_type
ORDER  BY sleeps

 

Similar to latches we discussed earlier, when resources are not available, sessions requesting for a mutex also go

into a sleep mode. If a session tries to get a mutex but another session is already holding it, then the get request cannot be granted, and the session requesting the mutex will “spin” for the mutex a number of times in the hope that it will be quickly freed. During each spin, the session will check if the mutex has been freed. If not freed by the end of the spinning, the session waits.

 

 

 







No comments:

Post a Comment

How to recovery PDB when PDB database is dropped in Oracle

  How to recovery PDB when PDB database is dropped :) [oracle@rac01 ~]$ sqlplus '/as sysdba' SQL*Plus: Release 21.0.0.0.0 - Product...