Disclaimer

Thursday 14 October 2021

AWR reports: Measuring CPU

 

Introduction

AWR is a wonderful tool oracle has provided for database health checkup. It deals with a lot of things necessary to tune the database like TOP 5 wait events, SQL ordered by elapsed time/CPU time/Buffer gets/Physical reads etc. Among other things, the most crucial information in AWR is the amount of CPU used by the system i.e. how much of the CPU time oracle processes took for the elapsed time. When AWR deals with time, it deals with several kinds of time.

 

ELAPSED TIME: This is the time between start and end snapshots time of AWR.


Here the elapsed time is 30 mins.

 

DB TIME:  This is the time Oracle has spent responding user calls. On a busy system, this time usually greater than ELAPSED TIME. This time is the sum of total time spent in CPU and total time waiting for various non-idle events such as I/O, Enqueue etc

DB Time in the AWR printed above is 8488 mins.

 

CPU TIME:  This is the time Oracle spent on CPU i.e. using the CPU. This can also exceed ELAPSED TIME as if we have many CPUs then oracle can use each of them. For 46 CPUs, we have 46s of CPU time per sec elapsed time.

 

When it comes to CPU time, AWR refers to it by three different names:

1) CPU TIME

2) DB CPU

3) CPU USED BY THIS SESSION

 

Measuring CPU

If the CPU TIME occurs in the TOP 5 wait event, this does not necessarily indicate a problem. For example:



Here the CPU TIME is 35938s which is just 7.1 % of DB TIME.

Even if it was high, this might not indicate a problem.

It could be that the database is busy in utilizing the CPU.

 To know whether CPU TIME is a problem, we need to compare it with the available CPU power we have. If the usage is high in comparison to total CPU power, then it is serious problem and need to be addressed.

 

The formula to calculate total CPU power in seconds is:

 TOTAL CPU= NUM_CPUS*ELAPSED_TIME*60,

 where NUM_CPUs can be found from Operative system statistics section.



If we use this formula in our current scenario, we have:


Total CPU=46 *30*60=82800s.

So, we have 82800 seconds of CPU time for 1800 seconds of DB TIME.

Therefore, % of CPU used by Oracle processes= 35938/82800*100=43%.

 

i.e. CPU time/ Total CPU

This means oracle processes used 43% of total CPU power available in the system.

AWR also provides CPU consumption from OS perspective. In the Operating System Statistics, we have BUSY TIME AND IDLE TIME.




We can calculate the % CPU busy time by the formula:

 

% BUSY TIME= {BUSY_TIME/ (BUSY_TIME+IDLE_TIME)}*100

                           = {5788280/8365936}*100

                            = 69%

 

This means that the system was overall 69% was busy utilizing the CPU.

BUSY_TIME can further be divided into SYS_TIME+USER_TIME.

 

BUSY_TIME=SYS_TIME+USER_TIME


We can also calculate % of SYS_TIME and % of USER_TIME.

 

% SYS_TIME = (SYS_TIME/BUSY_TIME)*100

 

%USER_TIME = (USER_TIME/BUSY_TIME)*100

 

Applying the above formula in our present case, we have:

 

%SYS_TIME = (1487507/5788280)*100

                        = 25.7%

 

%USER_TIME = (4300773/5788280)*100

                          = 74%

   

As we can see approx 74% of CPU was accounted for USER_TIME. But this 74% is actually 74% of %BUSY_TIME i.e. 69%.

 

Therefore, 74% of 69% = 51%

This is the actual USER_TIME.

 

Similarly, for SYS_TIME 25.7% of 69%= 17.7%

This is the actual SYS_TIME.    

         

Now if we see, CPU_TIME and USER_TIME differ by 51-43=8%.

 

This means, apart from oracle processes, 8% of CPU was utilized by non oracle non system processes. It could also be that 8% of time was spent in run queue i.e. waiting for CPU. The difference could also be attributed to some bugs in kernel code as well

 

Once we find out High CPU usage is the issue, we need to find out where the maximum CPU time is getting used. If there is high CPU, then essentially, we will have to find where the CPU is spent, either in SQL execution or SQL Parsing.

If the database is doing lots of hard parse it means lack of cursor sharing resulting in thousands of statements spreading across shared pool which only differ with literal values.

We can look for SQL statements consuming High CPU.

We can look in SQL statements ordered by CPU, SQL statements ordered by Buffer gets.

However, while examining sql statements our metric value should be per execution basis.

Once the SQL is identified then it’s easy to tune it.

 

 A couple of caveats

 Analyzing CPU usage from AWR can be tricky as well. For example, if we need to calculate the CPU usage for a particular time lets say from 7:15 to 7:30, then to gather CPU information from a 60 minute report is useless. Fortunately, in 10g and above, oracle provides us with another amazing utility, the ASH which stands for active session history. By using ASH, we can go to any time period we want and see how much of the CPU was consumed for that time period.

 

Conclusion:-

AWR provides a lot of information regarding CPU utilization. But it can be tricky to get the exact details of time spent in CPU.AWR in coordination with ASH becomes power pact performance analyzer tool that put before us all the artifacts necessary to troubleshoot CPU utilization issues. What differentiates CPU TIME from other wait events is that other wait events can be measured with respect to DB TIME but CPU TIME should be measured w .r .t.  the total CPU capacity of the system.

But the question that may arise is: how much accurate does AWR report CPU usage from OS perspective. Does the output from vmstat/SAR/mpstat matches with the AWR statistic? Well it should because Oracle takes CPU from OS itself. My next effort would be to sample CPU usage from both perspective and see how much of it matches.



No comments:

Post a Comment

100 Oracle DBA Interview Questions and Answers

  Here are 100 tricky interview questions tailored for a Senior Oracle DBA role. These questions span a wide range of topics, including perf...