Introduction
AWR is a wonderful tool oracle has provided for database health checkup. It deals with a lot of things necessary to tune the database like TOP 5 wait events, SQL ordered by elapsed time/CPU time/Buffer gets/Physical reads etc. Among other things, the most crucial information in AWR is the amount of CPU used by the system i.e. how much of the CPU time oracle processes took for the elapsed time. When AWR deals with time, it deals with several kinds of time.
ELAPSED TIME: This is the
time between start and end snapshots time of AWR.
Here the elapsed
time is 30 mins.
DB TIME: This is the time Oracle has spent responding
user calls. On a busy system, this time usually greater than ELAPSED TIME. This
time is the sum of total time spent in CPU and total time waiting for various
non-idle events such as I/O, Enqueue etc
DB Time in the
AWR printed above is 8488 mins.
CPU TIME: This is the time Oracle spent on CPU i.e.
using the CPU. This can also exceed ELAPSED TIME as if we have many CPUs then
oracle can use each of them. For 46 CPUs, we have 46s of CPU time per sec
elapsed time.
When it comes to
CPU time, AWR refers to it by three different names:
1) CPU TIME
2) DB CPU
3) CPU USED BY
THIS SESSION
Measuring CPU
If the CPU TIME occurs in the TOP 5 wait event, this does not necessarily indicate a problem. For example:
Here the CPU
TIME is 35938s which is just 7.1 % of DB TIME.
Even if it was high,
this might not indicate a problem.
It could be that
the database is busy in utilizing the CPU.
To know whether CPU TIME is a problem, we need to compare it with the available CPU power we have. If the usage is high in comparison to total CPU power, then it is serious problem and need to be addressed.
The formula
to calculate total CPU power in seconds is:
TOTAL CPU= NUM_CPUS*ELAPSED_TIME*60,
where NUM_CPUs can be found from Operative system statistics section.
If we use this formula in our current scenario, we have:
Total CPU=46 *30*60=82800s.
So, we have 82800 seconds of CPU time for 1800 seconds of DB TIME.
Therefore, % of
CPU used by Oracle processes= 35938/82800*100=43%.
i.e. CPU time/
Total CPU
This means oracle processes used 43% of total CPU power available in the system.
AWR also provides CPU consumption from OS perspective. In the Operating System Statistics, we have BUSY TIME AND IDLE TIME.
We can calculate
the % CPU busy time by the formula:
% BUSY TIME= {BUSY_TIME/ (BUSY_TIME+IDLE_TIME)}*100
= {5788280/8365936}*100
= 69%
This means that
the system was overall 69% was busy utilizing the CPU.
BUSY_TIME can
further be divided into SYS_TIME+USER_TIME.
BUSY_TIME=SYS_TIME+USER_TIME
We can also
calculate % of SYS_TIME and % of USER_TIME.
% SYS_TIME = (SYS_TIME/BUSY_TIME)*100
%USER_TIME = (USER_TIME/BUSY_TIME)*100
Applying the
above formula in our present case, we have:
%SYS_TIME =
(1487507/5788280)*100
= 25.7%
%USER_TIME =
(4300773/5788280)*100
= 74%
As we can see
approx 74% of CPU was accounted for USER_TIME. But this 74% is actually 74% of
%BUSY_TIME i.e. 69%.
Therefore, 74%
of 69% = 51%
This is the
actual USER_TIME.
Similarly, for
SYS_TIME 25.7% of 69%= 17.7%
This is the
actual SYS_TIME.
Now if we see,
CPU_TIME and USER_TIME differ by 51-43=8%.
This means,
apart from oracle processes, 8% of CPU was utilized by non oracle non system
processes. It could also be that 8% of time was spent in run queue i.e. waiting
for CPU. The difference could also be attributed to some bugs in kernel code as
well
Once we find out
High CPU usage is the issue, we need to find out where the maximum CPU time is
getting used. If there is high CPU, then essentially, we will have to find
where the CPU is spent, either in SQL execution or SQL Parsing.
If the database
is doing lots of hard parse it means lack of cursor sharing resulting in
thousands of statements spreading across shared pool which only differ with
literal values.
We can look for
SQL statements consuming High CPU.
We can look in
SQL statements ordered by CPU, SQL statements ordered by Buffer gets.
However, while
examining sql statements our metric value should be per execution basis.
Once the SQL is
identified then it’s easy to tune it.
A couple of caveats
Analyzing CPU usage
from AWR can be tricky as well. For example, if we need to calculate the CPU
usage for a particular time lets say from
Conclusion:-
AWR provides a lot of information regarding CPU utilization. But it can be tricky to get the exact details of time spent in CPU.AWR in coordination with ASH becomes power pact performance analyzer tool that put before us all the artifacts necessary to troubleshoot CPU utilization issues. What differentiates CPU TIME from other wait events is that other wait events can be measured with respect to DB TIME but CPU TIME should be measured w .r .t. the total CPU capacity of the system.
But the question
that may arise is: how much accurate does AWR report CPU usage from OS perspective.
Does the output from vmstat/SAR/mpstat matches with the AWR statistic? Well it
should because Oracle takes CPU from OS itself. My next effort would be to
sample CPU usage from both perspective and see how much of it matches.
No comments:
Post a Comment