Disclaimer

Saturday 18 September 2021

AWR Analysis - 2

 General Tips before getting an AWR Report.

1. Collect Multiple AWR Reports: It’s always suggested to have two AWR Reports, one for good time (when database was performing well), second when performance is poor. This way we can easily compare good and bad report to find out the culprit.

2. Stick to Particular Time: Always take the report during the time when the Database is performing slow. We have to have a specific time like Database was slow today at 10Am and continued till 2Pm. Here, we can get a report for these four hours.

3. Split Large AWR Report into Smaller Reports: Instead of having one report for long time like one report for 4hrs. it’s is better to have four reports each for one hour. This will help to isolate the problem.

Now, let us walkthrough the report. Since, AWR report is a huge report and area to look into AWR is also depends on problem to problem. Here, I am listing most common area for a Performance Engineer to look into which will give a clear picture of the issue.

Steps to Analyze AWR Report:

Note: The Report is obtained using OEM with Oracle Release 12.1.0.1.0

  1. Database Details: After getting an AWR Report check the report generated is for the problematic DB. The details are shown in Top part of the report.

In this part cross check for database and instance and and database version with the Database having performance issue. This report shows RAC = NO if it is not a RAC DB.


  1. Host Configuration:T his will give you name, platform CPU, socket and RAM etc. Important thing to notice is that the configuration like CPU and Memory has not changed when the performance is degraded.


  1. Snap Shot Detail: This are the detail about snap shot taken, Snap start time and end time. Difference between them is as “Elapsed“.


DB Time= session time spent in database.

DB Time= CPU Time + Non IDLE wait time.

In Systems with multiple concurrent active sessions, DB time can be larger than elapsed time. 

We can see that DB time is very large as compared to Elapse time, which is not a concern. 

Check if you have taken a report for the time having performance problem. If yes fine, otherwise take a report for performance problem time.

4. Load Profile:

Here are few important stats to look into. First is “DB CPU(s)” per second. Before that let’s understand how DB CPU’s work. Suppose you have 8 cores into the system. So, per wall clock second you have 8 seconds to work on CPU.


So, if “DB CPU(s)” per second in this report > cores in (Host Configuration (#2)) means env is CPU bound and either need more CPU’s or need to further check is this happening all the time or just for a fraction of time.
In this case, machine has 8 cores and DB CPU(s) per second is 0.1. So, this is not a CPU bound case.

Next look at Parses and Hard parses. If the ratio of Hard parses to Parses is high, this means Database is performing more Hard parses. In that case we needs to look at parameters like cursor_sharing and application level for bind variables etc.

In the above we have no issue with parses.

Parse issues usually occur as a result of:

Bad Bind variables usage

Insufficient memory

Will also be co-indicated by low percentage of memory for multiple SQL execution.

  1. Instance Efficiency Percentages:

Note: DB Tuning must never be driven by the Hit ratios. These only provide additional information for help to understand how the instance is operating.

Meanings of particular Hit Ratios:

  1. Buffer Nowait%: Shows the % of times when data buffers were accessed directly without any wait time.
  1. Buffer Hit Ratio : Measures how many times a  required block was found in memory rather than having to execute an expensive read operation on disk to get the block.
  1. Library Hit%: Shows the % of times when SQL statements and PL/SQL packages were found in the shared pool.
  1. Execute to Parse %: Shows how often parsed SQL statements are reused without re-parsing.
  1. Parse CPU to Parse Elapsed %: Gives the ratio of CPU time spent to parse SQL statements.
  1. Redo NoWait %: Shows whether the redo log buffer has sufficient size.
  1. In-memory Sort %: Shows the percentage of times when sorts are performed in memory instead of using temporary tablespaces.
  1. Soft Parse % : Shows how often sessions issued a SQL statements that is already in the shared pool and how it can use an existing version of the statement.
  1. Latch Hit %: Shows how often latches were acquired without having to wait.
  1. % Non-Parse CPU : Shows the percentage of how much CPU resources were spent on the actual SQL execution.

In these statistics, you have to look at

Soft Parse % ~ 100% indicates that the SQL statements are actively re-used.

% Non-Parse CPU“. ~ 100% means most of the CPU resources are used into operations other than parsing, which is good for database health.

Parse CPU to parse Elapsed % : It is very low, it reveals that oracle waits for some resources during parsing of SQL statements. To be investigated further.

Rule of thumb: Always minimize the number of Hard parses. This reduction yields the benefits of minimizing CPU overhead spent performing costly parse work.

  1. Top 10 Foreground Events by Total Wait Time: This is another most important stats to consider while looking at AWR Report for any database performance related issue.  This report is critical because it shows those database events that might constitute the bottleneck for the system. This has a list of top 10 foreground events arranged by Total Wait Time.




Here, first of all check for wait class if wait class is  User I/O , System I/O,  Others etc this could be fine but if wait class has value “Concurrency” then there could be some serious problem. Next to look at is Total Wait Time (sec) which show how many times DB was waiting in this class and then Wait Avg (ms). If Total Wait Time(sec) are high but Wait Avg(ms) is low then you can ignore this. If both are high or Wait Avg(ms) is high then this has to further investigate.

In the above screen shot, most of the resource are taken by Log file sync = 91.9% DB time and the wait class is “Commit”
In this report the event is “log file sync” which has high Waits, huge % DB time and large values in Total Wait Time (sec)  and Wait  Avg(ms) with wait class as commit. So, here you have to investigate further.

Note : Use highest Total Wait Time(sec) to guide investigation.

Next are Wait Classes by Total Wait Time, Host CPU, Instance CPU, IO Profile, Memory Statistics, Cache Sizes are self-explanatory.

Point to remember while checking these results :

  • Watch for number of CPUs
  • Pay attention to changes in Memory size
  • An idle CPU can be a bad thing
  • Always look at Wait IO verses CPU usage
  • If the system is IO bound CPU will be idle!
  1. Shared Pool Statistics:

In general, Memory usage % statistics should be ~70% after the DB has been running a long time. If its quite low, memory is being wasted. Here in our report we have Memory Usage % ~50% which is good.



In case if we have this to ~90% or above as shared pool consumed. This could indicate that the system experiences some overhead while aging out old shared memory structures like cursors, PL/SQL programs, and so on. This will place additional overhead on the CPU to perform reparsing aging-out. The size of the shared pool should be increased appropriately to eliminate such overhead.

The % SQL with executions >1 statistics indicate how many SQL statements are executed more than one time. This measures how well the applications are tuned and how well they make use of Bind variables.

  1. Time Model Statistics: This is a detailed explanations of system resource consumption. Stats are order by Time (s) and % of DB Time.


Some times Sum of all  % of DB time may be > 100%. why is this ?

Because this is cumulative time i.e. SQL execute elapsed time may show more DB time, which includes it sub parts like parse time elapsed, hard parse elapsed time etc. So, if you find Hard parse time elapsed is taking more %. So investigate further so on and so forth.

We have to look for stat which is taking abnormal % of DB time.

9. Operating System Statistics – Detail:

This is the information related to OS, what is the load status on System shown here.



This report shows, system is 97 to 98% idle at time of report taken, So, there is no resource crunch at system level. 
But if, you found very high busy, user or sys % and indeed this will led to low idle %. Investigate what is causing this.

Next, very crucial part of AWR report is SQL Statistics which has all sql query details executed during report time interval.



We will explore few of them, To understand, how to analyzed these reports. Let’s start with

10. SQL Ordered by Elapsed Time:

As explained by name itself, this lists SQL queries ordered by Elapsed time into reported time interval.



In this report, look for query which has low executions and high Elapsed time per Exec (s) and this query could be a candidate for troubleshooting or optimizations.

Important point: some times executions may show 0, it doesn’t mean query is not executing, this might be the case when query was still executing and you took AWR report. That’s why query completion was not covered in Report.

11. SQL Ordered by CPU Time:

In this report, SQL queries are listed on the basis of CPU taken by the query i.e. queries causing high load on the system. The top few queries could be the candidate query for optimization.



From above stat, look for queries using highest CPU Times,

However, there are so many other stats in AWR Report which needs to be considered; here I have listed the most commonly used stats for any performance related information.



No comments:

Post a Comment

100 Oracle DBA Interview Questions and Answers

  Here are 100 tricky interview questions tailored for a Senior Oracle DBA role. These questions span a wide range of topics, including perf...