
Saturday 18 September 2021

AWR Analysis - 1

 

General Tips before getting an AWR Report.

1. Collect Multiple AWR Reports: It is always suggested to have two AWR reports – one for a good period (when the database was performing well) and one for when performance was poor. This way we can easily compare the good and bad reports to find the culprit.

2. Stick to a Particular Time: Always take the report for the time when the database was performing slowly. Pin down a specific window, e.g. the database was slow today from 10 AM until 2 PM; then get a report covering those four hours.

3. Split a Large AWR Report into Smaller Reports: Instead of one report covering a long period (e.g. four hours), it is better to have four reports of one hour each. This will help to isolate the problem. One report per hour is easy to produce with the standard report script, as shown below.
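To generate the reports, the standard awrrpt.sql script (shipped under $ORACLE_HOME/rdbms/admin) prompts for the report format, the number of days of snapshots to list and the begin/end snapshot IDs, so producing one report per hour is just a matter of picking consecutive snapshot pairs:

SQL> @?/rdbms/admin/awrrpt.sql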

 

SQL> select * from dba_hist_wr_control;

 

      DBID SNAP_INTERVAL     RETENTION         TOPNSQL
---------- ----------------- ----------------- -------
 169052300 +00000 00:30:00.0 +00090 00:00:00.0      50


Before diving into the analysis, I want to reiterate a few known facts.

  • The data in an AWR report is the delta, i.e. the changes between the accumulated metrics of the two snapshots.
  • An AWR report contains three primary sections – (1) Oracle configuration details, (2) performance statistics and (3) advisories.
  • Any snapshot interval over 60 mins may not be ideal for performance analysis, as it can hide the actual issue.
  • The default snapshot retention period is 7 days. Both the interval and the retention can be changed, as shown below.
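Both settings can be adjusted through the DBMS_WORKLOAD_REPOSITORY package; a minimal sketch (the values below are just examples, and both parameters are expressed in minutes):

SQL> -- 30-minute snapshots, kept for 30 days (30*24*60 = 43,200 minutes)
SQL> exec dbms_workload_repository.modify_snapshot_settings(interval => 30, retention => 43200)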

1) Snapshot Details

To begin the analysis, I would start by looking at the snapshot details. You will not find any performance statistics here, but you need to make sure that you are looking at the right data. A few checks from this section include –

  • Start and end snapshot details. If you are comparing with another AWR report, make sure you are comparing the right snapshots.
  • The elapsed time between the snapshots.
  • Server details e.g. Memory, number of CPUs, Oracle Database version etc.
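To confirm you are picking the right snapshot pair, the available snapshots can be listed from DBA_HIST_SNAPSHOT; a simple sketch:

SQL> -- list snapshots with their interval boundaries
SQL> select snap_id, begin_interval_time, end_interval_time
       from dba_hist_snapshot
      order by snap_id;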


Elapsed Time & Database Time

Elapsed time is simply the time interval between the 'begin snap' and the 'end snap'.

DB time = CPU time spent processing + non-idle wait time, summed across all sessions.

Note: if the DB time is much greater than the elapsed time, drill down further into the report to understand the wait events.



In this example, Elapsed Time = 60 mins, DB Time = 292.24 mins and CPU count = 16.

Therefore,

Total available CPU time = 60 mins * <number of CPUs>

                         = 60 mins * 16 CPUs = 960 mins.

This means the database has consumed 292.24 mins out of 960 mins,

i.e. (292.24/960) * 100 = 30.44% CPU (average for the entire duration).

Is this bad? Not really; if you have executed a performance test that was meant to stress the system, this is quite expected.
However, if you executed a load test that was not supposed to stress the system (probably < 20-30% of the actual load), then yes, it is possibly a problem.
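If you want to verify the CPU count used in this calculation, it is exposed in V$OSSTAT (NUM_CPUS is a standard statistic there):

SQL> -- number of CPUs visible to the instance
SQL> select value as num_cpus from v$osstat where stat_name = 'NUM_CPUS';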

Sessions

Then it is about sessions. What is a session? A session represents a connection between an application and the database, and it holds persistent state such as transactions and cached information.

In this report, we can see that the number of sessions at the beginning was 198, and it gradually came down to 158.

This means the database was already busy even before this AWR report's start time, and it gradually calmed down.

So you will tend to see more activity at the beginning of the snapshot (~15:00 + 15 mins) rather than at the end (16:00 – 15 mins).

It is probably a good idea to find out the nature of load in the beginning of the snapshot and at the end of the snapshot.

And the database has had (292.24/60.04) = ~5 active sessions on average for the entire period,

where 292.24 = DB Time (mins) and 60.04 = Elapsed Time (mins).

Again, this is not bad; however, please keep an eye on the number of active sessions.

If you executed a load test and it finished within these 60 mins, then I would expect the number of active sessions to go down to a minimum, not remain at 158.

To confirm, please check the number of active sessions in the next snapshot.
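A quick way to spot-check this live, rather than waiting for the next snapshot, is V$SESSION; a minimal sketch counting active user sessions:

SQL> -- active (non-background) sessions right now
SQL> select count(*) as active_user_sessions
       from v$session
      where status = 'ACTIVE' and type = 'USER';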


2) Load Profile

Then comes the Load Profile. There are two primary reasons for looking at the load profile at the beginning.

  1. If you are comparing one AWR report with another, the load profile will help you confirm that you executed the exact same (or a similar) test on your application, and that no unexpected activity impacted database performance during that period (for example, a database backup job). I am not saying the numbers will be exactly the same, but they should be within an expected range. If that is not the case, then you must find out what changed, either in the test you are running or in the system (application or database).
  2. To analyse the performance statistics.

In the load profile section, you will find all the statistics for the duration between the database receiving a query and sending all the data back to the application.

DB Time

You can see the DB time in the snapshot details is 292.24 mins = 17,534.4 sec (minutes converted to seconds).

The DB worked 17,534.4 sec within 60 mins, i.e. a 3,600-sec snapshot interval, without considering any background processing work.

Therefore, it has worked 17,534.4 / 3,600 = ~4.9 sec of DB time per elapsed second (matching the ~5 average active sessions seen earlier).

In this example, it is definitely not bad.
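The same DB time figure can be cross-checked from the time model. Note that V$SYS_TIME_MODEL values are cumulative microseconds since instance startup, so you need to sample twice and take the difference to match a snapshot interval – which is exactly what AWR does:

SQL> -- cumulative DB time and DB CPU, converted to seconds
SQL> select stat_name, round(value/1e6) as seconds
       from v$sys_time_model
      where stat_name in ('DB time', 'DB CPU');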

Parsing

Let’s talk about the parser then. What is parsing? It is Oracle’s way of validating the incoming request and then preparing to serve it. The general rule of thumb is – hard parsing is bad and soft parsing is good. Although it is quite difficult to put a fixed number against each of these types, anything more than 10% hard parsing is definitely bad. So what is the difference between the two types of parsing?

You provide a SQL statement and request Oracle to execute the query and return records. Oracle first checks whether it has already executed a similar query; if it has, it does not have to load the query into the shared pool again (and therefore saves a lot of work) – this is a soft parse. But if it finds that it is a completely new query, it has to first load the query into the shared pool – this is a hard parse.

In the AWR report, soft parses are not listed directly but are derived by subtracting hard parses from total parses. A soft parse reuses the result of a previous hard parse; hence it consumes far fewer resources.


But please be careful: (as I mentioned) the opposite is not always true, i.e. 99% soft parsing usually goes with good performance, but good performance does not necessarily mean 99% soft parsing.
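To watch the hard/soft split outside the report, the underlying counters are standard V$SYSSTAT statistics (cumulative since startup, so diff two samples; soft parses = total minus hard):

SQL> -- parse counters behind the hard/soft parse ratio
SQL> select name, value
       from v$sysstat
      where name in ('parse count (total)', 'parse count (hard)');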

REDO log

The redo log is Oracle’s record of every change made to the database, so that changes can be replayed if recovery is ever needed. UPDATE and INSERT type queries always generate a lot of redo, while a SELECT statement sometimes does too (on a much smaller scale, e.g. due to delayed block cleanout).

So what is the optimum size? That is a difficult one. But a high redo generation rate indicates a lot of changes taking place in the database.

In this example, Oracle is generating ~1,230 KB of redo per sec.

This means, if you are expecting a similar load on the system 24 hrs a day,

---> then you are probably talking about 1,230 KB * 24 * 3,600 ≈ 106 million KB, i.e. ~100 GB of redo per day.

This seems high. [Please don’t forget to check your database size and calculate the relative growth.]

Also, about 6 transactions per sec took place during the period, and each of these transactions generated about 205 KB of redo (redo size per transaction),

i.e. 205,077 bytes * 6 per sec ≈ 1,230,457 bytes per sec. This is definitely high.

Check the database size before running any more tests and compare the size increase. Discuss with the developers to find out the types of query executed during this test; if these queries are predominantly SELECTs, with few UPDATEs or INSERTs, then this could be a problem.
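The cumulative redo generated since instance startup is available as the 'redo size' statistic; sampling it before and after a test and taking the difference gives the per-test figure:

SQL> -- total redo generated since startup, in bytes
SQL> select name, value from v$sysstat where name = 'redo size';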

User Calls

Oracle increments this counter whenever the application makes a call to the database – logins, parses, executes and fetches are all accounted for. This means, if you increase the load on the system, you should expect to see more user calls.
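This counter, too, can be sampled directly from V$SYSSTAT (cumulative since startup, like the others):

SQL> -- cumulative user calls since instance startup
SQL> select name, value from v$sysstat where name = 'user calls';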

Reads (Logical & Physical)

These two are interesting statistics. So what is a read? When the user (i.e. the application) requests data (user calls), Oracle validates and transforms the query (i.e. soft or hard parsing) and then starts to extract the data.
As a first step, Oracle checks whether the data is available in the buffer cache, i.e. the SGA. If it is, the logical read counter increases by 1.

To explain a bit further, if Oracle gets the data from a block consistent with a given point in time, then a counter named "consistent gets" increases by 1.

But if the data is found in current mode – that is, the most up-to-date copy of the block as it is right now – then a different counter, "db block gets", increases by 1.

Therefore,

Logical reads = total "consistent gets" + total "db block gets". These two values can be observed in the 'Instance Activity Stats' section.

But if Oracle does not find the data in the buffer cache, it reads it from a physical block and the physical read count increases by 1.

Clearly, a buffer get is less expensive than a physical read, because for a physical read the database has to work harder (and longer) to get the data: essentially the time it would have taken from the buffer cache plus the time actually taken to fetch the block from disk.

In this example, a total of 556,323,564 consistent gets and 16,162,346 db block gets took place;

--> Therefore, logical reads per sec = (556,323,564 + 16,162,346) / 3,602.4 sec (60.04 mins) ≈ 158,912.5.
Is this good? Maybe not!

Please remember: fewer physical reads compared to logical reads indicates good performance, but too many logical reads is definitely not good.

In this case, the distribution is –
(1) physical reads = (333.1/159,245.6) * 100 = 0.2% and
(2) logical reads = 99.8%.

This shows a well-allocated SGA memory, which is definitely good; however, ~158K reads per sec is not good. You need to explore the report to find the SQL(s) causing these high logical reads.
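The raw counters behind these numbers are also plain V$SYSSTAT statistics (logical reads = 'consistent gets' + 'db block gets'):

SQL> -- counters used to derive logical and physical reads
SQL> select name, value
       from v$sysstat
      where name in ('consistent gets', 'db block gets', 'physical reads');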

Transactions (and Rollbacks) & Executes

What is a transaction in Oracle? A transaction in Oracle is simply counted as the total number of commits and rollbacks.

In this example, the database has rolled back 360 operations in 60 mins (0.1 per sec * 3,600 sec) and committed 21,240 operations (total transactions minus rollbacks, i.e. 6 * 3,600 – 360).

Remember, rollbacks are expensive operations.
So, are any of these numbers good or bad?


Well, to answer the question, first try to find out about those rollbacks. Speak to the DBAs and developers to find the root cause of those rollbacks. Sometimes even a small increase in rollbacks can impact database performance significantly.

Now, about the number of commits: I would not necessarily say good or bad; rather, I will use this number to direct the analysis and investigation further.

Please remember, the important aspect is not just the number of transactions; you also need to calculate how many queries Oracle has executed per transaction on average.

This can be calculated as total executions per sec / number of transactions per sec.

In this example, Oracle has executed 311.5/6 ≈ 52 SQLs per transaction. Although there is no absolute boundary to decide low or high, I would say this is quite high. You need to explore the section detailing which SQLs are executed most.
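The counters behind the transaction and execution figures are again cumulative V$SYSSTAT statistics (transactions = 'user commits' + 'user rollbacks'):

SQL> -- counters behind transactions and executions per sec
SQL> select name, value
       from v$sysstat
      where name in ('user commits', 'user rollbacks', 'execute count');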


In this example, a total of 96.4 parses per sec took place and only about 1 of them was a hard parse; so ~99% was soft parsing.

This is definitely good.


