11. The Instance Activity Stats are next, pages of them. Only a few of the Instance Activity Stats need to be reviewed in a quick
analysis. Now for a detailed analysis for the root cause of particular problems, you may want to go back and look at specific
numbers, but during a quick analysis you are interested in “low hanging fruit” only. Here is a list of what I look at:
a. SQL*Net Round Trips from Client: Too many per transaction indicate that array passing for results is not being used
efficiently.
b. SQL*Net Round Trips from DBLink: See above. Also look at pushing the work to the remote host and only getting back results using hints.
c. Index Fast Full Scans (full): Use this with Table Scan statistics to see where the majority of db file scattered reads are occurring.
d. Index fetch by key: Use to see efficiency of indexing.
e. Parse count (hard) and Parse count (total): Use to verify bind variable and versioning issues.
f. Physical read total io requests + Physical write total io requests: Use to determine physical IOPS by calculating a ratio of block IOPS to physical IOPS. You can see if the system is weighted towards full table scan or single block look up. Use session count and physical IOPS to help determine if the disk farm can handle the load based on IOPS and concurrency needs.
g. Sorts (disk): If present and excessive may indicate insufficient PGA_AGGREGATE_TARGET or SORT_AREA_SIZE settings.
h. SQL Area evicted: May indicate excessive object editing.
i. SQL area purged: May indicate too small a shared pool or bind/versioning issues.
j. Summed dirty queue length or write requests: If excessive indicates you need more DB_WRITER_PROCESSES. This is platform dependent.
k. Table fetch by rowid and Table scan rows gotten: In an online transaction performance (OLTP) system rowid fetches should be greater than scan rows, in a data warehouse (DWH) or OLAP scan rows should be greater than rowid fetches. If scan rows is way too high then look at lack of indexing issues.
l. Table fetch continued rows: May indicate chained rows, however, is also affected by LOBs. m. Transaction rollbacks: Used to see if rollbacks are excessive. This ties in with excessive UNDO tablespace IO.
n. User IO wait time: If the IOwait is not populated on your system, this can provide needed data.
o. Workarea executions one pass or multi-pass: Indicates PGA_AGGREGATE_TARGET is not set correctly.
12. Instance Activity Stats (Absolute Values): These are interesting to look at but not really used in a quick look. May help with
PGA_AGGREGATE_TARGET sizing.
13. Instance Activity Stats (Thread Activity): This is useful to see redo log activity. The old rule of thumb is four switches per hour
which is about right. I have not seen a need (except in Dataguard maybe) to have this more frequent. Less frequent, as long as it is
not ridiculous, is ok. If logs are switching too often, increase the physical redo log size.
14. In Oracle11g, version 11.2, in the AWR, they have added an IO Stats section that breaks down all the IO in the database into
various areas. This is a good place to see where your IOs are generated and to get IOPS numbers for read and write activity.
15. Tablespace and File IO statistics: These are useful to see what your hot tablespaces are. For example, having the SYSTEM
tablespace as the number one source of IO could indicate you have improper temporary tablespace assignments as these used to
default to SYSTEM. Having the TEMP or UNDO tablespaces in the top position has already been discussed. Usually in an OLTP
system one of your index tablespaces should be at the top. In a DWH or OLAP a data tablespace should be at the top. Also look at
the latency values. For disk based systems 5.0 ms is considered good performance. If you are getting >10 ms then you are having
queuing issues and need to review your IO subsystem. If your database release is <10 then you can get IOPS by summing the reads
and writes per second from this section.
16. Immediately after the tablespace statistics is a small area (usually one line-worth unless you are using KEEP, RECYCLE or multiple
block size pools), called Buffer Pool Statistics. Look at buffer busy waits. If these are high, you are either IO bound or have
insufficient buffers in one or more of your buffer pools.
17. Instance recovery stats I usually do not look at as they are not performance related. Next I look at the Buffer Pool Advisory
section. This section can be confusing, because if your buffer pool is too small, it may indicate it is ok. If you are seeing high buffer
busy waits and lots of cache buffer latch issues, but the advisor says everything is fine, do not believe it! Most of the time I just look
at the very end of the section to see what the effect of doubling the cache size would be. If it is significant (>20 percent reduction in
physical IO) I suggest increasing the pool. However, even if it is not significant but I see lots of buffer busy waits and cache buffer
latch issues, I will suggest gradually increasing the pool in 20 percent increments to see if buffer busy waits decrease.
18. In the PGA Aggr sections I look to see if a significant hit rate is being seen, as this would indicate that the setting for the PGA_AGGREGATE_TARGET is too small. I also look at the histogram section to see where the majority of single or multi-pass sorting is happening. If I am getting single or multi-pass sorts I take the higher of the histogram range settings and multiply it by 40 and then
times one-tenth the number of reported sessions. This should be a reasonable estimate of the needed PGA_AGGREGATE_TARGET setting. If the setting is already higher than this number, then your users are doing more sorts than average and the setting should be increased in 20 percent intervals until sorting to disk is under control. Generally speaking the advisory section in the PGA area is useless.
Note: If DISPATCHERS or SHARED_SERVERS are set, then some parallel processes will not use PGA_AGGREGATE_TARGET but will instead default to the old SORT_AREA_SIZE parameter, if you see loads of sorts under 512 megabytes, this could be the cause, either turn of DISPATCHERS or SHARED_SERVERS or set SHARED_POOL_SIZE.
19. I also find the shared pool advisory generally useless as it always seems to indicate the current setting is just great. Look at the SQL purges and the use/reuse ratios to see if the shared pool is being properly used. If you consistently run at 90 percent or greater usage then you may need to increase the shared pool size.
20. The SGA Target and Streams pool advisories are also generally useless. Unless you are really hitting your steams pool hard, the default size is good. If you start seeing disk spills, increase the size. I have also never seen the Java Pool advisory section give any useful information.
21. The Buffer Wait Statistics help give you an indication of where the buffer waits you noted at the beginning of the Buffer Statistics section are coming from. In the majority of cases these are going to be data block waits, which indicate either your IO subsystem is too slow or you have insufficient buffers, or both!
22. In the Enqueue Waits area, look at the top few enqueues. Usually they will be TX type enqueues, indicating issues with the application such as row locking or non-indexed foreign keys. In RAC you may see SQ enqueues if sequence caching is not set properly.
23. Most times I do not see anything in the UNDO statistics that warrants attention. However, in systems where you see values in the uS/uR/uU/ eS/eR/eU column other than zeroes, you may want to dig deeper into how they are using commits and rollbacks and look at the sizing of the undo tablespaces and setting of the “_transactions_per_rollback_segment” parameter to tune usage. Also parameters dealing with undo retention such as UNDO_RETENTION should be verified.
24. I use the latch sections to confirm if there are buffer or shared pool issues. On a quick run through, you are not going to do
detailed latch analysis unless you get some sort of odd wait event that shows up in the top five and requires more detailed analysis.
Latch free and cache buffer latches will probably be the dominant latches. Pay attention to latches that have excessive sleeps or
anything in the sleep1, 2 or 3 columns.
25. The Segment Statistics is a very important part of the report. Review segments that have excessive reads, excessive full scans and
any excessive buffer busy waits. In a RAC environment the ones that have excessive CR or current waits might benefit from a smaller block size tablespace. Excessive reads could indicate insufficient indexes.
26. The dictionary cache area I find rather useless since you cannot do anything about it other than increase the shared pool size.
The Library Cache Activity area can be used to see SQL Area evictions and purges to determine if the pool is large enough.
27. The process areas I find useless most of the time. The SGA breakdown difference statistics can be helpful to determine if your
settings for SGA_MAX_SIZE and SGA_TARGET are sufficient or if you are on 11g, if the settings for MEMORY_MAX_SIZE and
MEMORY_TARGET are sufficient. Also look at the large pool free, shared pool free, streams pool free and java pool free settings to
see if you need to increase these parameters. Look for lots of deferred actions. If your report shows the Resize actions and you see
them, then those parameters probably are incorrectly sized. Look at the v$sga_resize_ops table to see what is happening as far
as resize events in the SGA. Here is an SQL report to get an idea of the activity from v$sga_resize ops:
set lines 200
set pages 55
column inst_id format 999 heading ‘Inst’
column component format a24 heading ‘Component’
column oper_type heading ‘Oper Type’ format a12
column status format a9
column s_time format a12
column e_time format a12
column parameter format a21
column now new_value dt noprint
select to_char(sysdate,’yyyymmddhh24mi’) now from dual;
ttitle ‘Resize OPS’
spool resize_ops&&dt
select inst_id,component,oper_type,parameter,initial_size, target_size, final_size,status,
to_char(start_time,’yyyymmddhh24mi’) s_time,to_char(end_time,’yyyymmddhh24mi’) e_time
from gv$sga_resize_ops
order by 1,2,9
/
spool off
set lines 80 pages 20
clear columns
ttitle off
28. The streams, resource and remainder of the statistics sections usually are not useful unless you are looking for a particular issue with those areas.
29. The final area will be the init.ora parameters report. I usually look at the various pool and buffer sizes (if they are set). One thing I have been seeing more and more is the improper setting of the SGA_MAX_SIZE and SGA_TARGET or MEMORY_MAX_SIZE and MEMORY_TARGET settings on Oracle10g and Oracle11g. There are two settings for a reason; they are not supposed to be set to the
same value. The TARGET parameters should be calculated based on a set of base parameters: DB_CACHE_SIZE, SHARED_POOL_SIZE, JAVA_POOL_SIZE, STREAMS_POOL_SIZE, PGA_AGGREGATE_SIZE. Set these to a setting that gives you minimal acceptable performance. Once either the SGA_TARGET (Oracle10g) or MEMORY_TARGET (in Oracle11g) calculates, set the MAX_SIZE parameter to 25 percent greater than the setting of the TARGET parameter. This gives adjustment room. If the two parameters are set equal to each other, then Oracle has to steal from one parameter when one of the others needs memory.
Monitor the settings with the v$sql_resize_ops view. I then usually verify that the other parameters that are set in the init.ora are reasonable for their database and also look up any undocumented settings. (If I can! After all, they are undocumented!)
No comments:
Post a Comment