Disclaimer

Friday 10 December 2021

Histograms in Oracle

###########
Histograms
###########

histogram is a special type of column statistic that provides more detailed information about the data distribution in a table column.

What are Histograms?
-----------------------------
> Holds data about values within a column in a table for the number of occurrences for a specific value/range.
> Used by CBO to optimize a query to use whatever index Fast Full scan or table full scan.
> Usually being used against columns have data being repeated frequently like country or city column.
> gathering histograms on a column having distinct values (PK) is useless because values are not repeated.

> Two types of Histograms can be gathered:
  -Frequency histograms are when distinct values (buckets) in the column is less than 255 (e.g. the number of countries is always less than 254).

  -Height balanced histograms: are similar to frequency histograms in their design, but distinct values bigger than 254.

> Collected by DBMS_STATS (which by default doesn't collect histograms, it deletes them if you didn't use the parameter).
> Mainly being gathered on foreign key columns/columns in WHERE statement.
> Help in SQL multi-table joins.
> Column histograms like statistics are being stored in the data dictionary.
> If the application is exclusive uses bind variables, Oracle recommends deleting any existing histograms and disabling Oracle histograms generation.

Cautions:
   – Do not create histograms on Columns that are not being queried.
   – Do not create histograms on every column of every table.
   – Do not create histograms on the primary key column of a table.

Verify the existence of histograms:
---------------------------------------------
SQL> select column_name,histogram from dba_tab_col_statistics
     where owner='SCOTT' and table_name='EMP'; 

Creating Histograms:
---------------------------
e.g.
SQL> Exec dbms_stats.gather_schema_stats
     (ownname => 'SCOTT',
     estimate_percent => dbms_stats.auto_sample_size,
     method_opt => 'for all columns size auto',
     degree => 7);


method_opt:
FOR COLUMNS SIZE AUTO                 => Fastest. you can specify one column instead of all columns.
FOR ALL COLUMNS SIZE REPEAT     => Prevent deletion of histograms and collect it only for columns already have histograms.
FOR ALL COLUMNS => collect histograms on all columns.
FOR ALL COLUMNS SIZE SKEWONLY => collect histograms for columns have skewed value.
FOR ALL INDEXES COLUMNS      => collect histograms for columns have indexes.

Note: AUTO & SKEWONLY will let Oracle decide whether to create the Histograms or not.

Check the existence of Histograms:
SQL> select column_name, count(*) from dba_tab_histograms
     where OWNER='SCOTT' table_name='EMP' group by column_name; 


Drop Histograms: 11g
----------------------
e.g.
SQL> Exec dbms_stats.delete_column_stats
     (ownname=>'SH', tabname=>'SALES',
     colname=>'PROD_ID', col_stat_type=> HISTOGRAM);


Stop gather Histograms: 11g
------------------------------
[This will change the default table options]
e.g.
SQL> Exec dbms_stats.set_table_prefs
     ('SH', 'SALES','METHOD_OPT', 'FOR ALL COLUMNS SIZE AUTO,FOR COLUMNS SIZE 1 PROD_ID');
>Will continue to collect histograms as usual on all columns in the SALES table except for PROD_ID column.


Drop Histograms: 10g
----------------------
e.g.
SQL> exec dbms_stats.delete_column_stats(user,'T','USERNAME');



Link:

https://docs.oracle.com/en/database/oracle/oracle-database/19/tgsql/histograms.html




Have you ever wondered how does Oracle determine what type of histograms to create?

As of 12c there are 4 types of histograms. Are you familiar with these types? How does Oracle decide which ones to choose out of the four?

  • frequency
  • top frequency
  • height balanced
  • hybrid

If you need an introduction to histograms, you can check out last week’s post: Intro To Histograms. Three Basic Questions Answered.

Let me first introduce you to each type of histogram, so you have a better understanding.

Frequency Histograms

If the histogram on your column is a frequency histogram, this means that there is a bucket for each distinct value of the column.
Here is an example:

You have the column COLOR, with 4 distinct values: red, blue, green, yellow. Let’s say there are 100 rows in the table that have the color red, 50 rows in the table that have the color blue, 10 rows in the table that have the color green, and 2 rows in the table that have the color yellow.

In this case, there will be 4 buckets in the histogram (one for each distinct value), red bucket with 100 values, blue bucket with 50 values, green bucket with 10 values, and the yellow bucket with 2 values.

For frequency histograms, each bucket can have a different number of values in it, which means some buckets could be more full than others.


Top Frequency Histograms

These type of histograms were introduced with 12c, and are similar to the frequency histograms. The difference is that the top frequency histograms ignore non-popular values that are statistically insignificant.
Let’s look at an example.
We have the column CITY, which has 300 distinct values for table A. This table A has 100,000 rows.
There are 20,000 rows with the column CITY as Toronto, 30,000 rows with the column CITY as Vancouver, 35,000 rows with the column CITY as Calgary. The rest of the 15,000 rows are divided between the rest of the 297 distinct values. However there are a few city values, such as Airdrie and Kelowna with 5 values each.
These 2 values will be considered non-popular values, and will be ignored (there will be no bucket for them).


Height-Balanced Histograms

The Height-Balanced Histograms are legacy type histograms. The column values are divided into buckets, and each bucket contains approximately the same number of rows. These type of histograms are not very “popular” in 12c and up.


Hybrid Histograms

These types of histograms were also introduced with 12c. They are a combination of height-balanced histograms and frequency histograms.
Oracle says it is the “best of both worlds”.

If you want to check whether or not a histogram exists on the columns of a table, query DBA_TAB_COL_STATISTICS.


SQL> select COLUMN_NAME,NUM_DISTINCT,NUM_BUCKETS,HISTOGRAM 
     from DBA_TAB_COL_STATISTICS 
     where table_name='A' and owner='DR' 
     order by column_name;

COLUMN_NAME      NUM_DISTINCT NUM_BUCKETS HISTOGRAM
---------------- ------------ ----------- ---------------
COL_1                     674         254 HYBRID
COL_2                       5           5 FREQUENCY
COL_3                       8           1 NONE
COL_4                    1348           1 NONE
COL_5                    1361         254 HYBRID
COL_6                    1346         254 HYBRID
COL_7                     299         254 TOP-FREQUENCY
COL_8                       5           5 FREQUENCY

8 rows selected.



If you want more information about the histogram itself that you have on a specific column, you can further query DBA_HISTOGRAMS view.
Be aware that there will be one row for each bucket of a histogram in the table:

SQL> select column_name, count(*) 
     from DBA_HISTOGRAMS 
     where table_name='A' and owner='DR' 
     group by column_name 
     order by column_name;

COLUMN_NAME     COUNT(*)
--------------- --------
COL_1                254
COL_2                  5
COL_3                  2
COL_4                  2
COL_5                254
COL_6                254
COL_7                254
COL_8                  5

8 rows selected.



In the above examples, have you noticed that:

  • for frequency histograms, the number of distinct values is equal to the number of buckets.
  • for hybrid histograms the number of distinct values is always greater than the number of buckets.
  • for top-frequency histograms the number of distinct values is always greater than the number of buckets.


Also have you noticed that the maximum number if buckets is 254? You could have more buckets, however this is the default number of buckets that Oracle is using. If you do not specify explicitly the number of buckets when gathering stats, then by default the number of buckets is 254.


The height balanced histograms are not very common in 12c and up. Just running a quick select statement in my 12.1 environment, will prove that I have 0 histogram of type height balanced:

SQL> select count(*), histogram 
     from DBA_TAB_COL_STATISTICS group by histogram;

  COUNT(*) HISTOGRAM
---------- ---------------
       799 FREQUENCY
        71 HYBRID
         2 TOP-FREQUENCY
     32689 NONE

Now that you have a better understanding about histogram types, let’s explore how will Oracle decide which type of histograms to use.

Oracle is using 4 variables to determine the type of histogram to use. These variables are listed below:

1) number of distinct values in the column -> NDV. If the column in the table has the values red, blue, yellow, green, then NDV=4

2) number of histogram buckets -> n. The default value for n is 254. If you don’t explicitly specify the number of buckets, then by default Oracle will set the number of buckets to 254.

3) internal variable, percentage threshold, p = (1–(1/n)) * 100; This is a formula set by Oracle.

4) the value of estimate_percent, when you gather statistics, whether or not it’s set to default value AUTO_SAMPLE_SIZE, when gathering statistics.


👉Here are the rules, set by Oracle when picking a specific type of histogram:


Oracle will pick a Frequency Histogram if:
NDV < n (n=254 by default)
if number of distinct values is less than the number of histogram buckets.


Oracle will pick a Height Balanced Histogram if:
NDV > n (n=254 by default)
estimate_percent < > AUTO_SAMPLE_SIZE
if number of distinct values is greater than the number of histogram buckets
and the estimate percent is not set to AUTO_SAMPLE_SIZE during stats gathering.


Oracle will pick a Hybrid Histogram if:
NDV > n (n=254 by default)
estimate_percent = AUTO_SAMPLE_SIZE
percentage of rows for top n (n=254 by default) frequent values < p
If n=254, then p is 99.6


Oracle will pick a Top Frequency Histogram if
NDV > n (n=254 by default)
estimate_percent = AUTO_SAMPLE_SIZE
percentage of rows for top n (n=254 by default) frequent values >= p






No comments:

Post a Comment

100 Oracle DBA Interview Questions and Answers

  Here are 100 tricky interview questions tailored for a Senior Oracle DBA role. These questions span a wide range of topics, including perf...