Amit's Oracle DBA Blog: Join Methods

Join Methods – Oracle

Execution plans provide the sequence of operations performed in order to run SQL statements.

•Order of the tables referenced in the statement

•Access method for each table in the statement

–INDEX

–INLIST ITERATOR

–TABLE ACCESS

–VIEW

•Join method in statement accessing multiple tables

–HASH JOIN

–MERGE JOIN

–NESTED LOOPS

•Data manipulations

–CONCATENATION

–COUNT

–FILTER

–SORT

EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
      7566 JONES      MANAGER         7839 02-APR-81       2975                    20
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
      7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
      7788 SCOTT      ANALYST         7566 19-APR-87       3000                    20
      7839 KING       PRESIDENT            17-NOV-81       5000                    10
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0        30
      7876 ADAMS      CLERK           7788 23-MAY-87       1100                    20
      7900 JAMES      CLERK           7698 03-DEC-81        950                    30
      7902 FORD       ANALYST         7566 03-DEC-81       3000                    20
      7934 MILLER     CLERK           7782 23-JAN-82       1300                    10

14 rows selected.

SQL> select * from dept;

    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON

Nested loop Joins

The optimizer uses nested loop joins when joining small number of rows, with a good driving condition between the two tables. It drives from the outer loop to the inner loop. The inner loop is iterated for every row returned from the outer loop, ideally by an index scan. It is inefficient when join returns large number of rows (typically, more than 10,000 rows is considered large), and the optimizer might choose not to use it.

The cost is calculated as below.
cost = access cost of A + (access cost of B * no_of_rows from A)

A nested loop join involves the following steps:

1. The optimizer determines the driving table and designates it as the outer table.
2. The other table is designated as the inner table.
3. For every row in the outer table, Oracle accesses all the rows in the inner table.

SELECT emp.empno, dept.dname FROM EMP , DEPT
WHERE dept.deptno = 10
AND emp.deptno = dept.deptno
/

SQL> SELECT emp.empno, dept.dname FROM EMP , DEPT
WHERE dept.deptno = 10
AND emp.deptno = dept.deptno
/

2    3    4
     EMPNO DNAME
---------- --------------
      7782 ACCOUNTING
      7839 ACCOUNTING
      7934 ACCOUNTING

SQL> S



SQL> @?/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------
Plan hash value: 568005898

----------------------------------------------------------------------------------------
| Id | Operation                    | Name    | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     3 |    60 |     3   (0)| 00:00:01 |
|   1 | NESTED LOOPS                |         |     3 |    60 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    13 |     1   (0)| 00:00:01 |
|* 3 |    INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
|* 4 |   TABLE ACCESS FULL          | EMP     |     3 |    21 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("DEPT"."DEPTNO"=10)
   4 - filter("EMP"."DEPTNO"=10)

17 rows selected.

SQL> explain plan for
2 SELECT /*+ USE_NL(emp dept) */ emp.empno, dept.dname
FROM EMP , DEPT WHERE dept.deptno = 10
AND emp.deptno = dept.deptno 3    4
5 /

Explained.

SQL> @?/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
Plan hash value: 568005898

----------------------------------------------------------------------------------------
| Id | Operation                    | Name    | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     3 |    60 |     3   (0)| 00:00:01 |
|   1 | NESTED LOOPS                |         |     3 |    60 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    13 |     1   (0)| 00:00:01 |
|* 3 |    INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
|* 4 |   TABLE ACCESS FULL          | EMP     |     3 |    21 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("DEPT"."DEPTNO"=10)
   4 - filter("EMP"."DEPTNO"=10)

17 rows selected.

Figure 1 below illustrates the method of executing the query shown next where the DEPT table is accessed first and the result is then looped through the EMP table with a NESTED LOOPS join. The type of join performed can be forced with a hint and will vary due to different variables on your system.

Table 1. NESTED LOOPS (DEPT is the driving table)

We can also force the Nested loop join hint as below.

SELECT /*+ USE_NL(emp dept) */ emp.empno, dept.dname

FROM EMP , DEPT WHERE dept.deptno = 10

AND emp.deptno = dept.deptno

Hash Join

Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows. This method is best used when the smaller table fits in available memory. The optimizer uses a hash join to join two tables if they are joined using an equijoin(joins with equals predicates) and large amount of data needs to be joined.

The cost of a Hash loop join is calculated by the following formula:
cost=(access cost of A*no_of_hash partitions of B) + access cost of B

SQL> explain plan for
2 SELECT emp.empno, dept.dname FROM EMP , DEPT
WHERE emp.deptno = dept.deptno; 3

Explained.

SQL> @?/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
Plan hash value: 615168685

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    14 |   280 |     5 (20)| 00:00:01 |
|* 1 | HASH JOIN         |      |    14 |   280 |     5 (20)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPT |     4 |    52 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| EMP |    14 |    98 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")

15 rows selected.

We can also force the Hash join hint as below.
SELECT /*+USE_HASH(emp dept) */ emp.empno, dept.dname
FROM EMP , DEPT
WHERE emp.deptno = dept.deptno

Sort-Merge Join

Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. On the other hand, sort merge joins can perform better than hash joins if both of the following conditions exist:

1. The row sources are sorted already.
2. A sort operation does not have to be done.

Sort merge joins are almost exclusively used for non-equi joins (>, <, BETWEEN). Sort merge joins perform better than nested loop joins for large data sets. (You cannot use hash joins unless there is an equality condition). In a merge join, there is no concept of a driving table.
The join consists of two steps:
Sort join operation: Both the inputs are sorted on the join key.
Merge join operation: The sorted lists are merged together.

The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:

1. The join condition between two tables is not an equi-join.
2. OPTIMIZER_MODE is set to RULE.
3. HASH_JOIN_ENABLED is false.
4. Because of sorts already required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.
5. The optimizer thinks that the cost of a hash join is higher, based on the settings of HASH_AREA_SIZE and SORT_AREA_SIZE.

The cost of a sort merge join is calculated by the following formula:
cost = access cost of A + access cost of B + (sort cost of A + sort cost of B)

SQL> explain plan for
2 SELECT emp.empno, dept.dname
FROM EMP , DEPT
WHERE emp.deptno < dept.deptno
/

3    4    5
Explained.

SQL> SQL> SQL>
SQL>
SQL> @?/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
Plan hash value: 513559243

-----------------------------------------------------------------------------------------
| Id | Operation                     | Name    | Rows | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |    28 |   560 |     5 (20)| 00:00:01 |
|   1 | MERGE JOIN                   |         |    28 |   560 |     5 (20)| 00:00:01 |
|   2 |   SORT JOIN                   |         |     4 |    52 |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| DEPT    |     4 |    52 |     2   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | PK_DEPT |     4 |       |     1   (0)| 00:00:01 |
|* 5 |   SORT JOIN                   |         |    14 |    98 |     3 (34)| 00:00:01 |
|   6 |    TABLE ACCESS FULL          | EMP     |    14 |    98 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access(INTERNAL_FUNCTION("EMP"."DEPTNO")<INTERNAL_FUNCTION("DEPT"."DEPTNO"
              ))
       filter(INTERNAL_FUNCTION("EMP"."DEPTNO")<INTERNAL_FUNCTION("DEPT"."DEPTNO"
              ))

21 rows selected.

SQL>

Cartesian join
A Cartesian join is used when one or more of the tables does not have any join conditions to any other tables in the statement. The optimizer joins every row from one data source with all the row from the other data source, creating the Cartesian product of the two sets.

For instance,
scott@DB1.US.ORACLE.COM> select * from emp,dept;

SQL> explain plan for
2 select * from emp,dept;

Explained.

SQL> @?/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------
Plan hash value: 2034389985

-----------------------------------------------------------------------------
| Id | Operation            | Name | Rows | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |    56 | 3248 |     5   (0)| 00:00:01 |
|   1 | MERGE JOIN CARTESIAN|      |    56 | 3248 |     5   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | DEPT |     4 |    80 |     2   (0)| 00:00:01 |
|   3 |   BUFFER SORT        |      |    14 |   532 |     3   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL | EMP |    14 |   532 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------

11 rows selected.

SQL>
SQL>

Amit's Oracle DBA Blog

Disclaimer

Monday, 16 November 2020

Join Methods – Oracle

No comments:

Post a Comment

Understanding SQL Plan Baselines in Oracle Database

Labels