Disclaimer

Monday, 25 May 2026

DISTINCT Vs Exists - Oracle Performance Tuning

 

Do not use the keyword DISTINCT if the objective can be achieved otherwise. DISTINCT incurs an extra sort operation and therefore slows your queries down.


Main Concept

DISTINCT removes duplicate rows.

Problem:

To remove duplicates, Oracle must:

  • sort data
  • compare rows
  • use memory/TEMP
  • spend extra CPU

So unnecessary DISTINCT slows queries.



Simple Understanding

EMP Table

EMPNOENAMEDEPTNO
7369SMITH20
7499ALLEN30
7521WARD30
7566JONES20
7782CLARK10
7839KING10


Scenario 1 — DISTINCT Really Needed

Suppose you want:

Unique department numbers

Query:

SELECT DISTINCT deptno FROM emp;

Result:

10
20
30

Correct usage.

Because many employees belong to same department.



What Oracle Does Internally

Oracle:

  1. Reads all DEPTNO values
20
30
30
20
10
10
  1. Sorts them
10
10
20
20
30
30
  1. Removes duplicates
10
20
30

Explain Plan

EXPLAIN PLAN FOR
SELECT DISTINCT deptno
FROM emp;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Plan:

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 3 |
| 1 | HASH UNIQUE | | 3 | 3 |
| 2 | TABLE ACCESS FULL|EMP | 14 | 2 |
----------------------------------------------------------------------------

Sometimes Oracle may show:

SORT UNIQUE

Both mean duplicate removal work is happening.


Layman Example

Imagine:

School has 10,000 student forms.

Principal asks:

"Tell me unique city names."

You:

  1. Collect all city names
  2. Sort them
  3. Remove duplicates

Extra effort required.

That is DISTINCT.




Scenario 2 — BAD Use of DISTINCT

Developer writes:

SELECT DISTINCT empno, ename FROM emp;

Problem:

EMPNO is already unique.

Duplicates are impossible.

So DISTINCT is useless.


What Oracle Still Does

Even though duplicates cannot exist,
Oracle STILL performs:

  • sorting
  • duplicate checking
  • memory usage

Waste of resources.


Explain Plan

EXPLAIN PLAN FOR
SELECT DISTINCT empno, ename
FROM emp;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Plan:

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 4 |
| 1 | HASH UNIQUE | | 14 | 4 |
| 2 | TABLE ACCESS FULL|EMP | 14 | 2 |

----------------------------------------------------------------------------

Notice:

HASH UNIQUE

Extra operation added unnecessarily.


Better Query

SELECT empno, ename FROM emp;

Plan:

----------------------------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| EMP |
----------------------------------------------------------------------------

No sorting.
No duplicate checking.

Much faster.





Scenario 3 — DISTINCT Used to Hide Bad JOIN

Very common production mistake.


Tables

EMP

EMPNOENAMEDEPTNO
7369SMITH20
7499ALLEN30

BONUS

EMPNOBONUS
7369100
7369200
7499300

Wrong Query

SELECT DISTINCT e.empno, e.ename
FROM emp e, bonus b
WHERE e.empno = b.empno;

Developer uses DISTINCT because join produces duplicates.


Actual Problem

Bad data model or join logic.

DISTINCT is being used like a "duplicate cleaner."


What Happens Internally

Join creates:

7369 SMITH
7369 SMITH
7499 ALLEN

Then DISTINCT sorts and removes duplicates.

Extra unnecessary work.


Better Approach

If requirement is:

  • check employee has bonus

Use EXISTS instead.


Optimized Query

SELECT e.empno, e.ename
FROM emp e
WHERE EXISTS (
SELECT 1
FROM bonus b
WHERE b.empno = e.empno
);



Why EXISTS is Better

Oracle stops searching once match found.

No duplicate generation.
No sorting.
No DISTINCT required.


Explain Plan Difference

DISTINCT Query

JOIN
→ HASH UNIQUE
→ SORTING

EXISTS Query

SEMI JOIN
→ No duplicate removal
→ Faster

Real Production Impact

Suppose:

Rows
EMP = 10 million
BONUS = 50 million

Using DISTINCT may:

  • consume huge TEMP space
  • spill sorting to disk
  • increase CPU
  • slow application badly

Important Oracle DBA Logic

Whenever you see:

DISTINCT

Ask:

Why are duplicates coming?
Can query be rewritten?
Can join be corrected?
Can EXISTS be used?

DISTINCT vs GROUP BY

These are often similar:

SELECT DISTINCT deptno FROM emp;

and

SELECT deptno FROM emp GROUP BY deptno;

Both may perform sorting/hashing.

Neither is magically faster always.


Oracle Interview Answer

Interviewer may ask:

Why should DISTINCT be avoided when unnecessary?

Answer:

DISTINCT requires Oracle to perform duplicate elimination using
SORT UNIQUE or HASH UNIQUE operations, which increase CPU,
memory, TEMP usage and query execution time.

Easy Memory Trick

DISTINCT = First create duplicates Then remove duplicates

Better approach: Avoid creating duplicates itself

Golden Rule

Use DISTINCT:

  • ONLY when business requirement truly needs unique rows

Do NOT use DISTINCT:

  • to hide bad joins
  • by habit
  • unnecessarily on primary key columns
  • as a shortcut fix




Most Important Real DBA Observation

Many slow production queries contain:

SELECT DISTINCT ...

because developers:

  • do improper joins
  • generate duplicates
  • then clean duplicates using DISTINCT

This is one of the most common SQL tuning problems in Oracle.




No comments:

Post a Comment

DISTINCT Vs Exists - Oracle Performance Tuning

  Do not use the keyword DISTINCT if the objective can be achieved otherwise. DISTINCT incurs an extra sort operation and therefore slows yo...