Do not use the keyword DISTINCT if the objective can be achieved otherwise. DISTINCT incurs an extra sort operation and therefore slows your queries down.
Main Concept
DISTINCT removes duplicate rows.
Problem:
To remove duplicates, Oracle must:
- sort data
- compare rows
- use memory/TEMP
- spend extra CPU
So unnecessary DISTINCT slows queries.
Simple Understanding
EMP Table
| EMPNO | ENAME | DEPTNO |
|---|---|---|
| 7369 | SMITH | 20 |
| 7499 | ALLEN | 30 |
| 7521 | WARD | 30 |
| 7566 | JONES | 20 |
| 7782 | CLARK | 10 |
| 7839 | KING | 10 |
Scenario 1 — DISTINCT Really Needed
Suppose you want:
Unique department numbers
Query:
SELECT DISTINCT deptno FROM emp;
Result:
10
20
30
Correct usage.
Because many employees belong to same department.
What Oracle Does Internally
Oracle:
- Reads all DEPTNO values
20
30
30
20
10
10
- Sorts them
10
10
20
20
30
30
- Removes duplicates
10
20
30
Explain Plan
EXPLAIN PLAN FOR
SELECT DISTINCT deptno
FROM emp;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan:
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 3 |
| 1 | HASH UNIQUE | | 3 | 3 |
| 2 | TABLE ACCESS FULL|EMP | 14 | 2 |
----------------------------------------------------------------------------
Sometimes Oracle may show:
SORT UNIQUE
Both mean duplicate removal work is happening.
Layman Example
Imagine:
School has 10,000 student forms.
Principal asks:
"Tell me unique city names."
You:
- Collect all city names
- Sort them
- Remove duplicates
Extra effort required.
That is DISTINCT.
Scenario 2 — BAD Use of DISTINCT
Developer writes:
SELECT DISTINCT empno, ename FROM emp;
Problem:
EMPNO is already unique.
Duplicates are impossible.
So DISTINCT is useless.
What Oracle Still Does
Even though duplicates cannot exist,
Oracle STILL performs:
- sorting
- duplicate checking
- memory usage
Waste of resources.
Explain Plan
EXPLAIN PLAN FOR
SELECT DISTINCT empno, ename
FROM emp;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan:
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 4 |
| 1 | HASH UNIQUE | | 14 | 4 |
| 2 | TABLE ACCESS FULL|EMP | 14 | 2 |
----------------------------------------------------------------------------
Notice:
HASH UNIQUE
Extra operation added unnecessarily.
Better Query
SELECT empno, ename FROM emp;
Plan:
----------------------------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| EMP |
----------------------------------------------------------------------------
No sorting.
No duplicate checking.
Much faster.
Scenario 3 — DISTINCT Used to Hide Bad JOIN
Very common production mistake.
Tables
EMP
| EMPNO | ENAME | DEPTNO |
|---|---|---|
| 7369 | SMITH | 20 |
| 7499 | ALLEN | 30 |
BONUS
| EMPNO | BONUS |
|---|---|
| 7369 | 100 |
| 7369 | 200 |
| 7499 | 300 |
Wrong Query
SELECT DISTINCT e.empno, e.ename
FROM emp e, bonus b
WHERE e.empno = b.empno;
Developer uses DISTINCT because join produces duplicates.
Actual Problem
Bad data model or join logic.
DISTINCT is being used like a "duplicate cleaner."
What Happens Internally
Join creates:
7369 SMITH
7369 SMITH
7499 ALLEN
Then DISTINCT sorts and removes duplicates.
Extra unnecessary work.
Better Approach
If requirement is:
- check employee has bonus
Use EXISTS instead.
Optimized Query
SELECT e.empno, e.ename
FROM emp e
WHERE EXISTS (
SELECT 1
FROM bonus b
WHERE b.empno = e.empno
);
Why EXISTS is Better
Oracle stops searching once match found.
No duplicate generation.
No sorting.
No DISTINCT required.
Explain Plan Difference
DISTINCT Query
JOIN
→ HASH UNIQUE
→ SORTING
EXISTS Query
SEMI JOIN
→ No duplicate removal
→ Faster
Real Production Impact
Suppose:
| Rows |
|---|
| EMP = 10 million |
| BONUS = 50 million |
Using DISTINCT may:
- consume huge TEMP space
- spill sorting to disk
- increase CPU
- slow application badly
Important Oracle DBA Logic
Whenever you see:
DISTINCT
Ask:
Why are duplicates coming?
Can query be rewritten?
Can join be corrected?
Can EXISTS be used?
DISTINCT vs GROUP BY
These are often similar:
SELECT DISTINCT deptno FROM emp;
and
SELECT deptno FROM emp GROUP BY deptno;
Both may perform sorting/hashing.
Neither is magically faster always.
Oracle Interview Answer
Interviewer may ask:
Why should DISTINCT be avoided when unnecessary?
Answer:
DISTINCT requires Oracle to perform duplicate elimination using
SORT UNIQUE or HASH UNIQUE operations, which increase CPU,
memory, TEMP usage and query execution time.
Easy Memory Trick
DISTINCT = First create duplicates Then remove duplicates
Better approach: Avoid creating duplicates itself
Golden Rule
Use DISTINCT:
- ONLY when business requirement truly needs unique rows
Do NOT use DISTINCT:
- to hide bad joins
- by habit
- unnecessarily on primary key columns
- as a shortcut fix
Most Important Real DBA Observation
Many slow production queries contain:
SELECT DISTINCT ...
because developers:
- do improper joins
- generate duplicates
- then clean duplicates using DISTINCT
This is one of the most common SQL tuning problems in Oracle.
No comments:
Post a Comment