Disclaimer

Monday, 25 May 2026

DISTINCT Vs Exists - Oracle Performance Tuning

 

Do not use the keyword DISTINCT if the objective can be achieved otherwise. DISTINCT incurs an extra sort operation and therefore slows your queries down.


Main Concept

DISTINCT removes duplicate rows.

Problem:

To remove duplicates, Oracle must:

  • sort data
  • compare rows
  • use memory/TEMP
  • spend extra CPU

So unnecessary DISTINCT slows queries.



Simple Understanding

EMP Table

EMPNOENAMEDEPTNO
7369SMITH20
7499ALLEN30
7521WARD30
7566JONES20
7782CLARK10
7839KING10


Scenario 1 — DISTINCT Really Needed

Suppose you want:

Unique department numbers

Query:

SELECT DISTINCT deptno FROM emp;

Result:

10
20
30

Correct usage.

Because many employees belong to same department.



What Oracle Does Internally

Oracle:

  1. Reads all DEPTNO values
20
30
30
20
10
10
  1. Sorts them
10
10
20
20
30
30
  1. Removes duplicates
10
20
30

Explain Plan

EXPLAIN PLAN FOR
SELECT DISTINCT deptno
FROM emp;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Plan:

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 3 |
| 1 | HASH UNIQUE | | 3 | 3 |
| 2 | TABLE ACCESS FULL|EMP | 14 | 2 |
----------------------------------------------------------------------------

Sometimes Oracle may show:

SORT UNIQUE

Both mean duplicate removal work is happening.


Layman Example

Imagine:

School has 10,000 student forms.

Principal asks:

"Tell me unique city names."

You:

  1. Collect all city names
  2. Sort them
  3. Remove duplicates

Extra effort required.

That is DISTINCT.




Scenario 2 — BAD Use of DISTINCT

Developer writes:

SELECT DISTINCT empno, ename FROM emp;

Problem:

EMPNO is already unique.

Duplicates are impossible.

So DISTINCT is useless.


What Oracle Still Does

Even though duplicates cannot exist,
Oracle STILL performs:

  • sorting
  • duplicate checking
  • memory usage

Waste of resources.


Explain Plan

EXPLAIN PLAN FOR
SELECT DISTINCT empno, ename
FROM emp;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Plan:

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 4 |
| 1 | HASH UNIQUE | | 14 | 4 |
| 2 | TABLE ACCESS FULL|EMP | 14 | 2 |

----------------------------------------------------------------------------

Notice:

HASH UNIQUE

Extra operation added unnecessarily.


Better Query

SELECT empno, ename FROM emp;

Plan:

----------------------------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| EMP |
----------------------------------------------------------------------------

No sorting.
No duplicate checking.

Much faster.





Scenario 3 — DISTINCT Used to Hide Bad JOIN

Very common production mistake.


Tables

EMP

EMPNOENAMEDEPTNO
7369SMITH20
7499ALLEN30

BONUS

EMPNOBONUS
7369100
7369200
7499300

Wrong Query

SELECT DISTINCT e.empno, e.ename
FROM emp e, bonus b
WHERE e.empno = b.empno;

Developer uses DISTINCT because join produces duplicates.


Actual Problem

Bad data model or join logic.

DISTINCT is being used like a "duplicate cleaner."


What Happens Internally

Join creates:

7369 SMITH
7369 SMITH
7499 ALLEN

Then DISTINCT sorts and removes duplicates.

Extra unnecessary work.


Better Approach

If requirement is:

  • check employee has bonus

Use EXISTS instead.


Optimized Query

SELECT e.empno, e.ename
FROM emp e
WHERE EXISTS (
SELECT 1
FROM bonus b
WHERE b.empno = e.empno
);



Why EXISTS is Better

Oracle stops searching once match found.

No duplicate generation.
No sorting.
No DISTINCT required.


Explain Plan Difference

DISTINCT Query

JOIN
→ HASH UNIQUE
→ SORTING

EXISTS Query

SEMI JOIN
→ No duplicate removal
→ Faster

Real Production Impact

Suppose:

Rows
EMP = 10 million
BONUS = 50 million

Using DISTINCT may:

  • consume huge TEMP space
  • spill sorting to disk
  • increase CPU
  • slow application badly

Important Oracle DBA Logic

Whenever you see:

DISTINCT

Ask:

Why are duplicates coming?
Can query be rewritten?
Can join be corrected?
Can EXISTS be used?

DISTINCT vs GROUP BY

These are often similar:

SELECT DISTINCT deptno FROM emp;

and

SELECT deptno FROM emp GROUP BY deptno;

Both may perform sorting/hashing.

Neither is magically faster always.


Oracle Interview Answer

Interviewer may ask:

Why should DISTINCT be avoided when unnecessary?

Answer:

DISTINCT requires Oracle to perform duplicate elimination using
SORT UNIQUE or HASH UNIQUE operations, which increase CPU,
memory, TEMP usage and query execution time.

Easy Memory Trick

DISTINCT = First create duplicates Then remove duplicates

Better approach: Avoid creating duplicates itself

Golden Rule

Use DISTINCT:

  • ONLY when business requirement truly needs unique rows

Do NOT use DISTINCT:

  • to hide bad joins
  • by habit
  • unnecessarily on primary key columns
  • as a shortcut fix




Most Important Real DBA Observation

Many slow production queries contain:

SELECT DISTINCT ...

because developers:

  • do improper joins
  • generate duplicates
  • then clean duplicates using DISTINCT

This is one of the most common SQL tuning problems in Oracle.




Select ONLY those columns in a query which are required - - Oracle Performance Tuning

 

Select ONLY those columns in a query which are required. Extra columns which are not actually used, incur more I/O on the database and increase network traffic.


Main Concept

Bad practice:

SELECT * FROM emp;

Good practice:

SELECT empno, ename FROM emp;

Why?

Because Oracle reads:

  • more data blocks
  • more memory
  • more network traffic
  • more CPU

when unnecessary columns are selected.


EMP Table Example

Assume EMP table:

EMPNOENAMEJOBMGRHIREDATESALCOMMDEPTNO
7369SMITHCLERK790217-DEC-80800NULL20
7499ALLENSALESMAN769820-FEB-81160030030
7521WARDSALESMAN769822-FEB-81125050030

Scenario 1 — Bad Query Using SELECT *

Suppose application screen only displays:

Employee Number
Employee Name

But developer writes:

SELECT * FROM emp WHERE deptno = 30;

What Oracle Does Internally

Oracle reads ALL columns:

EMPNO
ENAME
JOB
MGR
HIREDATE
SAL
COMM
DEPTNO

Even though application only needs:

EMPNO
ENAME

Layman Example

Imagine:

You go to library and ask:

"I only need 2 pages."

But librarian gives:

  • entire 1000-page book.

You carry:

  • extra weight
  • extra paper
  • extra time

That is exactly what SELECT * does.


Explain Plan for SELECT *

EXPLAIN PLAN FOR
SELECT *
FROM emp
WHERE deptno = 30;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Plan:

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 300 | 3 |
| 1 | TABLE ACCESS FULL| EMP | 3 | 300 | 3 |
----------------------------------------------------------------------------

Notice:

Bytes = 300

Oracle estimates more data movement.




Scenario 2 — Good Query

SELECT empno, ename FROM emp WHERE deptno = 30;

Explain Plan

EXPLAIN PLAN FOR
SELECT empno, ename
FROM emp
WHERE deptno = 30;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Plan:

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 39 | 3 |
| 1 | TABLE ACCESS FULL| EMP | 3 | 39 | 3 |
----------------------------------------------------------------------------

Now notice:

Bytes = 39

instead of:

Bytes = 300

Huge reduction.


Why This Matters in Real Production

Imagine:

Table SizeRows
EMP14
Real Production Table500 million

Now suppose each row contains:

  • CLOB
  • XML
  • JSON
  • large VARCHAR2 columns

Using:

SELECT *

can create:

  • massive I/O
  • network bottleneck
  • slow application
  • TEMP usage
  • high CPU



Scenario 3 — Index Usage Difference

Suppose index exists:

CREATE INDEX emp_idx1 ON emp(deptno, empno, ename);

Good Query

SELECT empno, ename FROM emp WHERE deptno = 30;

Oracle can use:

INDEX FAST FULL SCAN

or

INDEX RANGE SCAN

WITHOUT accessing table.

This is called:

Covering Index

Explain Plan

--------------------------------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | INDEX RANGE SCAN | EMP_IDX1 |
--------------------------------------------------------------------------------

Very fast.


But SELECT * Changes Everything

Now query:

SELECT * FROM emp WHERE deptno = 30;

Problem:

Index does NOT contain:

  • JOB
  • SAL
  • HIREDATE
  • COMM

So Oracle must:

  1. Read index
  2. Go back to table

This causes:

TABLE ACCESS BY INDEX ROWID

Explain Plan

--------------------------------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID | EMP |
| 2 | INDEX RANGE SCAN | EMP_IDX1 |
--------------------------------------------------------------------------------

Extra table lookup = slower.


Layman Understanding of ROWID Lookup

Imagine:

Index says:

Employee data is in cupboard row 25

Now Oracle must physically go to table storage and fetch remaining columns.

Extra travel = extra time.


Real-Time Production Scenario

Suppose application dashboard only shows:

EMPNOENAME

But backend query uses:

SELECT * FROM emp;

If 10,000 users open dashboard:

  • unnecessary GBs of data transferred
  • high network usage
  • slow UI response
  • high database load

Golden Rule for Developers and DBAs

Never use:

SELECT *

in:

  • production code
  • procedures
  • APIs
  • views
  • reports

Always select only required columns.


Oracle Interview Answer

Interviewer may ask:

Why should we avoid SELECT * ?

Answer:

SELECT * reads unnecessary columns, increases logical and physical I/O,
network traffic, CPU usage, memory usage and may prevent Oracle from using
covering indexes efficiently.

Easy Memory Trick

SELECT * = Bring entire luggage

SELECT required columns = Bring only required clothes


Most Important DBA Performance Logic

Performance depends on:

Rows fetched × Columns fetched

Most developers only think about rows.

But columns also matter heavily in Oracle performance.




DISTINCT Vs Exists - Oracle Performance Tuning

  Do not use the keyword DISTINCT if the objective can be achieved otherwise. DISTINCT incurs an extra sort operation and therefore slows yo...