Disclaimer

Sunday, 27 July 2025

Oracle Exadata — Smart Scan in Nutshell

 

Oracle Exadata — “Smart Scan in Nutshell”

The Oracle Exadata Database Machine is an Engineered System, complete with preoptimized and preconfigured Software, Servers, Storage, configured to current best practices, that provides an optimal solution for all database workloads, ranging from scan-intensive Data Warehouse applications to highly concurrent OLTP applications.

It combines Oracle Exadata Storage Server Software is the secret sauce, Oracle Database software, and hardware components to deliver extreme performance in a highly available and highly secure environment.

Traditional Scan:
1.Client wants to identify customers that who received discount more than $500. 

2. The information about these premium customers occupies 5 MB in a 1 Terabyte table. 

3. With traditional storage, all database intelligence resides in the database hosts. 

4. Database server nodes must identify all table extents that may contain requested data. 

5. Partitioning may help to eliminate some extents. 

6. Database server issues I/O requests for all potentially relevant data.

7. Storage server returns all relevant blocks of data to database server, using more I/O bandwidth. 

8. Database server must discard irrelevant data by checking values against selection criteria. 

9. Results are sent to client. 

10. Large use of resources CPU & memory for mapping extents. I/O bandwidth from disk for data which will be wasted.

## Traditional Scan:





Smart Scan Example:

  • Smart Scan completely transparent to applications and users.
  • Database server sends database extents and metadata to Exadata Storage Server cells.
  • Smart Scan processing on the Exadata Storage cells scans data blocks to identify relevant rows and columns.
  • Only relevant rows and columns returned to database server, does not return blocks when Smart Scan used. Will return blocks when appropriate.
  • Database server only has to assembled returned relevant data into result set.

## Smart Scan Concept:


Smart Scan

  • I/O is a main source of bottleneck in modern computing.

  • Smart Scan is designed to reduce the amount of data flowing from Storage Server to Database Server.

  • The database nodes sends query details to the storage cells via a protocol known as iDB (intelligent DataBase).

  • Using this information, the storage cells filters major portion of the data in query processing. Oracle Exadata Storage cells search storage disks with added intelligence about the query and send only the relevant bytes, not all the database blocks, to the database nodes —> hence the term Smart Scan.

  • Only required Data is sent to Database server.

  • No I/O bandwidth wasted or Database CPU.


Smart Scan:

Smart Scan Uses direct reads. Direct reads are not new to Exadata. Direct reads involve reading the data into PGA buffers as opposed to the buffer cache used in for caching data blocks. Direct reads make sense when the ratio of cache to the data to be read is very small. If the cache is very small relative to the data to be read, the buffers are going to be evicted anyways and perhaps at a cost of adversely affecting any OLTP type of applications.

Exadata for DW environments involves scanning large volume of data and returning results in formatted data blocks — should not go to buffer cache.

Smart Scan at work 🡪 Query submitted 🡪 Optimizer makes execution plan 🡪 Full scan access 🡪 Smart Scan eligible 🡪 Smart Scan not performed if query columns include LOBs or other conditions 🡪 Smart scan processing.

Smart Scan at work 🡪 Query submitted 🡪 Optimizer makes execution plan 🡪 Full scan access 🡪 Smart Scan 🡪 Smart scan processing 🡪 Select rows and projected columns returned to PGA.


Smart Scan Process:-


🔢 Step 01 – Query

SELECT cust_id, name FROM bonus WHERE amount > 500;


🔢 Step 02 – Table Extents + Metadata

Table Extents plus meta data sent to Storage Cell.


🔢 Step 03 – Smart Scan

Smart Scan identifies rows & columns that matches within terabyte table.


🔢 Step 04 – Storage Cell sends required rows

Storage Cell sends only required rows (5MB data) to Database Server, with less I/O as compared to traditional scan.


🔢 Step 05 – Consolidated Result Set

Consolidated result set build from all Storage Cell.


🔢 Step 06 – Rows returned to Client

Response sent to client.





How Smart Scan works?
Access specifies which data blocks need to read.
Filter is applied on those read blocks.
Smart Scan filters query information on the storage level using below
Predicate information to filter.
Access — start and stop conditions of the leaf node traversal.
Index — are applied during the leaf node traversal operation.
Column Projection — “filter” for table operations.

Smart Scan Column projection:
A B C D E A B C D E B D
Smart Scan only returns columns requested by query
• Significant reduction in I/O bandwidth 
• Diagram based on SQL query SELECT B, C FROM table a;

Row Filtering Operators/Functions
Predicates →, <, =, !=, <=, >=, IS [NOT] NULL, LIKE, [NOT] BETWEEN, [NOT] IN, EXISTS, IS OF type, NOT, AND, OR. & most SQL functions.

Use below SQL to find operators & functions that benefitted from 
Smart Scan
select * from v$sqlfn_metadata where offloadable = ‘YES’;


Smart Scan Join filtering:
Join filtering for star schemas • Joins large tables to smaller tables • Uses Bloom filters.
A way to indicate membership in a set in a compact way • Bloom filters are used to reduce potential row candidates for join, reducing the data sent to the database server for join processing.


Smart Scan in action:
  • Look for keyword STORAGE in explain plan
  • Statistics for individual statements in below views
  • V$SQL
  • OPTIMIZED_PHY_READ_BYTES — IO_CELL_OFFLOAD_ELIGIBLE_BYTES → Number of eligible bytes for offload processing. If column shows 0 (zero), it means the statement was not off-loadable.
  • IO_INTERCONNECT_BYTES → Number of bytes exchanged between compute nodes and Exadata cells
  • IO_CELL_UNCOMPRESSED_BYTES → Number of uncompressed bytes that are offloaded to/from Exadata cells
  • IO_CELL_OFFLOAD_RETURNED_BYTES → Number of bytes returned by Exadata cells from Smart Scan processing
  • DBA_HIST_SQLSTAT
  • IO_OFFLOAD_ELIG_BYTES_TOTAL → Total number off offload-eligible bytes for SQL_ID across all samples
  • IO_INTERCONNECT_BYTES_TOTAL → Total number off bytes exchanged between compute nodes and Exadata cells across all AWR samples
  • IO_OFFLOAD_RETURNED_BYTES_TOTAL → Number of bytes returned by Exadata cells from Smart Scan processing across all samples
  • IO_OFFLOAD_RETURNED_BYTES_DELTA → Number of bytes returned by Exadata cells from Smart Scan processing for current sample

Exadata is a combined hardware and software platform that includes scale-out Intel x86–64 compute and storage servers.


Exadata is a combined hardware/software product that depends on both components to provide substantial performance improvement over non-Exadata platforms. However, the performance benefits of the software component dwarf the benefits provided by the hardware.

Smart Scan Example 1 🡪
SQL> alter session set cell_offload_processing=false;
Session altered.
Elapsed: 00:00:00.00
SQL> select /*+ gather_plan_statistics monitor statement001 */
2 count(*) from purchase where amt_poid = 1;
COUNT(*)
— — — — —
4001234
Elapsed: 00:00:36.22
SQL> alter session set cell_offload_processing=true;
Session altered.
SQL> select /*+ gather_plan_statistics monitor statement002 */
2 count(*) from purchase where amt_poid = 1;
COUNT(*)
— — — — —
4001234
Elapsed: 00:00:05.10
This example shows the performance of a scan against a single, partitioned table

Column Projection:
The term column projection refers to Exadata’s ability to limit the volume of data transferred between the storage tier and the database tier by only returning columns of interest. That is, those in the select list are necessary for join operations on the database tier. If your query requests five columns from a 100-column table, Exadata can eliminate most of the data that would be returned to the database servers by non-Exadata storage. This feature is a much bigger deal than you might expect, and it can have a very significant impact on response times. Let us see an example:

Smart Scan Example 2 🡪
SQL> alter session set “_serial_direct_read” = always;
Session altered.
Elapsed: 00:00:00.00
SQL> alter session set cell_offload_processing = false;
Session altered.
Elapsed: 00:00:00.00
SQL> select count(distinct seller) from sales;
COUNT(DISTINCT SELLER)
— — — — — — — — — — -
1000
Elapsed: 00:00:53.55
SQL> alter session set cell_offload_processing = true;
Session altered.
Elapsed: 00:00:00.01
SQL> select count(distinct seller) from sales;
COUNT(DISTINCTSELLER)
— — — — — — — — — — -
1000
Elapsed: 00:00:28.84
This example deserves some discussion. To force direct path reads — a prerequisite for Smart Scans — the session parameter _SERIAL_DIRECT_READ is set to ALWAYS (more on that later).


Predicate Filtering:
Smart Scan optimizations is predicate filtering. This term refers to Exadata’s ability to return only rows of interest to the database tier.
Filtering these rows at the storage layer can provide a very significant decrease in the volume of data that must be transferred to the database tier. While this optimization also results in some savings in CPU usage on the database servers,
the biggest advantage is generally the reduction in time needed for the data transfer.

Smart Scan Example 3 🡪
SQL> alter session set cell_offload_processing=false;
Session altered.
Elapsed: 00:00:00.01
SQL> select count(*) from sales;
COUNT(*)
— — — — —
394575180
Elapsed: 00:00:23.17
SQL> alter session set cell_offload_processing=true;
Session altered.
Elapsed: 00:00:00.00
SQL> select count(*) from sales;
COUNT(*)
— — — — —
294575180
Elapsed: 00:00:05.68
SQL> — disable storage indexes
SQL> alter session set “_kcfis_storageidx_disabled”=true;
System altered.
Elapsed: 00:00:00.12
SQL> select count(*) from sales where quantity_sold = 1;
COUNT(*)
— — — — —
3006298
Elapsed: 00:00:02.78

First, offloading is completely disabled using the CELL_OFFLOAD_PROCESSING parameter followed by an execution of a query without a WHERE clause (“predicate”). Without the benefit of offloading, but with the benefits of reading exclusively from Smart Flash Cache, this query took about 23 seconds.






No comments:

Post a Comment

Oracle Exadata

  Hardware Architecture:- Oracle Exadata Database Machine employs a modern scale-out architecture encompassing database compute, storage, ne...