09 August, 2023

The FILTER operation for an EXISTS query

 Let's assume a business case where we want to track customers who have ever placed orders with us.

(The business case could be extended to identify customers who have registered with us at least 3 years ago but have not placed any order in the last 12months --- they could either be targets for an advertising or promotions/discount campaign or customers to be retired, depending on the nature of our business and products.  But I'll keep the query simple here).


Name              Null?       Type            
_________________ ___________ _______________ 
CUST_ID           NOT NULL    NUMBER(6)       
CUST_NAME         NOT NULL    VARCHAR2(56)    
CUST_REGN_DATE    NOT NULL    DATE                                                     
SQL> select count(*) from customers;
                                                                                                                     
   COUNT(*)                                                                                                                   
___________                                                                                                                            
        100                                                                                                                                   
                                                                                                                                                   
SQL> desc sales_orders;
                                                                                                                                                                    
Name          Null?       Type          
_____________ ___________ _____________ 
ORDER_ID      NOT NULL    NUMBER(12)    
SALE_DATE                 DATE          
CUST_ID                   NUMBER(6)     
PRODUCT_ID                NUMBER(6)     
QUANTITY                  NUMBER(12)    
SQL> 
SQL> select count(*) from sales_orders;

   COUNT(*) 
___________ 
    1979580 

SQL> select count(distinct(cust_id)) from sales_orders;

   COUNT(DISTINCT(CUST_ID)) 
___________________________ 
                         99 

SQL> 
SQL> create table sale_made_tracker (cust_id number(6,0), sales_made varchar2(3));

Table SALE_MADE_TRACKER created.

SQL> 

Given that we have 100 customers and 1.98million sales orders, we want to add an entry to the SALE_MADE_TRACKER for a customer with a sales order.  However, this is an "on-demand" query that an analyst runs from a front-end, customer-by-customer for specific customers.  Thus, it is not executed as a batch for all 100 customers and 1.98million sales orders.

The developer likes to use Bind Variables and comes up with a query like this :


insert into sale_made_tracker
select cust_id, 'YES'
from customers a
where a.cust_name = :bind_cust_name
and exists
(select '1'
 from customers c, sales_orders s
 where s.cust_id=c.cust_id
 and c.cust_id = a.cust_id)
 

This allows the user to specify a customer name without knowing the cust_id when executing the query.

When testing this in 23c, I've had to add the NO_UNNEST Hint so that I could see the FILTER operation (similar to what I saw in a current 12.1 database):


 SQL> explain plan for
  2  insert into sale_made_tracker
  3  select cust_id, 'YES'
  4  from customers a
  5  where a.cust_name = :bind_cust_name
  6  and exists
  7  (select  '1' 
  8   from customers c, sales_orders s
  9   where s.cust_id=c.cust_id
 10   and c.cust_id = a.cust_id) 
 11  /

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT                                                                                 
_________________________________________________________________________________________________ 
Plan hash value: 1657757803                                                                       
                                                                                                  
----------------------------------------------------------------------------------------------    
| Id  | Operation                | Name              | Rows  | Bytes | Cost (%CPU)| Time     |    
----------------------------------------------------------------------------------------------    
|   0 | INSERT STATEMENT         |                   |     1 |    45 |  2447   (1)| 00:00:01 |    
|   1 |  LOAD TABLE CONVENTIONAL | SALE_MADE_TRACKER |       |       |            |          |    
|   2 |   NESTED LOOPS SEMI      |                   |     1 |    45 |  2447   (1)| 00:00:01 |    
|*  3 |    TABLE ACCESS FULL     | CUSTOMERS         |     1 |    43 |     3   (0)| 00:00:01 |    
|   4 |    VIEW PUSHED PREDICATE | VW_SQ_1           |   208 |   416 |  2444   (1)| 00:00:01 |    
|   5 |     NESTED LOOPS         |                   | 20828 |   528K|  2444   (1)| 00:00:01 |    
|*  6 |      INDEX UNIQUE SCAN   | SYS_C0012587      |     1 |    13 |     1   (0)| 00:00:01 |    
|*  7 |      TABLE ACCESS FULL   | SALES_ORDERS      | 20828 |   264K|  2443   (1)| 00:00:01 |    
----------------------------------------------------------------------------------------------    
                                                                                                  
Predicate Information (identified by operation id):                                               
---------------------------------------------------                                               
                                                                                                  
   3 - filter("A"."CUST_NAME"=:BIND_CUST_NAME)                                                    
   6 - access("C"."CUST_ID"="A"."CUST_ID")                                                        
   7 - filter("S"."CUST_ID"="A"."CUST_ID")                                                        
                                                                                                  
Note                                                                                              
-----                                                                                             

PLAN_TABLE_OUTPUT                                           
___________________________________________________________ 
   - dynamic statistics used: dynamic sampling (level=2)    

25 rows selected. 

SQL> delete plan_table;

15 rows deleted.

SQL> 
SQL> explain plan for
  2  insert into sale_made_tracker
  3  select cust_id, 'YES'
  4  from customers a
  5  where a.cust_name = :bind_cust_name
  6  and exists
  7  (select /*+ NO_UNNEST */ '1' 
  8   from customers c, sales_orders s
  9   where s.cust_id=c.cust_id
 10   and c.cust_id = a.cust_id) 
 11  /

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT                                                                                 
_________________________________________________________________________________________________ 
Plan hash value: 2963538242                                                                       
                                                                                                  
----------------------------------------------------------------------------------------------    
| Id  | Operation                | Name              | Rows  | Bytes | Cost (%CPU)| Time     |    
----------------------------------------------------------------------------------------------    
|   0 | INSERT STATEMENT         |                   |     1 |    43 |     6   (0)| 00:00:01 |    
|   1 |  LOAD TABLE CONVENTIONAL | SALE_MADE_TRACKER |       |       |            |          |    
|*  2 |   FILTER                 |                   |       |       |            |          |    
|*  3 |    TABLE ACCESS FULL     | CUSTOMERS         |     1 |    43 |     3   (0)| 00:00:01 |    
|   4 |    NESTED LOOPS          |                   |     1 |    26 |     3   (0)| 00:00:01 |    
|*  5 |     INDEX UNIQUE SCAN    | SYS_C0012587      |     1 |    13 |     1   (0)| 00:00:01 |    
|*  6 |     TABLE ACCESS FULL    | SALES_ORDERS      | 20828 |   264K|     2   (0)| 00:00:01 |    
----------------------------------------------------------------------------------------------    
                                                                                                  
Predicate Information (identified by operation id):                                               
---------------------------------------------------                                               
                                                                                                  
   2 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "SALES_ORDERS" "S","CUSTOMERS"              
              "C" WHERE "C"."CUST_ID"=:B1 AND "S"."CUST_ID"=:B2))                                 
   3 - filter("A"."CUST_NAME"=:BIND_CUST_NAME)                                                    
   5 - access("C"."CUST_ID"=:B1)                                                                  
   6 - filter("S"."CUST_ID"=:B1)                                                                  
                                                                                                  
Note                                                                                              

PLAN_TABLE_OUTPUT                                           
___________________________________________________________ 
-----                                                       
   - dynamic statistics used: dynamic sampling (level=2)    

26 rows selected. 

SQL> delete plan_table;

7 rows deleted.

SQL> 

So, for the purpose of this demo of the FILTER operation for an EXISTS clause, I will use the second form of the query.



If I run the query for customer name "HEMANT" :

 SQL> exec :bind_cust_name := 'HEMANT'

PL/SQL procedure successfully completed.

SQL> 
SQL> insert /*+ gather_plan_statistics */ into sale_made_tracker
  2  select cust_id, 'YES'
  3  from customers a
  4  where a.cust_name = :bind_cust_name
  5  and exists
  6  (select /*+ NO_UNNEST */ '1' 
  7   from customers c, sales_orders s
  8   where s.cust_id=c.cust_id
  9   and c.cust_id = a.cust_id) 
 10  /

1 row inserted.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                                           
___________________________________________________________________________________________________________ 
SQL_ID  1j3jggfsurwg4, child number 0                                                                       
-------------------------------------                                                                       
insert /*+ gather_plan_statistics */ into sale_made_tracker select                                          
cust_id, 'YES' from customers a where a.cust_name = :bind_cust_name and                                     
exists (select /*+ NO_UNNEST */ '1'   from customers c, sales_orders s                                      
where s.cust_id=c.cust_id  and c.cust_id = a.cust_id)                                                       
                                                                                                            
Plan hash value: 2963538242                                                                                 
                                                                                                            
--------------------------------------------------------------------------------------------------------    
| Id  | Operation                | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
--------------------------------------------------------------------------------------------------------    
|   0 | INSERT STATEMENT         |                   |      1 |        |      0 |00:00:00.01 |     195 |    
|   1 |  LOAD TABLE CONVENTIONAL | SALE_MADE_TRACKER |      1 |        |      0 |00:00:00.01 |     195 |    
|*  2 |   FILTER                 |                   |      1 |        |      1 |00:00:00.01 |     193 |    
|*  3 |    TABLE ACCESS FULL     | CUSTOMERS         |      1 |      1 |      1 |00:00:00.01 |       7 |    
|   4 |    NESTED LOOPS          |                   |      1 |      1 |      1 |00:00:00.01 |     186 |    
|*  5 |     INDEX UNIQUE SCAN    | SYS_C0012587      |      1 |      1 |      1 |00:00:00.01 |       1 |    
|*  6 |     TABLE ACCESS FULL    | SALES_ORDERS      |      1 |  20828 |      1 |00:00:00.01 |     185 |    
--------------------------------------------------------------------------------------------------------    
                                                                                                            
Predicate Information (identified by operation id):                                                         
---------------------------------------------------                                                         
                                                                                                            

PLAN_TABLE_OUTPUT                                           
___________________________________________________________ 
   2 - filter( IS NOT NULL)                                 
   3 - filter("A"."CUST_NAME"=:BIND_CUST_NAME)              
   5 - access("C"."CUST_ID"=:B1)                            
   6 - filter("S"."CUST_ID"=:B1)                            
                                                            
Note                                                        
-----                                                       
   - dynamic statistics used: dynamic sampling (level=2)    
                                                            

33 rows selected. 

SQL> 
Which shows that a total of 185 blocks were read from the SALES_ORDERS table for "HEMANT"'s orders.  Plus 1 block for the Index Unique Scan and 7 blocks for the Full Table Scan of CUSOMERS (totally 193 blocks for the SELECT) {Ignore the A-Rows being 0 for the actual INSERT in the Plan output, the execution shows that 1 row was inserted)



But if I run the query for  customer name "LARRY" :
SQL> exec :bind_cust_name := 'LARRY'

PL/SQL procedure successfully completed.

SQL> 
SQL> insert /*+ gather_plan_statistics */ into sale_made_tracker
  2  select cust_id, 'YES'
  3  from customers a
  4  where a.cust_name = :bind_cust_name
  5  and exists
  6  (select /*+ NO_UNNEST */ '1' 
  7   from customers c, sales_orders s
  8   where s.cust_id=c.cust_id
  9   and c.cust_id = a.cust_id) 
 10  /

0 rows inserted.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                                           
___________________________________________________________________________________________________________ 
SQL_ID  1j3jggfsurwg4, child number 0                                                                       
-------------------------------------                                                                       
insert /*+ gather_plan_statistics */ into sale_made_tracker select                                          
cust_id, 'YES' from customers a where a.cust_name = :bind_cust_name and                                     
exists (select /*+ NO_UNNEST */ '1'   from customers c, sales_orders s                                      
where s.cust_id=c.cust_id  and c.cust_id = a.cust_id)                                                       
                                                                                                            
Plan hash value: 2963538242                                                                                 
                                                                                                            
--------------------------------------------------------------------------------------------------------    
| Id  | Operation                | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
--------------------------------------------------------------------------------------------------------    
|   0 | INSERT STATEMENT         |                   |      1 |        |      0 |00:00:00.06 |    8723 |    
|   1 |  LOAD TABLE CONVENTIONAL | SALE_MADE_TRACKER |      1 |        |      0 |00:00:00.06 |    8723 |    
|*  2 |   FILTER                 |                   |      1 |        |      0 |00:00:00.06 |    8723 |    
|*  3 |    TABLE ACCESS FULL     | CUSTOMERS         |      1 |      1 |      1 |00:00:00.01 |       7 |    
|   4 |    NESTED LOOPS          |                   |      1 |      1 |      0 |00:00:00.06 |    8716 |    
|*  5 |     INDEX UNIQUE SCAN    | SYS_C0012587      |      1 |      1 |      1 |00:00:00.01 |       1 |    
|*  6 |     TABLE ACCESS FULL    | SALES_ORDERS      |      1 |  20828 |      0 |00:00:00.06 |    8715 |    
--------------------------------------------------------------------------------------------------------    
                                                                                                            
Predicate Information (identified by operation id):                                                         
---------------------------------------------------                                                         
                                                                                                            

PLAN_TABLE_OUTPUT                                           
___________________________________________________________ 
   2 - filter( IS NOT NULL)                                 
   3 - filter("A"."CUST_NAME"=:BIND_CUST_NAME)              
   5 - access("C"."CUST_ID"=:B1)                            
   6 - filter("S"."CUST_ID"=:B1)                            
                                                            
Note                                                        
-----                                                       
   - dynamic statistics used: dynamic sampling (level=2)    
                                                            

33 rows selected. 

SQL>

Which shows that a total of 8,715 blocks were read for "LARRY"'s orders from the SALES_ORDERS table alone but actual rows returned was 0 -- so no rows were inserted into SALE_MADE_TRACKER.

Both "HEMANT" and "LARRY" exist in the CUSTOMERS table.


SQL> select cust_id, cust_name 
  2  from customers
  3  where cust_id in (1,2)
  4* /

   CUST_ID CUST_NAME    
__________ ____________ 
         1 HEMANT       
         2 LARRY        

SQL>
SQL> select cust_id, count(*)
  2   from sales_orders
  3  where cust_id in (1,2)
  4  group by cust_id
  5  order by 1
  6* /

   CUST_ID    COUNT(*) 
__________ ___________ 
         1       20502 

SQL> 
SQL> select /*+ gather_plan_statistics FULL (s) */ count(*)
  2  from sales_orders s
  3* /

   COUNT(*) 
___________ 
    1979580 

SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                                
________________________________________________________________________________________________ 
SQL_ID  042jxgv8y5xny, child number 0                                                            
-------------------------------------                                                            
select /*+ gather_plan_statistics FULL (s) */ count(*) from                                      
sales_orders s                                                                                   
                                                                                                 
Plan hash value: 1409771706                                                                      
                                                                                                 
---------------------------------------------------------------------------------------------    
| Id  | Operation          | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
---------------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |              |      1 |        |      1 |00:00:00.03 |    8715 |    
|   1 |  SORT AGGREGATE    |              |      1 |      1 |      1 |00:00:00.03 |    8715 |    
|   2 |   TABLE ACCESS FULL| SALES_ORDERS |      1 |   2082K|   1979K|00:00:00.02 |    8715 |    
---------------------------------------------------------------------------------------------    
                                                                                                 
Note                                                                                             
-----                                                                                            
   - dynamic statistics used: dynamic sampling (level=2)                                         
                                                                                                 

19 rows selected. 

SQL> 

While "HEMANT" has 20,502 sales orders, "LARRY", despite being a registered customer, has not issued any order.

The EXISTS query for "HEMANT" had to read only 185 blocks from the SALES_ORDERS table until it was *stopped* by the FILTER operation because the FILTER here stops the subquery as soon as the first row is returned (which you can also see as actual rows being "1" from the SALES_ORDERS table) -- as soon as the subquery returns TRUE.

"LARRY" has no orders so the EXISTS query ran through the whole SALES_ORDERS table to read 8,715 blocks.


Side note on the use of the Bind Variable here :  The fact that the developer used Bind Variables means that the Estimated Rows from the SALES_ORDERS table  in the Execution Plan was always going to be the same -- no matter which customer name was queried for -- because it prevents the use of Histograms (if they existed) on the tale.