07 July, 2020

A Correlated SubQuery

I take this simple Correlated Subquery that identifies all employees whose Salary is greater than the department's average Salary and is also greater than $9,000

SQL> l
  1  select emp.department_id, emp.employee_id, emp.salary
  2  from hr.employees emp
  3  where emp.salary >
  4  (select avg(salary)
  5   from hr.employees emp_inner
  6   where emp.department_id = emp_inner.department_id)
  7  and emp.salary > 9000
  8* order by 1,2
SQL> /

DEPARTMENT_ID EMPLOYEE_ID     SALARY
------------- ----------- ----------
           20         201      13000
           30         114      11000
           80         145      14000
           80         146      13500
           80         147      12000
           80         148      11000
           80         149      10500
           80         150      10000
           80         151       9500
           80         156      10000
           80         157       9500
           80         162      10500
           80         163       9500
           80         168      11500
           80         169      10000
           80         170       9600
           80         174      11000
           90         100      24000
          100         108      12008
          110         205      12008

20 rows selected.

SQL>


The Row Source Statistics for the actual query execution are :

select emp.department_id, emp.employee_id, emp.salary
from hr.employees emp
where emp.salary >
(select avg(salary)
 from hr.employees emp_inner
 where emp.department_id = emp_inner.department_id)
and emp.salary > 9000
order by 1,2

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        3      0.00       0.00          0         12          0          20
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.00       0.00          0         12          0          20

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 106
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
        20         20         20  FILTER  (cr=12 pr=0 pw=0 time=1027 us starts=1)
        23         23         23   SORT GROUP BY (cr=12 pr=0 pw=0 time=1020 us starts=1 cost=7 size=3552 card=111)
       536        536        536    HASH JOIN  (cr=12 pr=0 pw=0 time=917 us starts=1 cost=6 size=71648 card=2239)
        23         23         23     TABLE ACCESS FULL EMPLOYEES (cr=6 pr=0 pw=0 time=30 us starts=1 cost=3 size=1752 card=73)
       108        108        108     TABLE ACCESS FULL EMPLOYEES (cr=6 pr=0 pw=0 time=7 us starts=1 cost=3 size=864 card=108)

********************************************************************************


The Execution Plan is

----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |   111 |  3552 |     7  (15)| 00:00:01 |
|*  1 |  FILTER              |           |       |       |            |          |
|   2 |   SORT GROUP BY      |           |   111 |  3552 |     7  (15)| 00:00:01 |
|*  3 |    HASH JOIN         |           |  2239 | 71648 |     6   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| EMPLOYEES |    73 |  1752 |     3   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| EMPLOYEES |   108 |   864 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("EMP"."SALARY">SUM("SALARY")/COUNT("SALARY"))
   3 - access("EMP"."DEPARTMENT_ID"="EMP_INNER"."DEPARTMENT_ID")
   4 - filter("EMP"."SALARY">9000)


(Since I had only 1 execution of the query when tracing is enabled, "Rows (1st"), "Rows (avg)" and "Rows (max)" report on the same, single, execution)

The EMPLOYEES table underwent two Full Table Scans, each consisting of 6 "consistent reads"  ("cr=6"). 

The execution at line Id=4 expected to return 73 rows ("card=73") but actually returned 23 rows.  This query filtered for SALARY greater than 9000.

The execution at line Id=5 expected to return 108 rows ("card=108") and did actually return 108 rows.  There is no predicate information, meaning that all 108 rows of the table were returned, none filtered.

The join of the outer query and the inner query on DEPARTMENT_ID was the Hash Join at operation Id=3.

Do you note how the FILTER at operation ID=1 appears ?  It is filtering for SALARY greater than AVERAGE SALARY  (where AVERAGE is computed as SUM of SALARY  values divided by the number of not-null SALARY values)



No comments: