01 November, 2008

Numbers and NULLs

In a recent forums.oracle.com thread I had mistakenly stated that "Therefore, even if 1 row out of a million rows has a NULL, you'd get a SUM result as a NUL" after correctly asserting that "Adding a value to a NULL returns a NULL." I did point out later that although SUM (and AVG) ignores NULLs, a COUNT(*) doesnt.Link

Here is a test case about
a) how Average != Sum/Count if any of the rows contains a NULL.
b) how adding a NULL to a number returns a NULL (and adding a number to a NULL still returns a NULL !)


SQL>
SQL> drop table test_sum_numbers purge;

Table dropped.

SQL>
SQL> REM COL_2 has a NULL in one of 4 rows
SQL> create table test_sum_numbers (col_1 number, col_2 number);

Table created.

SQL> insert into test_sum_numbers values (10, 10);

1 row created.

SQL> insert into test_sum_numbers values (20, 20);

1 row created.

SQL> insert into test_sum_numbers values (30, NULL);

1 row created.

SQL> insert into test_sum_numbers values (40, 40);

1 row created.

SQL> commit;

Commit complete.

SQL>
SQL> REM Here we see that SUM/COUNT != AVG because the SUM excluded NULLs but the COUNT of all rows includes NULLs
SQL> select count(*) Num_Rows, sum(col_1) Total_1, avg(col_1) Avg_1, sum(col_2) Total_2, avg(col_2) Avg_2 from test_sum_numbers ;

NUM_ROWS TOTAL_1 AVG_1 TOTAL_2 AVG_2
---------- ---------- ---------- ---------- ----------
4 100 25 70 23.3333333

SQL>
SQL> REM Here we see that we can "adjust" NULLs to 0 so that SUM/COUNT == AVG
SQL> select count(*) Num_Rows, sum(nvl(col_1,0)) Total_1, avg(nvl(col_1,0)) Avg_1, sum(nvl(col_2,0)) Total_2, avg(nvl(col_2,0)) Avg_2 from test_sum_numbers ;

NUM_ROWS TOTAL_1 AVG_1 TOTAL_2 AVG_2
---------- ---------- ---------- ---------- ----------
4 100 25 70 17.5

SQL>
SQL>
SQL> REM What happens if we were to add two sets of columns from two tables ?
SQL> select CNT_1 + CNT_2 Total_Rows, SC_1 + SC_2 Total_Sum, (SC_1+SC_2)/(CNT_1+CNT_2) Is_this_the_Average
2 from
3 (select count(*) CNT_1, sum(col_1) SC_1 from test_sum_numbers table_a),
4 (select count(*) CNT_2, sum(col_2) SC_2 from test_sum_numbers table_b)
5 /

TOTAL_ROWS TOTAL_SUM IS_THIS_THE_AVERAGE
---------- ---------- -------------------
8 170 21.25

SQL>
SQL> REM What happens if we ADD the values for each of the rows
SQL> set serveroutput on size 10000
SQL> variable i number;
SQL>
SQL> declare
2 cursor fetch_cursor is select col_1 from test_sum_numbers;
3 c1 number;
4 agg_1 number;
5 cnv_agg_1 number;
6 begin
7 agg_1 := 0;
8 open fetch_cursor ;
9 loop
10 fetch fetch_cursor into c1;
11 exit when fetch_cursor%NOTFOUND;
12 agg_1 := agg_1 + c1;
13 end loop;
14 dbms_output.put_line('Sum of col_1 is ' agg_1);
15 cnv_agg_1 := nvl(agg_1,0);
16 dbms_output.put_line('Testing agg_1 for NULL : ' cnv_agg_1);
17 end;
18 /
Sum of col_1 is 100
Testing agg_1 for NULL : 100

PL/SQL procedure successfully completed.

SQL>
SQL> declare
2 cursor fetch_cursor is select col_2 from test_sum_numbers;
3 c2 number;
4 agg_2 number;
5 cnv_agg_2 number;
6 begin
7 agg_2 := 0;
8 open fetch_cursor ;
9 loop
10 fetch fetch_cursor into c2;
11 exit when fetch_cursor%NOTFOUND;
12 agg_2 := agg_2 + c2;
13 end loop;
14 dbms_output.put_line('Sum of col_2 is ' agg_2);
15 cnv_agg_2 := nvl(agg_2,0);
16 dbms_output.put_line('Testing agg_2 for NULL : ' cnv_agg_2);
17 end;
18 /
Sum of col_2 is
Testing agg_2 for NULL : 0

PL/SQL procedure successfully completed.

SQL>
SQL> REM So the addition in col_2 returned a NULL ? Let's check it again
SQL>
SQL> declare
2 cursor fetch_cursor is select col_2 from test_sum_numbers order by col_1;
3 c2 number;
4 agg_2 number;
5 cnv_agg_2 number;
6 begin
7 agg_2 := 0;
8 dbms_output.put_line ('Begin aggregation at :' agg_2);
9 open fetch_cursor ;
10 loop
11 fetch fetch_cursor into c2;
12 exit when fetch_cursor%NOTFOUND;
13 dbms_output.put_line('Have read value 'c2);
14 agg_2 := agg_2 + c2;
15 dbms_output.put_line('The aggregation is now : ' agg_2);
16 end loop;
17 dbms_output.put_line('Sum of col_2 is ' agg_2);
18 cnv_agg_2 := nvl(agg_2,0);
19 dbms_output.put_line('Testing agg_2 for NULL : ' cnv_agg_2);
20 end;
21 /
Begin aggregation at :0
Have read value 10
The aggregation is now : 10
Have read value 20
The aggregation is now : 30
Have read value
The aggregation is now :
Have read value 40
The aggregation is now :
Sum of col_2 is
Testing agg_2 for NULL : 0

PL/SQL procedure successfully completed.

SQL>
SQL> REM Addition of the col_2 value was running fine till it hit a NULL
SQL> REM The result of adding a number to a NULL was a NULL
SQL> REM Adding numbers to it after that still kept returning NULL


As robert has clarified in his comment, if I did a select count(col_2) it would ignore the NULL. Therefore, while a count(*) returns 4, a count(col_2) returns 3 (and count(col_1) -- all being NOT NULL values -- returns 4).


3 comments:

robert said...

Hemant, this is one of the cases where it matters what you put between the brackets when using count(). With table and data as yours:

SQL> select count(*) Num_Rows,
2 count(col_1) rows_1, sum(col_1) Total_1, avg(col_1) Avg_1, sum(col_1)/count(col_1) Man_Avg_1,
3 count(col_2) rows_2, sum(col_2) Total_2, avg(col_2) Avg_2, sum(col_2)/count(col_2) Man_avg_2
4 from test_sum_numbers;

NUM_ROWS ROWS_1 TOTAL_1 AVG_1 MAN_AVG_1 ROWS_2 TOTAL_2
---------- ---------- ---------- ---------- ---------- ---------- ----------
AVG_2 MAN_AVG_2
---------- ----------
4 4 100 25 25 3 70
23.3333333 23.3333333

Cheers

robert

Anonymous said...

It's always good to review the basics now and then.

Hemant K Chitale said...

Robert,
Yes, I am aware that count(col_2) ignores NULLs.

Generally when we think of counting the number of rows that qualify we do a count(*) ...
eg select count(*) from table where column='VALUE';

However, I will update my post to clarify that count(column_2) will ignore the NULLs.