Extended Statistics

O Oracle geralmente não sabe que duas sou mais colunas tem relação entre sí. Um exemplo clássico é país e cidade. Ao especificar as duas colunas no where o Oracle vai multiplicar a seletividade das duas colunas para obter uma estimativa sobre a cardinalidade, errando bastante nessa estimativa.

Algo para solucionar esses casos é criar o Extended Statistics para melhorar a estimativa de cardinalidade ao especificar um conjunto de colunas. Veja abaixo na imagem que podemos ter estatísticas tanto para colunas individuais, default, como para o grupo de colunas.

Vamos ver um exemplo. Vamos pegar a tabela customers do schema SH. Vamos apagar as estatísticas e fazer um SELECT simples nessa tabela:

SQL> EXEC DBMS_STATS.DELETE_TABLE_STATS('SH','CUSTOMERS');

PL/SQL procedure successfully completed.

SQL> COL COLUMN_NAME FORMAT a20
SQL> COL NDV FORMAT 999

SQL> SELECT COLUMN_NAME, NUM_DISTINCT AS "NDV", HISTOGRAM
FROM   DBA_TAB_COL_STATISTICS
WHERE  OWNER = 'SH'
AND    TABLE_NAME = 'CUSTOMERS'
AND    COLUMN_NAME IN ('CUST_STATE_PROVINCE', 'COUNTRY_ID');

no rows selected

SQL> SELECT COUNT(*)
FROM   sh.customers
WHERE  cust_state_province = 'CA'; 

  COUNT(*)
----------
      3341

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6zdafycfjtjyk, child number 0
-------------------------------------
SELECT COUNT(*) FROM   sh.customers WHERE  cust_state_province = 'CA'

Plan hash value: 296924608

-------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |   444 (100)|      1 |00:00:00.01 |    1562 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |            |      1 |00:00:00.01 |    1562 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   2613 |   444   (1)|   3341 |00:00:00.01 |    1562 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("CUST_STATE_PROVINCE"='CA')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Como a tabela não tinha esta estatísticas, o Dynamic Sampling, falei aqui neste post, estimou uma cardinalidade de 2613, quando na verdade foi de 3341. Veja como seria sem o Dynamic Sampling antes de continuarmos.

SQL> alter system set optimizer_dynamic_sampling=0;

System altered.

SQL> SELECT COUNT(*)
FROM   sh.customers
WHERE  cust_state_province = 'CA'; 

  COUNT(*)
----------
      3341

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6zdafycfjtjyk, child number 0
-------------------------------------
SELECT COUNT(*) FROM   sh.customers WHERE  cust_state_province = 'CA'

Plan hash value: 296924608

-------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |   445 (100)|      1 |00:00:00.01 |    1562 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |            |      1 |00:00:00.01 |    1562 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   1331 |   445   (1)|   3341 |00:00:00.01 |    1562 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("CUST_STATE_PROVINCE"='CA')

O erro foi ainda pior. Mas, vamos agora coletar as estatísticas e testar novamente.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SH','CUSTOMERS',method_opt => 'FOR ALL COLUMNS SIZE REPEAT');

PL/SQL procedure successfully completed.

SQL> SELECT COUNT(*)
FROM   sh.customers
WHERE  cust_state_province = 'CA';  2    3

  COUNT(*)
----------
      3341

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6zdafycfjtjyk, child number 0
-------------------------------------
SELECT COUNT(*) FROM   sh.customers WHERE  cust_state_province = 'CA'

Plan hash value: 296924608

-------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |   444 (100)|      1 |00:00:00.01 |    1561 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |            |      1 |00:00:00.01 |    1561 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |    383 |   444   (1)|   3341 |00:00:00.01 |    1561 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("CUST_STATE_PROVINCE"='CA')

Errou bastante, mas isso por que eu não criei um histogram como falei aqui neste post. Vamos criar o histogram e testar novamente:

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SH','CUSTOMERS',method_opt => 'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

SQL> SELECT COLUMN_NAME, NUM_DISTINCT AS "NDV", HISTOGRAM
FROM   DBA_TAB_COL_STATISTICS
WHERE  OWNER = 'SH'
AND    TABLE_NAME = 'CUSTOMERS'
AND    COLUMN_NAME IN ('CUST_STATE_PROVINCE', 'COUNTRY_ID');  2    3    4    5

COLUMN_NAME           NDV HISTOGRAM
-------------------- ---- ---------------
CUST_STATE_PROVINCE   145 FREQUENCY
COUNTRY_ID             19 FREQUENCY

SQL> SELECT COUNT(*)
FROM   sh.customers
WHERE  cust_state_province = 'CA'; 

  COUNT(*)
----------
      3341

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6zdafycfjtjyk, child number 0
-------------------------------------
SELECT COUNT(*) FROM   sh.customers WHERE  cust_state_province = 'CA'

Plan hash value: 296924608

-------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |   444 (100)|      1 |00:00:00.01 |    3122 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |            |      1 |00:00:00.01 |    3122 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   3341 |   444   (1)|   3341 |00:00:00.01 |    3122 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("CUST_STATE_PROVINCE"='CA')

Agora ele acertou, mas, o que será que acontece ao colocar mais uma coluna no where? Vamos testar:

SQL> SELECT count(*) FROM sh.customers WHERE  cust_state_province = 'CA' AND country_id=52790;

  COUNT(*)
----------
      3341

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  8hyjyafkdggrn, child number 0
-------------------------------------
SELECT count(*) FROM sh.customers WHERE  cust_state_province = 'CA' AND
country_id=52790

Plan hash value: 296924608

-------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |   444 (100)|      1 |00:00:00.01 |    1561 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |            |      1 |00:00:00.01 |    1561 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   1115 |   444   (1)|   3341 |00:00:00.01 |    1561 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))

Mesmo com estatísticas e histograms o Oracle Optimizer errou na estimativa de cardinalidade, 1115 para 3341. Isso aconteceu por que o Optimizer não sabe que essas duas colunas tem relação, que nós sabemos que o country_id dos EUA não iria influência em nada no resultado nessa query. Vamos agora testar criando Extended Statistics.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS( 'SH','CUSTOMERS', METHOD_OPT => 'FOR COLUMNS SIZE AUTO (cust_state_province,country_id)' );

PL/SQL procedure successfully completed.

SQL> SELECT SYS.DBMS_STATS.SHOW_EXTENDED_STATS_NAME( 'sh','customers',
       '(cust_state_province,country_id)' ) col_group_name
FROM   DUAL;  2    3

COL_GROUP_NAME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_

SQL> SELECT EXTENSION_NAME, EXTENSION
FROM   DBA_STAT_EXTENSIONS
WHERE  TABLE_NAME='CUSTOMERS'; 

EXTENSION_NAME
--------------------------------------------------------------------------------------------------------------------------------
EXTENSION
--------------------------------------------------------------------------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_
("CUST_STATE_PROVINCE","COUNTRY_ID")


SQL> SELECT e.EXTENSION col_group, t.NUM_DISTINCT, t.HISTOGRAM
FROM   DBA_STAT_EXTENSIONS e, DBA_TAB_COL_STATISTICS t
WHERE  e.EXTENSION_NAME=t.COLUMN_NAME
AND    e.TABLE_NAME=t.TABLE_NAME
AND    t.TABLE_NAME='CUSTOMERS'; 

COL_GROUP                                                                        NUM_DISTINCT HISTOGRAM
-------------------------------------------------------------------------------- ------------ ---------------
("CUST_STATE_PROVINCE","COUNTRY_ID")                                                      145 NONE

SQL> SELECT count(*) FROM sh.customers WHERE  cust_state_province = 'CA' AND country_id=52790;

  COUNT(*)
----------
      3341

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  8hyjyafkdggrn, child number 0
-------------------------------------
SELECT count(*) FROM sh.customers WHERE  cust_state_province = 'CA' AND
country_id=52790

Plan hash value: 296924608

-------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |   444 (100)|      1 |00:00:00.06 |    1561 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |            |      1 |00:00:00.06 |    1561 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   1115 |   444   (1)|   3341 |00:00:00.06 |    1561 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))

Perceba que mesmo com a Exteded Statistics criada a estimativa ainda foi errada. Isso por que não criei um histogram para o column group, que nada mais é que uma coluna virtual. Vamos criar o histogram e testar novamente:

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS( 'SH','CUSTOMERS', METHOD_OPT => 'FOR COLUMNS SIZE SKEWONLY (cust_state_province,country_id)' );

PL/SQL procedure successfully completed.

SQL> SELECT e.EXTENSION col_group, t.NUM_DISTINCT, t.HISTOGRAM
FROM   DBA_STAT_EXTENSIONS e, DBA_TAB_COL_STATISTICS t
WHERE  e.EXTENSION_NAME=t.COLUMN_NAME
AND    e.TABLE_NAME=t.TABLE_NAME
AND    t.TABLE_NAME='CUSTOMERS';  2    3    4    5

COL_GROUP                                                                        NUM_DISTINCT HISTOGRAM
-------------------------------------------------------------------------------- ------------ ---------------
("CUST_STATE_PROVINCE","COUNTRY_ID")                                                      145 FREQUENCY

SQL> SELECT count(*) FROM sh.customers WHERE  cust_state_province = 'CA' AND country_id=52790;

  COUNT(*)
----------
      3341

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  8hyjyafkdggrn, child number 1
-------------------------------------
SELECT count(*) FROM sh.customers WHERE  cust_state_province = 'CA' AND
country_id=52790

Plan hash value: 296924608

-------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |   444 (100)|      1 |00:00:00.01 |    1561 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |            |      1 |00:00:00.01 |    1561 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   3341 |   444   (1)|   3341 |00:00:00.01 |    1561 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))

Agora sim, estimativa de cardinalidade correta com Extended Statistics.

Deixe um comentário

Este site utiliza o Akismet para reduzir spam. Saiba como seus dados em comentários são processados.