O Oracle geralmente não sabe que duas sou mais colunas tem relação entre sí. Um exemplo clássico é país e cidade. Ao especificar as duas colunas no where o Oracle vai multiplicar a seletividade das duas colunas para obter uma estimativa sobre a cardinalidade, errando bastante nessa estimativa.
Algo para solucionar esses casos é criar o Extended Statistics para melhorar a estimativa de cardinalidade ao especificar um conjunto de colunas. Veja abaixo na imagem que podemos ter estatísticas tanto para colunas individuais, default, como para o grupo de colunas.
![](https://oraclepress.wordpress.com/wp-content/uploads/2023/11/oracle_extended_statistics.png?w=452)
Vamos ver um exemplo. Vamos pegar a tabela customers do schema SH. Vamos apagar as estatísticas e fazer um SELECT simples nessa tabela:
SQL> EXEC DBMS_STATS.DELETE_TABLE_STATS('SH','CUSTOMERS');
PL/SQL procedure successfully completed.
SQL> COL COLUMN_NAME FORMAT a20
SQL> COL NDV FORMAT 999
SQL> SELECT COLUMN_NAME, NUM_DISTINCT AS "NDV", HISTOGRAM
FROM DBA_TAB_COL_STATISTICS
WHERE OWNER = 'SH'
AND TABLE_NAME = 'CUSTOMERS'
AND COLUMN_NAME IN ('CUST_STATE_PROVINCE', 'COUNTRY_ID');
no rows selected
SQL> SELECT COUNT(*)
FROM sh.customers
WHERE cust_state_province = 'CA';
COUNT(*)
----------
3341
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6zdafycfjtjyk, child number 0
-------------------------------------
SELECT COUNT(*) FROM sh.customers WHERE cust_state_province = 'CA'
Plan hash value: 296924608
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 444 (100)| 1 |00:00:00.01 | 1562 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 1562 |
|* 2 | TABLE ACCESS FULL| CUSTOMERS | 1 | 2613 | 444 (1)| 3341 |00:00:00.01 | 1562 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("CUST_STATE_PROVINCE"='CA')
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Como a tabela não tinha esta estatísticas, o Dynamic Sampling, falei aqui neste post, estimou uma cardinalidade de 2613, quando na verdade foi de 3341. Veja como seria sem o Dynamic Sampling antes de continuarmos.
SQL> alter system set optimizer_dynamic_sampling=0;
System altered.
SQL> SELECT COUNT(*)
FROM sh.customers
WHERE cust_state_province = 'CA';
COUNT(*)
----------
3341
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6zdafycfjtjyk, child number 0
-------------------------------------
SELECT COUNT(*) FROM sh.customers WHERE cust_state_province = 'CA'
Plan hash value: 296924608
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 445 (100)| 1 |00:00:00.01 | 1562 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 1562 |
|* 2 | TABLE ACCESS FULL| CUSTOMERS | 1 | 1331 | 445 (1)| 3341 |00:00:00.01 | 1562 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("CUST_STATE_PROVINCE"='CA')
O erro foi ainda pior. Mas, vamos agora coletar as estatísticas e testar novamente.
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SH','CUSTOMERS',method_opt => 'FOR ALL COLUMNS SIZE REPEAT');
PL/SQL procedure successfully completed.
SQL> SELECT COUNT(*)
FROM sh.customers
WHERE cust_state_province = 'CA'; 2 3
COUNT(*)
----------
3341
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6zdafycfjtjyk, child number 0
-------------------------------------
SELECT COUNT(*) FROM sh.customers WHERE cust_state_province = 'CA'
Plan hash value: 296924608
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 444 (100)| 1 |00:00:00.01 | 1561 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 1561 |
|* 2 | TABLE ACCESS FULL| CUSTOMERS | 1 | 383 | 444 (1)| 3341 |00:00:00.01 | 1561 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("CUST_STATE_PROVINCE"='CA')
Errou bastante, mas isso por que eu não criei um histogram como falei aqui neste post. Vamos criar o histogram e testar novamente:
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SH','CUSTOMERS',method_opt => 'FOR ALL COLUMNS SIZE AUTO');
PL/SQL procedure successfully completed.
SQL> SELECT COLUMN_NAME, NUM_DISTINCT AS "NDV", HISTOGRAM
FROM DBA_TAB_COL_STATISTICS
WHERE OWNER = 'SH'
AND TABLE_NAME = 'CUSTOMERS'
AND COLUMN_NAME IN ('CUST_STATE_PROVINCE', 'COUNTRY_ID'); 2 3 4 5
COLUMN_NAME NDV HISTOGRAM
-------------------- ---- ---------------
CUST_STATE_PROVINCE 145 FREQUENCY
COUNTRY_ID 19 FREQUENCY
SQL> SELECT COUNT(*)
FROM sh.customers
WHERE cust_state_province = 'CA';
COUNT(*)
----------
3341
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6zdafycfjtjyk, child number 0
-------------------------------------
SELECT COUNT(*) FROM sh.customers WHERE cust_state_province = 'CA'
Plan hash value: 296924608
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 444 (100)| 1 |00:00:00.01 | 3122 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 3122 |
|* 2 | TABLE ACCESS FULL| CUSTOMERS | 1 | 3341 | 444 (1)| 3341 |00:00:00.01 | 3122 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("CUST_STATE_PROVINCE"='CA')
Agora ele acertou, mas, o que será que acontece ao colocar mais uma coluna no where? Vamos testar:
SQL> SELECT count(*) FROM sh.customers WHERE cust_state_province = 'CA' AND country_id=52790;
COUNT(*)
----------
3341
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 8hyjyafkdggrn, child number 0
-------------------------------------
SELECT count(*) FROM sh.customers WHERE cust_state_province = 'CA' AND
country_id=52790
Plan hash value: 296924608
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 444 (100)| 1 |00:00:00.01 | 1561 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 1561 |
|* 2 | TABLE ACCESS FULL| CUSTOMERS | 1 | 1115 | 444 (1)| 3341 |00:00:00.01 | 1561 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))
Mesmo com estatísticas e histograms o Oracle Optimizer errou na estimativa de cardinalidade, 1115 para 3341. Isso aconteceu por que o Optimizer não sabe que essas duas colunas tem relação, que nós sabemos que o country_id dos EUA não iria influência em nada no resultado nessa query. Vamos agora testar criando Extended Statistics.
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS( 'SH','CUSTOMERS', METHOD_OPT => 'FOR COLUMNS SIZE AUTO (cust_state_province,country_id)' );
PL/SQL procedure successfully completed.
SQL> SELECT SYS.DBMS_STATS.SHOW_EXTENDED_STATS_NAME( 'sh','customers',
'(cust_state_province,country_id)' ) col_group_name
FROM DUAL; 2 3
COL_GROUP_NAME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_
SQL> SELECT EXTENSION_NAME, EXTENSION
FROM DBA_STAT_EXTENSIONS
WHERE TABLE_NAME='CUSTOMERS';
EXTENSION_NAME
--------------------------------------------------------------------------------------------------------------------------------
EXTENSION
--------------------------------------------------------------------------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_
("CUST_STATE_PROVINCE","COUNTRY_ID")
SQL> SELECT e.EXTENSION col_group, t.NUM_DISTINCT, t.HISTOGRAM
FROM DBA_STAT_EXTENSIONS e, DBA_TAB_COL_STATISTICS t
WHERE e.EXTENSION_NAME=t.COLUMN_NAME
AND e.TABLE_NAME=t.TABLE_NAME
AND t.TABLE_NAME='CUSTOMERS';
COL_GROUP NUM_DISTINCT HISTOGRAM
-------------------------------------------------------------------------------- ------------ ---------------
("CUST_STATE_PROVINCE","COUNTRY_ID") 145 NONE
SQL> SELECT count(*) FROM sh.customers WHERE cust_state_province = 'CA' AND country_id=52790;
COUNT(*)
----------
3341
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 8hyjyafkdggrn, child number 0
-------------------------------------
SELECT count(*) FROM sh.customers WHERE cust_state_province = 'CA' AND
country_id=52790
Plan hash value: 296924608
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 444 (100)| 1 |00:00:00.06 | 1561 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.06 | 1561 |
|* 2 | TABLE ACCESS FULL| CUSTOMERS | 1 | 1115 | 444 (1)| 3341 |00:00:00.06 | 1561 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))
Perceba que mesmo com a Exteded Statistics criada a estimativa ainda foi errada. Isso por que não criei um histogram para o column group, que nada mais é que uma coluna virtual. Vamos criar o histogram e testar novamente:
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS( 'SH','CUSTOMERS', METHOD_OPT => 'FOR COLUMNS SIZE SKEWONLY (cust_state_province,country_id)' );
PL/SQL procedure successfully completed.
SQL> SELECT e.EXTENSION col_group, t.NUM_DISTINCT, t.HISTOGRAM
FROM DBA_STAT_EXTENSIONS e, DBA_TAB_COL_STATISTICS t
WHERE e.EXTENSION_NAME=t.COLUMN_NAME
AND e.TABLE_NAME=t.TABLE_NAME
AND t.TABLE_NAME='CUSTOMERS'; 2 3 4 5
COL_GROUP NUM_DISTINCT HISTOGRAM
-------------------------------------------------------------------------------- ------------ ---------------
("CUST_STATE_PROVINCE","COUNTRY_ID") 145 FREQUENCY
SQL> SELECT count(*) FROM sh.customers WHERE cust_state_province = 'CA' AND country_id=52790;
COUNT(*)
----------
3341
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS +COST +NOTE'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 8hyjyafkdggrn, child number 1
-------------------------------------
SELECT count(*) FROM sh.customers WHERE cust_state_province = 'CA' AND
country_id=52790
Plan hash value: 296924608
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 444 (100)| 1 |00:00:00.01 | 1561 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 1561 |
|* 2 | TABLE ACCESS FULL| CUSTOMERS | 1 | 3341 | 444 (1)| 3341 |00:00:00.01 | 1561 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))
Agora sim, estimativa de cardinalidade correta com Extended Statistics.