Something is off with this query

ronaldvr
SQL-Fighter

I run this query:

 

 

WITH BASE_ARTICLE AS (
   SELECT 
   VBAP.posnr,
   VBAP.MATNR,
   VBAP.ABGRU,
   VBAP.VBELN,
   VEDA_KOP.VENDDAT,
   konv.KNUMH
   FROM ods_sap.vbap VBAP
   JOIN PROJECT_PF_FACTURATIE.VDB_META_001_ARTIKEL ON PROJECT_PF_FACTURATIE.VDB_META_001_ARTIKEL.ARTIKELNUMMER = vbap.matnr
INNER JOIN ods_sap.vbak vbak ON
    vbak.VBELN = VBAP.VBELN 
LEFT JOIN    ods_sap.veda veda_kop ON veda_kop.VBELN = vbap.VBELN
INNER JOIN ods_sap.KONV konv ON
    vbak.KNUMV = konv.KNUMV
    AND VBAP.POSNR = konv.KPOSN
    WHERE
    veda_kop.VPOSN = 0
    AND
        vdb_contract_007 = 'X'     
        AND VBAP.ABGRU IS NULL
    AND vbak.trvog = 4
    AND konv.KRECH IN ('C',
    'N')
    AND konv.kschl IN ( 'ZHI1',
    'ZHI2',
    'ZDEA',
    'ZSER',
    'PR02',
    'PR04',
    'ZVIN',
    'ZVBR' )

)

SELECT COUNT(*) FROM BASE_ARTICLE
 JOIN ods_sap.veda veda_pos ON
    veda_pos.VBELN = BASE_ARTICLE.VBELN
    AND veda_pos.VPOSN = BASE_ARTICLE.posnr
LEFT OUTER JOIN ods_sap.konh konh ON
    konh.KNUMH = BASE_ARTICLE.KNUMH

 

 

Base_article alone returns 186 rows. Running the total query gives me this from EXA_DBA_PROFILE_RUNNING

STMT_ID COMMAND_NAME COMMAND_CLASS PART_ID PART_NAME PART_INFO PART_FINISHED OBJECT_SCHEMA OBJECT_NAME OBJECT_ROWS OUT_ROWS DURATION CPU TEMP_DB_RAM_PEAK NET REMARKS  
30 SELECT DQL 1 COMPILE / EXECUTE FALSE         0.398 10.5 180.5 0.1    
30 SELECT DQL 2 INDEX CREATE EXPRESSION INDEX TRUE S_20_HDA_SAP_SAPABAP1 T_VEDA 935983 935983 0.153 22.1 96 70.1 ExpressionIndex
30 SELECT DQL 3 INDEX CREATE EXPRESSION INDEX TRUE S_20_HDA_SAP_SAPABAP1 T_VEDA 935983 935983 0.121 24.8 102 72.4 ExpressionIndex
30 SELECT DQL 4 INDEX CREATE EXPRESSION INDEX TRUE S_20_HDA_SAP_SAPABAP1 T_VBAK 3059199 3059199 0.592 22.2 435 79.7 ExpressionIndex
30 SELECT DQL 5 INDEX CREATE on REPLICATED table TRUE PROJECT_PF_FACTURATIE VDB_META_001_ARTIKEL 1134 1134 0.007 9.9 99 0.6 LOCAL INDEX (ARTIKELNUMMER)
30 SELECT DQL 6 INDEX CREATE EXPRESSION INDEX TRUE S_20_HDA_SAP_SAPABAP1 T_VBAP 5641002 5641002 0.526 22.7 714.5 140.5 ExpressionIndex
30 SELECT DQL 7 SCAN   FALSE S_20_HDA_SAP_SAPABAP1 T_KONV 56752144 7026924 1.406 22.1 427 90.7 T_KONV(HDA_VALID_TO,HDA_VOIDED,C_MANDT,C_KSCHL,C_KRECH)
30 SELECT DQL 8 OUTER JOIN GLOBAL FALSE S_20_HDA_SAP_SAPABAP1 T_KONH 178621 6869337 0.67       T_KONV(C_KNUMH) => GLOBAL INDEX (C_KNUMH)
30 SELECT DQL 9 JOIN GLOBAL FALSE S_20_HDA_SAP_SAPABAP1 T_VBAP 5641002 779880428 116.172       T_KONV(C_KPOSN) => ExpressionIndex
30 SELECT DQL 10 JOIN on REPLICATED table FALSE PROJECT_PF_FACTURATIE VDB_META_001_ARTIKEL 1134 46529620 50.691       T_VBAP(C_MATNR) => LOCAL INDEX (ARTIKELNUMMER)
30 SELECT DQL 11 JOIN GLOBAL FALSE S_20_HDA_SAP_SAPABAP1 T_VBAK 3059199 41 12.023       T_KONV(C_KNUMV),T_VBAP(C_VBELN) => ExpressionIndex
30 SELECT DQL 12 JOIN GLOBAL FALSE S_20_HDA_SAP_SAPABAP1 T_VEDA 935983 41 0.002       T_VBAP(C_VBELN) => ExpressionIndex
30 SELECT DQL 13 JOIN GLOBAL FALSE S_20_HDA_SAP_SAPABAP1 T_VEDA 935983 32 0.002       T_VBAP(C_VBELN,C_POSNR) => ExpressionIndex
30 SELECT DQL 14 GROUP BY GLOBAL on TEMPORARY table FALSE   tmp_subselect1 0 0 0 9.6 427 49.1    
                                 

 

I am stumped: How can it be that limiting a CTE to 186 rows still results in this? Why do all kinds of tables that have no place in the end query (because they are in the CTE) keep on increasing when I refresh this query?

 

It seems to me as the query ' optimizer'  completely ignores the CTE, and creates another execution plan. What ever it is: how can this be prevented?

1 ACCEPTED SOLUTION

Accepted Solutions

mwellbro
Xpert

update: added link to "New Join Order Optimizer in Exasol"

As far as I understand it the optimizer is trying to do two things here:

1) Avoid the mentioned materalization ( as it is associated with an overhead in terms of processing and DB-RAM required
2) Reduce scans of repeatedly stated objects, like ods_sap.veda

In this case I´d think the filter factor isn´t accurately calculated and so the whole join is being tried to run through
"in one go" - which, given the filter factor of the CTE , is not what we want.

You might also try to remove the ORDER BY FALSE and run your session with the new variant of the optimizer, maybe , that would lead to a similar result ( see New Join Order Optimizer in Exasol  ).

Cheers,
Malte

View solution in original post

9 REPLIES 9

mwellbro
Xpert

Hi @ronaldvr ,

could you do me a favor and put a ORDER BY FALSE at the end of your CTE and post if the phenomenon persists ?

My guess would be that the CTE is merged into the join graph in an attempt to avoid a materialization where it might actually be
better for the materialization to occur.

Am I correct in assuming that the CTE, executed isolated from the whole query , returns the 186 rows rather fast and
you are only seeing the issue once you run the query as a whole ?

Btw.: which version of Exasol is this running on ?

Cheers,
Malte

ronaldvr
SQL-Fighter

Darn! You are right is speeds everything up dramatically (version = 7.08)

Thank you, but how and why??

BTW off topic, but he screen keeps jumping downward everytime I want to answer as if an auto-refresh on the page shifts focus to a field continuously, very annoying)

mwellbro
Xpert

update: added link to "New Join Order Optimizer in Exasol"

As far as I understand it the optimizer is trying to do two things here:

1) Avoid the mentioned materalization ( as it is associated with an overhead in terms of processing and DB-RAM required
2) Reduce scans of repeatedly stated objects, like ods_sap.veda

In this case I´d think the filter factor isn´t accurately calculated and so the whole join is being tried to run through
"in one go" - which, given the filter factor of the CTE , is not what we want.

You might also try to remove the ORDER BY FALSE and run your session with the new variant of the optimizer, maybe , that would lead to a similar result ( see New Join Order Optimizer in Exasol  ).

Cheers,
Malte

View solution in original post

exa-MathiasHo
Community Manager
Community Manager

@mwellbro Thanks for jumping in so fast!

@ronaldvr Could you please write me a private message and tell me a little bit more about your issue? (what browser are you using etc.) I'm always open to suggestions and feedback for the community platform. 🙂

ronaldvr
SQL-Fighter

Hi @mwellbro 

thanks for your fast reply, I had indeed found the query optimizer page but was a bit hesitant since I am fairly new to exasol. I will try that too and report back if it has any effect!

ronaldvr
SQL-Fighter

Hello @mwellbro 

I just tried the new join optimizer setting, and it works indeed without 'order by false' as you said. More interesting (to me) was that the older query I tried to improve by the structure in this query, now runs just as fast as the improved query.

ronaldvr
SQL-Fighter

Hello @mwellbro 

I set it on the database this weekend and this works fine now. I do have some questions that are not clear to me from the page you refer to:

  • Do I need to set the parameter -joinOrderMethod=1 at the database level in the 'Extra Database Parameters' settings, or is it stored and used when I use/set CONTROL SET JOIN OPTIMIZER EXPERIMENTAL; ?
  • Why did the ORDER BY FALSE trick work? That still is not clear to me why exactly that statement has an effect?

Regards

Ronald

mwellbro
Xpert

Hi @ronaldvr ,

1) CONTROL SET JOIN OPTIMIZER EXPERIMENTAL; controls the Optimizer on the session level ( so the new optimizer is only used for the duration of your session ), when you are setting it with the -joinOrderMethod parameter you set it on the database level ( so all sessions will use the new optimizer "per default" ).

2) Any operation that would have forced a materialization would have done "the trick" , ORDER BY FALSE is merely the "cheapest" one in terms of saying "materialize away and don´t bother to sort" ( not that it doesn´t provide it´s own challenges under the right circumstances - no silver bullets here 😉 )
You could have seen the same improvement by doing an actual sort , a group by or an analytical function in your CTE.

Hope this helps ?

Cheers,
Malte

exa-Tobias
Team Exasol
Team Exasol

Hi @ronaldvr,

a little off topic, but I see many expression indexes in your profile. Maybe you could quickly check if the datatypes match for the join columns