Something is off with this query

ronaldvr
Contributor

I run this query:

 

WITH BASE_ARTICLE AS (
   SELECT 
   VBAP.posnr,
   VBAP.MATNR,
   VBAP.ABGRU,
   VBAP.VBELN,
   VEDA_KOP.VENDDAT,
   konv.KNUMH
   FROM ods_sap.vbap VBAP
   JOIN PROJECT_PF_FACTURATIE.VDB_META_001_ARTIKEL ON PROJECT_PF_FACTURATIE.VDB_META_001_ARTIKEL.ARTIKELNUMMER = vbap.matnr
INNER JOIN ods_sap.vbak vbak ON
    vbak.VBELN = VBAP.VBELN 
LEFT JOIN    ods_sap.veda veda_kop ON veda_kop.VBELN = vbap.VBELN
INNER JOIN ods_sap.KONV konv ON
    vbak.KNUMV = konv.KNUMV
    AND VBAP.POSNR = konv.KPOSN
    WHERE
    veda_kop.VPOSN = 0
    AND
        vdb_contract_007 = 'X'     
        AND VBAP.ABGRU IS NULL
    AND vbak.trvog = 4
    AND konv.KRECH IN ('C',
    'N')
    AND konv.kschl IN ( 'ZHI1',
    'ZHI2',
    'ZDEA',
    'ZSER',
    'PR02',
    'PR04',
    'ZVIN',
    'ZVBR' )

)

SELECT COUNT(*) FROM BASE_ARTICLE
 JOIN ods_sap.veda veda_pos ON
    veda_pos.VBELN = BASE_ARTICLE.VBELN
    AND veda_pos.VPOSN = BASE_ARTICLE.posnr
LEFT OUTER JOIN ods_sap.konh konh ON
    konh.KNUMH = BASE_ARTICLE.KNUMH

 

Base_article alone returns 186 rows. Running the total query gives me this from EXA_DBA_PROFILE_RUNNING

STMT_IDCOMMAND_NAMECOMMAND_CLASSPART_IDPART_NAMEPART_INFOPART_FINISHEDOBJECT_SCHEMAOBJECT_NAMEOBJECT_ROWSOUT_ROWSDURATIONCPUTEMP_DB_RAM_PEAKNETREMARKS 
30SELECTDQL1COMPILE / EXECUTEFALSE    0.39810.5180.50.1  
30SELECTDQL2INDEX CREATEEXPRESSION INDEXTRUES_20_HDA_SAP_SAPABAP1T_VEDA9359839359830.15322.19670.1ExpressionIndex
30SELECTDQL3INDEX CREATEEXPRESSION INDEXTRUES_20_HDA_SAP_SAPABAP1T_VEDA9359839359830.12124.810272.4ExpressionIndex
30SELECTDQL4INDEX CREATEEXPRESSION INDEXTRUES_20_HDA_SAP_SAPABAP1T_VBAK305919930591990.59222.243579.7ExpressionIndex
30SELECTDQL5INDEX CREATEon REPLICATED tableTRUEPROJECT_PF_FACTURATIEVDB_META_001_ARTIKEL113411340.0079.9990.6LOCAL INDEX (ARTIKELNUMMER)
30SELECTDQL6INDEX CREATEEXPRESSION INDEXTRUES_20_HDA_SAP_SAPABAP1T_VBAP564100256410020.52622.7714.5140.5ExpressionIndex
30SELECTDQL7SCAN FALSES_20_HDA_SAP_SAPABAP1T_KONV5675214470269241.40622.142790.7T_KONV(HDA_VALID_TO,HDA_VOIDED,C_MANDT,C_KSCHL,C_KRECH)
30SELECTDQL8OUTER JOINGLOBALFALSES_20_HDA_SAP_SAPABAP1T_KONH17862168693370.67   T_KONV(C_KNUMH) => GLOBAL INDEX (C_KNUMH)
30SELECTDQL9JOINGLOBALFALSES_20_HDA_SAP_SAPABAP1T_VBAP5641002779880428116.172   T_KONV(C_KPOSN) => ExpressionIndex
30SELECTDQL10JOINon REPLICATED tableFALSEPROJECT_PF_FACTURATIEVDB_META_001_ARTIKEL11344652962050.691   T_VBAP(C_MATNR) => LOCAL INDEX (ARTIKELNUMMER)
30SELECTDQL11JOINGLOBALFALSES_20_HDA_SAP_SAPABAP1T_VBAK30591994112.023   T_KONV(C_KNUMV),T_VBAP(C_VBELN) => ExpressionIndex
30SELECTDQL12JOINGLOBALFALSES_20_HDA_SAP_SAPABAP1T_VEDA935983410.002   T_VBAP(C_VBELN) => ExpressionIndex
30SELECTDQL13JOINGLOBALFALSES_20_HDA_SAP_SAPABAP1T_VEDA935983320.002   T_VBAP(C_VBELN,C_POSNR) => ExpressionIndex
30SELECTDQL14GROUP BYGLOBAL on TEMPORARY tableFALSE tmp_subselect10009.642749.1  
                 

 

I am stumped: How can it be that limiting a CTE to 186 rows still results in this? Why do all kinds of tables that have no place in the end query (because they are in the CTE) keep on increasing when I refresh this query?

 

It seems to me as the query ' optimizer'  completely ignores the CTE, and creates another execution plan. What ever it is: how can this be prevented?

1 ACCEPTED SOLUTION

mwellbro
Xpert

update: added link to "New Join Order Optimizer in Exasol"

As far as I understand it the optimizer is trying to do two things here:

1) Avoid the mentioned materalization ( as it is associated with an overhead in terms of processing and DB-RAM required
2) Reduce scans of repeatedly stated objects, like ods_sap.veda

In this case I´d think the filter factor isn´t accurately calculated and so the whole join is being tried to run through
"in one go" - which, given the filter factor of the CTE , is not what we want.

You might also try to remove the ORDER BY FALSE and run your session with the new variant of the optimizer, maybe , that would lead to a similar result ( see New Join Order Optimizer in Exasol  ).

Cheers,
Malte

View solution in original post

9 REPLIES 9

exa-Tobias
Team Exasol
Team Exasol

Hi @ronaldvr,

a little off topic, but I see many expression indexes in your profile. Maybe you could quickly check if the datatypes match for the join columns

ronaldvr
Contributor

Hi @mwellbro 

thanks for your fast reply, I had indeed found the query optimizer page but was a bit hesitant since I am fairly new to exasol. I will try that too and report back if it has any effect!

mwellbro
Xpert

Hi @ronaldvr ,

could you do me a favor and put a ORDER BY FALSE at the end of your CTE and post if the phenomenon persists ?

My guess would be that the CTE is merged into the join graph in an attempt to avoid a materialization where it might actually be
better for the materialization to occur.

Am I correct in assuming that the CTE, executed isolated from the whole query , returns the 186 rows rather fast and
you are only seeing the issue once you run the query as a whole ?

Btw.: which version of Exasol is this running on ?

Cheers,
Malte

ronaldvr
Contributor

Darn! You are right is speeds everything up dramatically (version = 7.08)

Thank you, but how and why??

BTW off topic, but he screen keeps jumping downward everytime I want to answer as if an auto-refresh on the page shifts focus to a field continuously, very annoying)

exa-MathiasHo
Community Manager
Community Manager

@mwellbro Thanks for jumping in so fast!

@ronaldvr Could you please write me a private message and tell me a little bit more about your issue? (what browser are you using etc.) I'm always open to suggestions and feedback for the community platform. 🙂

mwellbro
Xpert

update: added link to "New Join Order Optimizer in Exasol"

As far as I understand it the optimizer is trying to do two things here:

1) Avoid the mentioned materalization ( as it is associated with an overhead in terms of processing and DB-RAM required
2) Reduce scans of repeatedly stated objects, like ods_sap.veda

In this case I´d think the filter factor isn´t accurately calculated and so the whole join is being tried to run through
"in one go" - which, given the filter factor of the CTE , is not what we want.

You might also try to remove the ORDER BY FALSE and run your session with the new variant of the optimizer, maybe , that would lead to a similar result ( see New Join Order Optimizer in Exasol  ).

Cheers,
Malte

View solution in original post

ronaldvr
Contributor

Hello @mwellbro 

I set it on the database this weekend and this works fine now. I do have some questions that are not clear to me from the page you refer to:

  • Do I need to set the parameter -joinOrderMethod=1 at the database level in the 'Extra Database Parameters' settings, or is it stored and used when I use/set CONTROL SET JOIN OPTIMIZER EXPERIMENTAL; ?
  • Why did the ORDER BY FALSE trick work? That still is not clear to me why exactly that statement has an effect?

Regards

Ronald

mwellbro
Xpert

Hi @ronaldvr ,

1) CONTROL SET JOIN OPTIMIZER EXPERIMENTAL; controls the Optimizer on the session level ( so the new optimizer is only used for the duration of your session ), when you are setting it with the -joinOrderMethod parameter you set it on the database level ( so all sessions will use the new optimizer "per default" ).

2) Any operation that would have forced a materialization would have done "the trick" , ORDER BY FALSE is merely the "cheapest" one in terms of saying "materialize away and don´t bother to sort" ( not that it doesn´t provide it´s own challenges under the right circumstances - no silver bullets here 😉 )
You could have seen the same improvement by doing an actual sort , a group by or an analytical function in your CTE.

Hope this helps ?

Cheers,
Malte

ronaldvr
Contributor

Hello @mwellbro 

I just tried the new join optimizer setting, and it works indeed without 'order by false' as you said. More interesting (to me) was that the older query I tried to improve by the structure in this query, now runs just as fast as the improved query.