Fully managed cloud. 30-day FREE full featured trial. Start Now
cancel
Showing results for 
Search instead for 
Did you mean: 

What’s the best query trick you are using?

exa-MathiasHo
Exasol Alumni

Hi everyone,

We at Exasol often have a good understanding of what best practices are for our users. However, oftentimes, the real value is only seen by you and sometimes only develops over time. This is why we are interested in hearing what your best query trick is.

It could be a query trick, which only has a small impact on one query but performed often is almost invaluable to you. Or it is a practice you use every day or you use it one-time only with great impact. 

What is your best query trick?


Best regards,

Mathias

12 REPLIES 12

joehl
Contributor

I though twice:

  • sum_distinct encourages lazy joining: wrong and expensive
  • furthermore if we want more statistics we need avg_distinct, max_distinct etc.: lots of implementations and lots of slow lua code running in the query
  • first alternative: let's implement err_distinct(id) which throws an error if we have a wrong join as a QA: limits to one lua call
  • better alternative: let's implement cnt_duplicated(id) which counts the number of errors: is more informative
  • best alternative: COUNT(id) - COUNT(DISTINCT id) or if you allow NULLs: COUNT(*) - COUNT(DISTINCT DUMP(id)) 

Now we are at a native solution, which is what you wanted 🙂

Best

PeterK
Xpert

Hi @joehl  - Some good thoughts there.  

Regarding raising or counting errors, I would be concerned that QA may not detect it because whether it results in a incorrect SUM depends on the data set. You would have to make sure the QA dataset triggers the duplicate rows.

I would like to think that if Exasol handled it natively it would be just as performant as COUNT(DISTINCT) + SUM() ...but I'm not a DB developer 🙂

BTW - I like your solution of using ORDER BY ... that's clever!

joehl
Contributor

Thanks. Of course there is a native version of the order by approach, but I agree that it is quite verbose:

SELECT sum(firstval)
FROM (
    SELECT FIRST_VALUE(val) OVER (PARTITION BY id ORDER BY val) AS firstval
    FROM
    (
        SELECT 1 AS id, 1 AS val
        UNION ALL        
        SELECT 1 AS id, 1 AS val
    )
    WHERE id IS NOT NULL
    QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY val) = 1
)
;

And I still think that QA on the input tables plus writing proper joins is the way to go, in a ETL setting. When exploring data I don't know, I believe that COUNT(DISTINCT)-COUNT(*) is the best way to check everything is fine. Let's not forget that when using SUM_DISTINCT we would make further assumptions that might be wrong: that all instances of one id have the same value. If not, we sum arbitrarily filtered values. Given all that, I understand that EXASOL does not provide SUM_DISTINCT.

Best