Performance impact of "NVL" on compressed columnar formats


One of the things that immediately impressed me about Exasol all those years ago was the late tuple materialization.  I am pondering whether the effect of the "NVL" function is greatly reduced when evaluating compressed, columnar data.  I.e., is the evaluation done once, for the single NULL value that represents all values in the block that are NULL (assuming that NULL is subject to compression, which I believe it is in this case) rather than once for each column that has as NULL value?


Team Exasol
Team Exasol

Short answer: The NVL function is executed on the decompressed data for each row individually.

Some background for that:
NVL(a,b) currently is executed more or less as CASE WHEN a IS NULL THEN b ELSE a END.
Further, all SQL operations (I am aware of) are executed per ROW, not deep down on a single value for a complete block (even though it sounds like an interesting idea for NVL).

In some cases the operation is executed on the compressed data (but per ROW). This is especially true for simple comparison operations like a = 'selected_value' or num < 10.

For most more complex operations - including CASE - this is currently not done, though it can be done for parts of a CASE, e.g. CASE WHEN a = 'selected_value' THEN ... . However, x IS NULL is also not yet evaluated on the compressed data.

This means, for NVL(a,b) all the parts of the equivalent CASE-statement as well as the CASE-statement itself are executed on uncompressed data per ROW.

Luckily, in most queries data decompression is only one component that contributes to the performance characteristics and the overall difference between compressed comparison and uncompressed comparison does not matter so much.