Performance of several inserts vs union all

Padawan

Hi,

what gets the better performance in Exasol? Several INSERT statements or one INSERT statement where everything is combined with an UNION ALL?

Best regards

Martin

3 REPLIES 3

Xpert

Hi Martin,

I´d say it depends on a few factors - what comes to mind: if you have very selective filters in your UNION ALL branches and access the same table multiple times you might have better performance with several INSERTs compared to a UNION ALL ( because there is an optimization that might pre-materialize your complete UNION ALL which may or may not be what you want for your given workload ).

As a side note, the UNION ALL approach will probably peak stronger in your TEMP_DB_RAM but that´s not necessarily a problem if your main goal is performance.

Do you have a bit more details as to how many different tables you want to use as source for your INSERT , do you take each table completely or will you apply filters and stuff like that ?

I don´t think a general answer would be helpful here...

Cheers,
Malte

Padawan

Hi Malte,

thank you for your Reply.

I can not give you an exact example, as the statement itself will be parameterized and will be applied to different tables with various sizes, number rows and columns. But a typical example could look like this:

I want to insert from a Table SOURCE into a Table TARGET. The SOURCE Table has twice as many columns as the TARGET Table but a much lower count of rows. TARGET Table might have 50Mio Rows and SOURCE Table around 1Mio Rows. Let's assume the TARGET Table has 100 Columns and the SOURCE Table has 200 Columns. Depending on the exact values in the SOURCE Table I want to insert different Columns from the SOURCE Table to the TARGET. Die Number of different INSERT Statements is always 4. The sum over all inserts will be approximately twice as large as the source table.

DML might look like this:

 

INSERT INTO TARGET (

Select SOURCE_COLUMN_1, SOURCE_COLUMN_2, TARGET_COLUMN_4,... from SOURCE

where SOURCE_COLUMN_1=X AND TARGET_COLUMN_5=Y

UNION ALL

Select SOURCE_COLUMN_5, SOURCE_COLUMN_6, TARGET_COLUMN_2,... from SOURCE

where SOURCE_COLUMN_5=A AND TARGET_COLUMN_8=B

UNION ALL

Select SOURCE_COLUMN_10, SOURCE_COLUMN_6, TARGET_COLUMN_8,... from SOURCE

where SOURCE_COLUMN_5=C AND TARGET_COLUMN_8=D

UNION ALL

Select SOURCE_COLUMN_10, SOURCE_COLUMN_6, TARGET_COLUMN_8,... from SOURCE

where SOURCE_COLUMN_5 IS NULL AND TARGET_COLUMN_8 IS NOT NULL

)

Best regards

Martin

Xpert

Hi Martin,
ok, I think I understand the concept - so you´ll incur multiple "table scans" in either scenario ( i.e. UNION ALL or multiple INSERTS ).

If your main goal is throughput you might opt for a single INSERT where the columns from Source are mapped based on CASE WHEN constructs and their corresponding WHERE counterparts, so you´d only need a single scan and your "insert matrix" would shift according to the CASE blocks.

Haven´t tested it yet, so it´s just a thought ( and readability of your code might suffer somewhat.... ).

Cheers
Malte