TPC-H Data Vault test on Exasol 7.0 beta

SQL-Fighter

Hi all,

wanted to share a small performance test, we made with the new Exasol 7.0 beta.

I was very curious how big the impact of the new Hash datatype would be.

 

Conditions:

- Exasol 6.2.7 community edition vs. Exasol 7.0 beta community edition

- both 4GB Ram

- TPC-H scalefactor 1

- persistent Raw Data Vault

- virtualized dimensional model

- query cache disabled

 

These are the results:

ADoerr_0-1594898820158.png

 

The performance improvement for the direct Raw DV queries is already really good,

but the improvement on the virtualized dimensional model is just suprising.

 

 

 

 

 

10 REPLIES 10

Moderator
Moderator

Hi, great post, @ADoerr !

I did the complete 1GB TPCH Benchmark - thanks to @exa-Carsten who provided the DDL and queries recently on the Partner Portal!

My system is a Community Edition Exasol 7.0.rc1, running in a VM with 4 GB memory on my notebook.

Here are my findings:

Environment Total table sizes Runtime for all 22 Queries *
TPCH without Data Vault 217 MB 4 seconds
TPCH with Data Vault using String data types as before 7.0 961 MB 200 seconds
TPCH with Data Vault using the new HASHTYPE 725 MB 70 seconds

 

* Second run of the 22 queries - so this is without the index creation that took place during the first run

Even with Exasol 7, querying the Data Vault model is about 17 times slower than without that model. Okay, my machine isn't a production environment.

But imagine what a performance nightmare that must be with another vendor's database!

If you want to see for yourself, here is all you need to run your own TPCH:

https://uhesse.files.wordpress.com/2020/08/mytpch.zip

That's largely stolen from Carsten - I essentially just added the data files to populate the tables and the IMPORT commands for it.

Isn't it cool that we can import directly from ZIP, by the way? 😎

SQL-Fighter

 

Great! Thanks for this large test!

That really fits my impression working with Data Vault on Exasol.

 

I think Data Vault will never compete against a 3NF model, when it's about quering the data from the model.

But it's the flexbility of the model, which makes Data Vault so attractive.

And Exasol 7.0 with the improved performance made Data Vault even more attractive.

Contributor

 Thank you!  This is very encouraging.

Team Exasol
Team Exasol

Hi, we did indeed make similar experiments internally when developing Hashtype.

For that we created a data vault model using different datatypes for the keys and compared the TPC-H queries on view layer on top of these different models.

sf25.png

sf100.png

Xpert

Did you use the new optimizer or the "old"?

Team Exasol
Team Exasol

TPCH_10GB_DefaultVsExperimental.png

Out of curiosity, I did a little benchmark with the hash type TPCH Data Vault model on 7.0.rc1 using some dedicated hardware and scalefactor 10. While for most queries, the execution times were pretty identical, there are three queries, where performance actually improved (5, 7 & 21), but also three where it degraded (2, 9 and 17). Each query was executed 5 times with each optimizer, the timings in the chart are then the averages of the 2nd - 5th execution (see attached script). I haven't looked at the query profiles yet, that's still on my todo list...

SQL-Fighter

 

I did use the "old" one.
My main intention was to test the influence of new Hash datatype.

Xpert

Hi ADoerr,

this looks really cool ! Did you also test the rest of Q1 - Q22 from the test set and would be able to share the results ? The more data the better 😉

Cheers
Malte

SQL-Fighter

 

No, I just used the mentioned 2 queries at first as this was for a small webinar demo.

Maybe I will get bored at the weekend and test a little more.

Team Exasol
Team Exasol

Maybe @exa-StefanM can share something 😉