TPC-H Data Vault test on Exasol 7.0 beta

ADoerr
SQL-Fighter

Hi all,

wanted to share a small performance test, we made with the new Exasol 7.0 beta.

I was very curious how big the impact of the new Hash datatype would be.

 

Conditions:

- Exasol 6.2.7 community edition vs. Exasol 7.0 beta community edition

- both 4GB Ram

- TPC-H scalefactor 1

- persistent Raw Data Vault

- virtualized dimensional model

- query cache disabled

 

These are the results:

ADoerr_0-1594898820158.png

 

The performance improvement for the direct Raw DV queries is already really good,

but the improvement on the virtualized dimensional model is just suprising.

 

 

 

 

 

10 REPLIES 10

mwellbro
Xpert

Hi ADoerr,

this looks really cool ! Did you also test the rest of Q1 - Q22 from the test set and would be able to share the results ? The more data the better 😉

Cheers
Malte

ADoerr
SQL-Fighter

 

No, I just used the mentioned 2 queries at first as this was for a small webinar demo.

Maybe I will get bored at the weekend and test a little more.

Charlie
Xpert

Did you use the new optimizer or the "old"?

exa-Matze
Team Exasol
Team Exasol

Maybe @exa-StefanM can share something 😉

ADoerr
SQL-Fighter

 

I did use the "old" one.
My main intention was to test the influence of new Hash datatype.

exa-StefanM
Team Exasol
Team Exasol

Hi, we did indeed make similar experiments internally when developing Hashtype.

For that we created a data vault model using different datatypes for the keys and compared the TPC-H queries on view layer on top of these different models.

sf25.png

sf100.png

NituWakan
Contributor

 Thank you!  This is very encouraging.

exa-Uwe
Moderator
Moderator

Hi, great post, @ADoerr !

I did the complete 1GB TPCH Benchmark - thanks to @exa-Carsten who provided the DDL and queries recently on the Partner Portal!

My system is a Community Edition Exasol 7.0.rc1, running in a VM with 4 GB memory on my notebook.

Here are my findings:

Environment Total table sizes Runtime for all 22 Queries *
TPCH without Data Vault 217 MB 4 seconds
TPCH with Data Vault using String data types as before 7.0 961 MB 200 seconds
TPCH with Data Vault using the new HASHTYPE 725 MB 70 seconds

 

* Second run of the 22 queries - so this is without the index creation that took place during the first run

Even with Exasol 7, querying the Data Vault model is about 17 times slower than without that model. Okay, my machine isn't a production environment.

But imagine what a performance nightmare that must be with another vendor's database!

If you want to see for yourself, here is all you need to run your own TPCH:

https://uhesse.files.wordpress.com/2020/08/mytpch.zip

That's largely stolen from Carsten - I essentially just added the data files to populate the tables and the IMPORT commands for it.

Isn't it cool that we can import directly from ZIP, by the way? 😎

ADoerr
SQL-Fighter

 

Great! Thanks for this large test!

That really fits my impression working with Data Vault on Exasol.

 

I think Data Vault will never compete against a 3NF model, when it's about quering the data from the model.

But it's the flexbility of the model, which makes Data Vault so attractive.

And Exasol 7.0 with the improved performance made Data Vault even more attractive.