wanted to share a small performance test, we made with the new Exasol 7.0 beta.
I was very curious how big the impact of the new Hash datatype would be.
- Exasol 6.2.7 community edition vs. Exasol 7.0 beta community edition
- both 4GB Ram
- TPC-H scalefactor 1
- persistent Raw Data Vault
- virtualized dimensional model
- query cache disabled
These are the results:
The performance improvement for the direct Raw DV queries is already really good,
but the improvement on the virtualized dimensional model is just suprising.
Hi, great post, @ADoerr !
I did the complete 1GB TPCH Benchmark - thanks to @exa-Carsten who provided the DDL and queries recently on the Partner Portal!
My system is a Community Edition Exasol 7.0.rc1, running in a VM with 4 GB memory on my notebook.
Here are my findings:
|Environment||Total table sizes||Runtime for all 22 Queries *|
|TPCH without Data Vault||217 MB||4 seconds|
|TPCH with Data Vault using String data types as before 7.0||961 MB||200 seconds|
|TPCH with Data Vault using the new HASHTYPE||725 MB||70 seconds|
* Second run of the 22 queries - so this is without the index creation that took place during the first run
Even with Exasol 7, querying the Data Vault model is about 17 times slower than without that model. Okay, my machine isn't a production environment.
But imagine what a performance nightmare that must be with another vendor's database!
If you want to see for yourself, here is all you need to run your own TPCH:
That's largely stolen from Carsten - I essentially just added the data files to populate the tables and the IMPORT commands for it.
Isn't it cool that we can import directly from ZIP, by the way? 😎
Great! Thanks for this large test!
That really fits my impression working with Data Vault on Exasol.
I think Data Vault will never compete against a 3NF model, when it's about quering the data from the model.
But it's the flexbility of the model, which makes Data Vault so attractive.
And Exasol 7.0 with the improved performance made Data Vault even more attractive.
Out of curiosity, I did a little benchmark with the hash type TPCH Data Vault model on 7.0.rc1 using some dedicated hardware and scalefactor 10. While for most queries, the execution times were pretty identical, there are three queries, where performance actually improved (5, 7 & 21), but also three where it degraded (2, 9 and 17). Each query was executed 5 times with each optimizer, the timings in the chart are then the averages of the 2nd - 5th execution (see attached script). I haven't looked at the query profiles yet, that's still on my todo list...