Data Lake or Data Warehouse -> Data Lakehouse?

exa-Chris
Community Manager

Dear Community,

In her latest blog post, @exa-Helena discusses the topic of Data Warehouse vs. Data Lake. We would be really interested to hear first-hand what you personally think and where you think the industry is heading (these might not be the same thing). How can we help you build the right environment?

https://www.exasol.com/resource/data-lake-warehouse-or-lakehouse/

Thank you for your comments

Christian

Connecting Customers, Partners, Prospects and Exasolians is my passion. Apart from that I cycle, listen to music, and try to understand what all those technical discussions really mean...
3 REPLIES

ugamarkj
Xpert

I think it would be hugely beneficial if your data warehouse could serve as a data lake as well. I would very much like to make Exasol my entire strategy for analytical and archived data, with seamless integration. Why have Hadoop as a "cheap" archival platform to land slow-moving / old data when Exasol could in theory do that too? I'm not expecting the "data lake" part of Exasol to meet the same performance expectations. Just carve off a portion of the database to serve as a "lake": a lower performance tier, with more attractive licensing for that section, that could be accessed when needed and integrated with the analytical data.

+1 vote for an Exasol Data Lakehouse 😀

exa-Chris
Community Manager
Hi, let me run this past some internal folks. Are there any more thoughts out there about this topic? Christian

Gallus
SQL-Fighter

I somewhat agree, but I am not sure whether this approach would fit all needs.

I would say that for historical, structured data I would like to be able to access it using Exasol, whether the data actually resides in Exasol or in a Hadoop cluster. If it resides outside Exasol, then the security/access roles must be "propagated" to Exasol.

For unstructured data, I am not yet sure whether I want that data in Exasol as well. I would, however, need to enrich the unstructured data with data from Exasol.

Further, there is more to a lake distribution than just data; it's also a question of tools/functions. As this area is still changing/evolving at high speed, I am not sure whether I want that volatility in our data warehouse.

Would we also want analytics workspaces within the Exasol cluster, running on the nodes of our data warehouses?

I think the discussions are interesting, but they will probably end in a compromise. There will not be a "one size fits all" solution; it all depends on the scenario.