Best approach to analyse terabytes of XML data

kochjoe
SQL-Fighter

Hi there,

we'd like to analyse a huge volume of text-based XML data, and we don't need all of it at once, only parts of it. Based on a sample we loaded into Exasol, we calculated a raw database size of 5 TB! My question is whether there are architectural alternatives: storing the data in cheaper storage such as S3, or even Exasol BucketFS, and then accessing it on demand via a custom function.
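
For illustration, the kind of custom function we have in mind might look roughly like this (just a sketch: all names are made up, the URL scheme assumes publicly readable or pre-signed S3 objects, and it assumes a Python 3 UDF language container is available):

CREATE OR REPLACE PYTHON3 SCALAR SCRIPT parse_xml_from_url(file_url VARCHAR(2000))
EMITS (tag VARCHAR(2000), content VARCHAR(2000000)) AS
import urllib.request
import xml.etree.ElementTree as ET

def run(ctx):
    # fetch one XML file on demand (publicly readable or pre-signed URL assumed)
    with urllib.request.urlopen(ctx.file_url) as resp:
        root = ET.fromstring(resp.read())
    # emit one row per element; real parsing would follow the actual schema
    for elem in root.iter():
        ctx.emit(elem.tag, (elem.text or '').strip())
/

Called like SELECT parse_xml_from_url('https://my-bucket.s3.amazonaws.com/part-0001.xml'); (hypothetical URL), this would pull only the requested file at query time instead of keeping all 5 TB in the database.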

Any hints and experiences are appreciated.

3 REPLIES

mwellbro
Xpert
Hi @kochjoe,

no experience in that department, but in theory you could put your XML data in S3, create an Exasol connection object to the bucket, and then create views that use said connection to load the data "on demand".
Kind of like a virtual schema, in a sense. Of course, this might get tricky if you only have one big file, but that aside, if you can live with the "lag" / latency of loading from S3 each time, it might be a workable approach.
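
Just to sketch what I mean (all names, the region and the file layout are made up, and you'd want to verify that an IMPORT subselect works inside a view on your version):

-- connection object holding the bucket endpoint and credentials
CREATE CONNECTION s3_xml
    TO 'https://my-bucket.s3.eu-west-1.amazonaws.com'
    USER '<aws_access_key_id>'
    IDENTIFIED BY '<aws_secret_access_key>';

-- a view that loads one file "on demand" through an IMPORT subselect,
-- reading each line of the file as one raw VARCHAR row
CREATE VIEW xml_raw AS
    SELECT *
    FROM (
        IMPORT INTO (line VARCHAR(2000000))
        FROM CSV AT s3_xml FILE 'data/part-0001.xml'
    );

Each SELECT on the view would then re-read the file from S3, which is where the latency I mentioned comes in.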

Cheers,
Malte


exa-Aleksandr
Team Exasol

Hi @kochjoe,

To add to what @mwellbro wrote:

BucketFS's main use case is relatively small, static files (libraries, data science models, etc.). The files also get synchronized across the nodes, so you might end up using 5 TB on every single data node.

kochjoe
SQL-Fighter

Hi @exa-Aleksandr,

thanks for clarifying BucketFS's main usage and the replication pattern.