Alteryx bulk upload

andykriebel
Padawan

Has anyone been able to bulk upload from alteryx to exasol? ODBC is way, way too slow.

7 REPLIES

exa-Carsten
Moderator

Hi Andy!

Welcome to the community!

As you've seen, bulk loading with Alteryx into Exasol is not an ideal scenario. You can improve the insert speed over ODBC by tweaking the transaction size parameter (if I remember correctly, the default is 5,000 or 10,000 records) and raising it to 100,000 or more, depending on the record width. The problem is that Alteryx, unlike other ETL tools, doesn't differentiate between transaction size (when it commits) and batch size (how many records are inserted in one round trip). Usually, to get true all-or-nothing loading, you would set the transaction size to unlimited (commit only once all data has been inserted) and the batch size to some value large enough to give you good performance. Batch sizes that are too large - and, in the case of Alteryx, transaction sizes that are too large - lead to out-of-memory errors on the driver side, because all data for a batch is buffered in client memory before being transferred to the database.
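To make the batch-size vs. transaction-size distinction concrete, here is a minimal, generic sketch in Python (not Alteryx or Exasol code; all names and sizes are illustrative). Batches bound client memory per round trip, while the commit interval decides atomicity; `commit_every=None` gives the all-or-nothing behaviour Alteryx cannot express:

```python
def load_in_batches(rows, execute_batch, commit, batch_size=100_000, commit_every=None):
    """Insert rows in fixed-size batches, committing independently of batch size.

    batch_size   - rows sent to the driver per round trip (memory-bound on the client)
    commit_every - number of batches per transaction; None = single commit at the end
    """
    batch, batches_sent = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            execute_batch(batch)
            batch, batches_sent = [], batches_sent + 1
            if commit_every and batches_sent % commit_every == 0:
                commit()
    if batch:                 # flush the last partial batch
        execute_batch(batch)
    commit()                  # final commit covers any uncommitted batches
```

In a real loader, `execute_batch` would be something like a DB-API `cursor.executemany` call and `commit` the connection's commit.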

Matthew Reeve once created a macro that enables Alteryx to use Exasol's bulk loader (IMPORT) to load data. My Alteryx knowledge is very limited, so I can't tell you what state this macro is in (I would assume: first draft), but maybe it can help in your case. Please find it attached!
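For reference, an IMPORT over a local flat file (the approach such a macro would presumably take) looks roughly like the statement this hypothetical helper builds; the table name, file path, and separator are illustrative, so check the Exasol IMPORT documentation for the full option list:

```python
def local_csv_import(table, csv_path, column_separator=","):
    # Sketch of the Exasol IMPORT statement issued after dumping the
    # Alteryx data stream to a flat file (names are illustrative).
    return (
        f"IMPORT INTO {table} "
        f"FROM LOCAL CSV FILE '{csv_path}' "
        f"COLUMN SEPARATOR = '{column_separator}'"
    )

print(local_csv_import("retail.sales", "/tmp/sales.csv"))
```

`FROM LOCAL CSV` is read through the client connection, which is why it cannot fan out across the cluster the way a server-side source can.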

andykriebel
Padawan

This is helpful Carsten. Thanks!

mwellbro
Xpert

Just to add to what Carsten said, after taking a peek inside exa_import.zip and getting an idea of how bulk loading in Alteryx is geared:

It looks like exa_import dumps to a flat file and uses an IMPORT to load the data, whereas the Snowflake Alteryx bulk loader uses (or at least used to use) S3-based staging, which allows much greater loading parallelism than the exa_import.zip example.

If you compare the speeds, I think you'd probably want to stage to S3 (or anything other than LOCAL CSV) as well, so that you can exploit Exasol's parallelism accordingly.
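To illustrate the staging idea: an S3-based IMPORT can list several FILE clauses, which the Exasol cluster can fetch in parallel, unlike a single LOCAL CSV stream. A hypothetical helper sketching such a statement (bucket URL, file names, and credential placeholders are all illustrative; verify the exact syntax against the Exasol IMPORT documentation):

```python
def s3_csv_import(table, bucket_url, files, access_key, secret_key):
    # Sketch of an S3-staged Exasol IMPORT; listing multiple FILE
    # clauses lets the cluster read them in parallel (names illustrative).
    file_clauses = " ".join(f"FILE '{f}'" for f in files)
    return (
        f"IMPORT INTO {table} FROM CSV AT '{bucket_url}' "
        f"USER '{access_key}' IDENTIFIED BY '{secret_key}' "
        f"{file_clauses}"
    )

print(s3_csv_import("retail.sales",
                    "https://my-bucket.s3.amazonaws.com",
                    ["part1.csv", "part2.csv"],
                    "MY_ACCESS_KEY", "MY_SECRET_KEY"))
```

The workflow would first write the stream out as several CSV parts to the bucket, then issue one IMPORT over all parts.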

exa-Chris
Community Manager

Hi Andy,

depending on what you are planning to do, there is also an Exasol Virtual Schema that can connect to Snowflake.

Rgds

Christian

Connecting Customers, Partners, Prospects and Exasolians is my passion. Apart from that I cycle, listen to music, and try to understand what all those technical discussions really mean...

mwellbro
Xpert

Hi @andykriebel ,

just to "get a feeling": what's "way, way too slow" in terms of MB/s or rows per second? Just curious 🙂

I don't know how Alteryx handles its target buffer size, but that tends to affect throughput, as do the commit interval and the like. Also, is/was this measured in an on-prem scenario, in the cloud, or hybrid? (Latency can rain on your parade.)

 

Cheers,
Malte

andykriebel
Padawan

I'm testing bulk upload times between Exasol and Snowflake via Alteryx (so that I can schedule a workflow that updates the data daily). It takes 36 min to upload 28M records into Snowflake via bulk upload. If ODBC takes longer than that, I won't use it. Last time I tested, I stopped the ODBC load after 2 hours, and it was 5% done.

exa-Chris
Community Manager

Welcome to the Community, Andy, great to finally have you here. Let me check internally and get a fast response for you.
