Export to multiple files, controle the number of records per file

Gallus
SQL-Fighter

Hello everybody

I am using an export statement to store the records of a table into multiple files. I noticed that the size of the files and also the number of records per file is not equal and varias quit a bit. In the EXASOL documentation I found the following statement

Gallus_0-1625831281402.png

Is there any documentation on the "several factors" or what could I do to make sure that the number of records is more or less evenly distributed.

My query looks like the following (This is generated code, out of Wherescape)

 

EXPORT SELECT  grl_mrt_mzlg_m_syr_update_file_export.fileline AS fileline
  FROM edd_dwh_grl.grl_mrt_mzlg_m_syr_update_file_export grl_mrt_mzlg_m_syr_update_file_export
   INTO CSV AT CON_IG_FTP_MZLG_SYR_SND_DEV FILE 'NC104_PartnerFlagging_DWH_update.txt_1_202110709' 
   FILE 'NC104_PartnerFlagging_DWH_update.txt_2_202110709' 
   FILE 'NC104_PartnerFlagging_DWH_update.txt_3_202110709' 
   FILE 'NC104_PartnerFlagging_DWH_update.txt_4_202110709' 
   FILE 'NC104_PartnerFlagging_DWH_update.txt_5_202110709' 
   FILE 'NC104_PartnerFlagging_DWH_update.txt_6_202110709' 
   FILE 'NC104_PartnerFlagging_DWH_update.txt_7_202110709' 
   FILE 'NC104_PartnerFlagging_DWH_update.txt_8_202110709' 
   FILE 'NC104_PartnerFlagging_DWH_update.txt_9_202110709' 
   FILE 'NC104_PartnerFlagging_DWH_update.txt_10_202110709'
   COLUMN SEPARATOR=';'
COLUMN DELIMITER=''
ENCODING='UTF-8'
REPLACE

 

This results in the following

Gallus_1-1625831448994.png

 

Kind Regards

Gallus

 

13 REPLIES 13

mwellbro
Xpert

Hi @exa-Kristof ,

thanks for digging this up - still kind of leaves me wondering what "quite complicated process" entails exactly but I guess this is either venturing into trade secret territory or a flow chart that would make our eyes bleed ? 😉

@Gallus : do you think this helps with your original topic ? Was trying to get a bit deeper into this by setting up an FTP server next to my AWS exa cluster ( which would have allowed for better tracing than my original S3-based way ) but I must have taken a wrong turn somewhere around my own security group architecture 🤔
If I get around to it I´ll post any additional findings here.

Cheers,
Malte

Gallus
SQL-Fighter

Good Morning everybody

Many thanks for all your help. This helps definitely. As we have 17 nodes I am going to check with the recipient whether it would also be possible to deliver 17 files or whether they can deal with different file sizes. I assume they can deal with it, the reason to splizt the data into multiple files is that they can better spread the load over a longer period of time.

Kind Regars

Gallus

mwellbro
Xpert

Hi all,

so in the end I did manage to get my AWS systems to talk ( "a win-ftp-server, an exa-4-node-cluster and a bunch of data walk into a VPC...." ), so here´s how the data flow spread accross connections & ports, for anyone interested:

mwellbro_0-1627057450709.png

The resulting skew as previously shown:

mwellbro_1-1627057523989.png

Won´t bring us any closer to solving the mysteries of how the data gets chosen for each stream but at 
least it offered me some closure 😉

Cheers,
Malte

exa-Kristof
Team Exasol
Team Exasol

Hi @Gallus , of course this a rather technical way to influence things (and admittedly not easy to explain to a department).

But feel free to add an idea if you think that it would help you.

And thank you Malte for your tireless efforts to solve riddles - I guess it's a mixture of your two presumptions why there are not more details on this topic. 😉