Yes, I'm having a little trouble loading data into Exasol.
We have several data files, txt or csv. The issue is that the first line of every file is different from the rest... I mean, I have a data file, but the first line is control data with 5 or 6 columns, and from the second row on it is the actual data. I have no problem loading the data from the second row on, but I need to read the first row, and only the first row, of every file to populate a control table in Exasol.
Can you help me with this?
Hey @adiazv, so you only want to insert the first row of data? And you want to ignore the rest of the file?
Does the rest of the file look exactly the same as the first row?
Hi @exa-Nico, yes, I only want to insert the first row of the data file into a control table; from the second row on, the data goes to the data table...
The file's first row is different from the rest of the rows, like:
Row 1: DATAFILE_01_20201120.csv, 2020-11-20, 5, 0
Row 2: 2020-11-20,"ABCD",123,456,"COUNTRY",13413,1324134,2432435,2345245,2345245,23452435
Row 3: 2020-11-20,"ABCD",123,456,"COUNTRY",13413,1324134,2432435,2345245,2345245,23452435
Row 4: 2020-11-20,"ABCD",123,456,"COUNTRY",13413,1324134,2432435,2345245,2345245,23452435
Row 5: 2020-11-20,"ABCD",123,456,"COUNTRY",13413,1324134,2432435,2345245,2345245,23452435
Row 6: 2020-11-20,"ABCD",123,456,"COUNTRY",13413,1324134,2432435,2345245,2345245,23452435
I can load the data (rows 2 to 6) into the data table, but I need to load row 1, and only row 1, into a control table.
You could use the "ERRORS INTO" clause of the IMPORT statement to do this. Since the column counts differ, you can import the file into the control table and specify that any error rows go into a separate table instead of the entire statement failing. This leads to only one row being inserted, the header, because all the other rows have too many columns. Here's an example:
CREATE TABLE TEST.CONTROL_TABLE (CSV_NAME VARCHAR(2000), DATE_INSERTED DATE, NUMBER_OF_ROWS INT, EXTRA_COLUMN INT);

CREATE TABLE TEST.DATA_TABLE (C1 DATE, C2 VARCHAR(20000), C3 INT, C4 INT, C5 VARCHAR(20000), C6 INT, C7 INT, C8 INT, C9 INT, C10 INT, C11 INT);

IMPORT INTO TEST.CONTROL_TABLE FROM LOCAL CSV FILE 'C://Temp/test.csv'
  ROW SEPARATOR = 'CRLF'
  ERRORS INTO TEST.ERROR_TABLE REPLACE
  REJECT LIMIT UNLIMITED;

IMPORT INTO TEST.DATA_TABLE FROM LOCAL CSV FILE 'C://Temp/test.csv'
  ROW SEPARATOR = 'CRLF'
  SKIP = 1;

SELECT * FROM TEST.CONTROL_TABLE;
SELECT * FROM TEST.DATA_TABLE;
The IMPORT documentation gives you more information on how to use it. Basically, you perform two IMPORTs on the same file: one for the control table and one for the data table. Since you don't care about the errors, you can simply delete or ignore the error_table and all of its entries.
Hope it helps!
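If you'd rather split the file before loading instead of relying on reject rows, the same idea can be sketched client-side in a few lines of Python. This is only an illustration with hypothetical in-memory file content matching the structure described above; the first CSV row goes to the control table, everything else to the data table:

```python
import csv
import io

# Hypothetical sample matching the structure described in the thread:
# row 1 is the control record, the remaining rows are data.
sample = (
    'DATAFILE_01_20201120.csv,2020-11-20,5,0\n'
    '2020-11-20,"ABCD",123,456,"COUNTRY",13413,1324134,2432435,2345245,2345245,23452435\n'
    '2020-11-20,"ABCD",123,456,"COUNTRY",13413,1324134,2432435,2345245,2345245,23452435\n'
)

def split_control_and_data(fileobj):
    """Return (control_row, data_rows): the first CSV row is the
    control record, every following row is a data record."""
    rows = list(csv.reader(fileobj))
    return rows[0], rows[1:]

control_row, data_rows = split_control_and_data(io.StringIO(sample))
print(control_row)     # the 4-column control record
print(len(data_rows))  # number of data rows
```

The two resulting lists could then be inserted into the control table and the data table with two separate INSERT/IMPORT calls.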
Hi @exa-Nico, that's a great idea... I have a UDF that does the load, so I have to modify it a little to load all the errors into different tables...
Thanks to all
And I think it would be a good idea to extend the Exasol "IMPORT" command to allow importing only a few rows of data: alongside "SKIP=1 or 2 or n", maybe "READ=1, or 10, or 1000".
Hi @skarangi, I had already thought of that solution, but I think there may be another solution directly in the IMPORT command... I mean, directly in Exasol's SQL commands.
A new question: is there an option to use Lua's "io" library in an Exasol UDF? Or to execute a PowerShell or sh script directly from a UDF?
Yes, we do something similar. We actually have >2'000 small files (10-50 rows each) which we want to load into the same table; each of the files has 2 header rows (row 1: column names, row 2: data type definitions). The files reside on an SFTP server.
We built a Python UDF which merges the different files together and cuts off the first two lines of each file. To do so we had to use some SFTP Python libraries that are not part of the Exasol Python standard, so as a consequence we had to build our own Python container.
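The merge-and-strip step described above can be sketched roughly like this. This is only a minimal illustration with hypothetical in-memory file contents; the real UDF fetches the files from the SFTP server, which is the part that required the non-standard libraries and the custom container:

```python
import csv
import io

# Hypothetical stand-ins for the downloaded files; each has 2 header
# rows (column names, then data type definitions) followed by data.
files = {
    'file_a.csv': 'col_1,col_2\nVARCHAR,INT\n"x",1\n"y",2\n',
    'file_b.csv': 'col_1,col_2\nVARCHAR,INT\n"z",3\n',
}

def merge_without_headers(contents, header_rows=2):
    """Concatenate the data rows of every file, skipping the first
    header_rows lines of each."""
    merged = []
    for text in contents:
        rows = list(csv.reader(io.StringIO(text)))
        merged.extend(rows[header_rows:])
    return merged

rows = merge_without_headers(files.values())
print(rows)  # all data rows from all files, headers removed
```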