Access RowNr from original file in IMPORT ?

mwellbro
Xpert

Hi all,

suppose you IMPORT an xml file into exasol and need to process it with regard to the original row order - sadly said file does not provide a column or other identifying quality
by which a sort would be possible...

Since the error_table feature does allow us to see the original row number from the source file in case of an error, is there a way to access this information in a "non error case" ?

Looking forward to any feedback,
Malte

P.S.: a bit late for this, but a happy new year to all of you !

1 ACCEPTED SOLUTION

jwarlander
Xpert

Happy New Year! 🙂

So without knowing exactly how your XML import is set up.. Wouldn't you be able to just have an identity column on the table you import into, that will implicitly have increasing numbers automatically generated, and then sort by that?

You may get gaps, so it's not a foolproof way of having row numbers to match with the file -- but you *can* use it for sorting, and then in a view (or subsequent table) just generate a proper row number using ROW_NUMBER(ORDER BY your_identity_column) in an analytical query.

If you keep multiple file imports in your staging table, you'd want to PARTITION BY the filename too, I guess, given that it's available.

View solution in original post

11 REPLIES 11

mwellbro
Xpert

Just accepted @jwarlander original proposal as the solution to this - does seem to do the trick in the way I keep using it so I can live with the little "weirdness" we saw in the tests.

jwarlander
Xpert

Happy New Year! 🙂

So without knowing exactly how your XML import is set up.. Wouldn't you be able to just have an identity column on the table you import into, that will implicitly have increasing numbers automatically generated, and then sort by that?

You may get gaps, so it's not a foolproof way of having row numbers to match with the file -- but you *can* use it for sorting, and then in a view (or subsequent table) just generate a proper row number using ROW_NUMBER(ORDER BY your_identity_column) in an analytical query.

If you keep multiple file imports in your staging table, you'd want to PARTITION BY the filename too, I guess, given that it's available.

View solution in original post

mwellbro
Xpert
Hi @jwarlander ,
thanks for the idea, I'll try that .
Here's to hoping that IMPORT+IDENTITY does preserve original row order - always thought of IDENTITY as a mere "attaching unique numbers to rows" , so preserving input order would not strictly need to be preserved - I can live with the gaps you mentioned, relative order is what I'm after.
Maybe some @exasol would like to chime in ? ( just asking because if it works in the tests I do but it is not "by design" but merely "per chance" I'd rather know upfront 😉 )
Cheers,
Malte

exa-Nico
Community Manager
Community Manager
Just curious - is it an import from local csv or not local? Is it only one file or multiple files at once?
Sports Enthusiast. Database Guy. Member of Team Exasol.
Having trouble? Just let me know!

Please Give Kudos if you find this post useful and also mark this as Accepted Solution if this is the required solution

mwellbro
Xpert
Hi @exa-Nico ,
I can see how what you mentioned might play into this😀
We're dealing with the "easy case": local csv, one file .

exa-Nico
Community Manager
Community Manager

Good! Then I think @jwarlander's approach should work! AFAIK, the file is evaluated row by row, so the identity will at least be generated in the correct order (meaning earlier rows have smaller identities). Since you will just use it to create a real rowid later, then the gaps also aren't a problem. 

Sports Enthusiast. Database Guy. Member of Team Exasol.
Having trouble? Just let me know!

Please Give Kudos if you find this post useful and also mark this as Accepted Solution if this is the required solution

mwellbro
Xpert

Hi guys,

so I finally got around to test the IDENTITY column approach...turns out the results are a wee bit non deterministic ( or I made an error in my test setup , so any pointers appreciated 😄 )  :

 

-- Step 1, let´s get some test data in a sorted fashion
EXPORT (SELECT LEVEL,'text with '||LEVEL,'file1.csv' FROM dual CONNECT BY LEVEL<=10000 ORDER BY level)
INTO LOCAL CSV FILE 'C:\temp\file1.csv'
;

-- Step 2, create a table with identity column
CREATE TABLE test_import(
my_ident decimal(18,0) IDENTITY,
id_from_file DECIMAL(18,0),
some_text varchar(2000),
filename varchar(2000)
);

-- Step 3, IMPORT data from Step 1)
IMPORT INTO test_import(id_from_file,some_text,filename) FROM LOCAL CSV FILE 'C:\temp\file1.csv';

-- Step 4, check for violations in identity order compared to original order
WITH base AS (
SELECT a.*,
lead(id_from_file) OVER (ORDER BY my_ident) ld_id FROM test_import a ORDER BY my_ident
)
SELECT * FROM base 
WHERE ld_id<id_from_file
--ORDER BY 1 desc
--ORDER BY my_ident;
;

-- Step 5 truncate the table
TRUNCATE TABLE test_import;

 

So far so good, the initial IMPORT did indeed produce a ( somewhat gappy, but that´s ok ) sort order which preserved the id_from_file order ( so Step 4 yields an empty result ).

Now, being the paranoid nut that I am , I tried to truncate the table and do the IMPORT again...also fine...just for good measure, once more aaaand: 

mwellbro_0-1611594595037.png

Interessting... now I went about and dropped the table, created it again and found the same behaviour.....
Disconnecting my session and starting fresh got me back to "square one" , so initial IMPORT worked but following ones produces the phenomenon...

I´ll be thinking about this for a while, but as for now: not yet a solution 😀

exa-Nico
Community Manager
Community Manager

I would expect that behavior with TRUNCATE, but if you run a create or replace, I would expect the identity value to be dropped....

You can also re-set the identity using an ALTER TABLE command:

 

ALTER TABLE t ALTER COLUMN id SET IDENTITY 1;

 

https://docs.exasol.com/sql/alter_table(column).htm 

Sports Enthusiast. Database Guy. Member of Team Exasol.
Having trouble? Just let me know!

Please Give Kudos if you find this post useful and also mark this as Accepted Solution if this is the required solution

mwellbro
Xpert

Hi @exa-Nico , thanks for the tip regarding SET IDENTITY, will test that as well.

fyi your link 404´s , there seems to be a trailing blank or some such ( of course copy&paste is an alternative 😄  )

exa-Nico
Community Manager
Community Manager
Fixed it, good catch 😉
Sports Enthusiast. Database Guy. Member of Team Exasol.
Having trouble? Just let me know!

Please Give Kudos if you find this post useful and also mark this as Accepted Solution if this is the required solution

exa-Chris
Community Manager
Community Manager

Welcome back Malte and Happy New Year.

Connecting Customers, Partners, Prospects and Exasolians is my passion. Apart from that I cycle, listen to music, and try to understand what all those technical discussions really mean...