suppose you IMPORT an xml file into exasol and need to process it with regard to the original row order - sadly said file does not provide a column or other identifying quality
by which a sort would be possible...
Since the error_table feature does allow us to see the original row number from the source file in case of an error, is there a way to access this information in a "non error case" ?
Looking forward to any feedback,
P.S.: a bit late for this, but a happy new year to all of you !
Happy New Year! 🙂
So without knowing exactly how your XML import is set up.. Wouldn't you be able to just have an identity column on the table you import into, that will implicitly have increasing numbers automatically generated, and then sort by that?
You may get gaps, so it's not a foolproof way of having row numbers to match with the file -- but you *can* use it for sorting, and then in a view (or subsequent table) just generate a proper row number using ROW_NUMBER(ORDER BY your_identity_column) in an analytical query.
If you keep multiple file imports in your staging table, you'd want to PARTITION BY the filename too, I guess, given that it's available.
Good! Then I think @jwarlander's approach should work! AFAIK, the file is evaluated row by row, so the identity will at least be generated in the correct order (meaning earlier rows have smaller identities). Since you will just use it to create a real rowid later, then the gaps also aren't a problem.
so I finally got around to test the IDENTITY column approach...turns out the results are a wee bit non deterministic ( or I made an error in my test setup , so any pointers appreciated 😄 ) :
-- Step 1, let´s get some test data in a sorted fashion EXPORT (SELECT LEVEL,'text with '||LEVEL,'file1.csv' FROM dual CONNECT BY LEVEL<=10000 ORDER BY level) INTO LOCAL CSV FILE 'C:\temp\file1.csv' ; -- Step 2, create a table with identity column CREATE TABLE test_import( my_ident decimal(18,0) IDENTITY, id_from_file DECIMAL(18,0), some_text varchar(2000), filename varchar(2000) ); -- Step 3, IMPORT data from Step 1) IMPORT INTO test_import(id_from_file,some_text,filename) FROM LOCAL CSV FILE 'C:\temp\file1.csv'; -- Step 4, check for violations in identity order compared to original order WITH base AS ( SELECT a.*, lead(id_from_file) OVER (ORDER BY my_ident) ld_id FROM test_import a ORDER BY my_ident ) SELECT * FROM base WHERE ld_id<id_from_file --ORDER BY 1 desc --ORDER BY my_ident; ; -- Step 5 truncate the table TRUNCATE TABLE test_import;
So far so good, the initial IMPORT did indeed produce a ( somewhat gappy, but that´s ok ) sort order which preserved the id_from_file order ( so Step 4 yields an empty result ).
Now, being the paranoid nut that I am , I tried to truncate the table and do the IMPORT again...also fine...just for good measure, once more aaaand:
Interessting... now I went about and dropped the table, created it again and found the same behaviour.....
Disconnecting my session and starting fresh got me back to "square one" , so initial IMPORT worked but following ones produces the phenomenon...
I´ll be thinking about this for a while, but as for now: not yet a solution 😀
I would expect that behavior with TRUNCATE, but if you run a create or replace, I would expect the identity value to be dropped....
You can also re-set the identity using an ALTER TABLE command:
ALTER TABLE t ALTER COLUMN id SET IDENTITY 1;
Welcome back Malte and Happy New Year.