20-06-2020 07:16 PM
Is there any other quick way to create a copy of a huge dataset, besides CREATE OR REPLACE TABLE or INSERT INTO? I want to keep the original dataset intact and use the second one for experimenting.
20-06-2020 09:49 PM
hi again 😉
you could go for an IMPORT within the same database - I'm not sure if you get any more "mileage" compared to CREATE OR REPLACE - I'll test that on occasion
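For reference, a minimal sketch of such a loopback IMPORT (the connection name, address and credentials are placeholders, not values from this thread):

```sql
-- Minimal sketch of a loopback IMPORT within the same database.
-- self_link, the address and the credentials are placeholders.
CREATE CONNECTION self_link TO 'localhost:8563' USER 'sys' IDENTIFIED BY '...';

-- The target table must exist before the IMPORT; LIKE copies the structure.
CREATE TABLE t_copy LIKE t_orig;
IMPORT INTO t_copy FROM EXA AT self_link TABLE t_orig;
```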
21-06-2020 06:19 AM
Thanks for the input! If I understand correctly, IMPORT could be a faster way to replicate a table. Could it be even faster if it were made to execute in parallel across nodes? If yes, how can I force IMPORT to run in parallel?
21-06-2020 02:23 PM
I'd assume that IMPORT should work in parallel automatically:
COMMAND_NAME | IPROC | PART_NAME | OBJECT_NAME | OBJECT_ROWS | IN_ROWS | OUT_ROWS | DURATION |
---|---|---|---|---|---|---|---|
IMPORT | 0 | IMPORT | T3 | 0 | 6250000 | 6250000 | 5.215547 |
IMPORT | 1 | IMPORT | T3 | 0 | 6250000 | 6250000 | 5.215421 |
IMPORT | 2 | IMPORT | T3 | 0 | 6250000 | 6250000 | 5.215292 |
IMPORT | 3 | IMPORT | T3 | 0 | 6250000 | 6250000 | 5.215446 |
IMPORT | 4 | IMPORT | T3 | 0 | 6250000 | 6250000 | 5.215514 |
IMPORT | 5 | IMPORT | T3 | 0 | 6250000 | 6250000 | 5.215219 |
IMPORT | 6 | IMPORT | T3 | 0 | 6250000 | 6250000 | 5.215208 |
IMPORT | 7 | IMPORT | T3 | 0 | 6250000 | 6250000 | 5.215290 |
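In case you want to reproduce that output: profiling can be switched on per session and read back from the statistics views afterwards. A rough sketch using the aggregated view (if I recall correctly, the per-node IPROC rows above come from the more detailed $EXA_PROFILE_DETAILS_LAST_DAY):

```sql
-- Rough sketch: profile a statement and read the profile back per part.
ALTER SESSION SET PROFILE = 'ON';
IMPORT INTO t3 FROM EXA AT self_link TABLE t1;   -- self_link as sketched above
ALTER SESSION SET PROFILE = 'OFF';
FLUSH STATISTICS;                                -- makes the profile visible

SELECT command_name, part_name, object_name, object_rows, in_rows, out_rows, duration
FROM   exa_user_profile_last_day
WHERE  session_id = CURRENT_SESSION
ORDER  BY stmt_id, part_id;
```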
What surprised me was that IMPORT performed slower than CTAS on a 50M-row copy...
Creating the test set:
COMMIT | 1.208 | 0 | COMMIT /* AUTO */ |
CREATE TABLE AS | 1312.431 | 50000000 | create table t1 as select level as my_pk, HASH_MD5(to_char(level)) as some_text, current_date + mod(level,100000) as some_date, 'fix text for good measure' as some_fix_text from dual connect by level<=50000000; |
FYI: not the most ideal/performant way to create the data, but I wanted to use a SQL statement that can be copied, pasted and executed.
Copy with CTAS:
COMMIT | 1.845 | 0 | COMMIT /* AUTO */ |
CREATE TABLE AS | 1.013 | 50000000 | create table t2 as select * From t1; |
Copy with IMPORT:
COMMIT | 1.318 | 0 | COMMIT /* AUTO */ |
IMPORT | 5.256 | 50000000 | import into t3 from exa at ... table t1; |
Resulting data:
SIZE_GB | TABLE_ROW_COUNT | OBJECT_NAME | OBJECT_TYPE | RAW_OBJECT_SIZE | MEM_OBJECT_SIZE |
---|---|---|---|---|---|
1.737127877771854 | 50000000 | T1 | TABLE | 3400000000 | 1865226856 |
1.737139143049717 | 50000000 | T3 | TABLE | 3400000000 | 1865238952 |
1.737249948084354 | 50000000 | T2 | TABLE | 3400000000 | 1865357928 |
I'd assume that on a table with a bigger size and more columns IMPORT should outperform CTAS (which, btw, also works as parallel as your cluster can muster, i.e. #nodes)...
Depending on what "for experimenting" means in your case, you might save a bit if you can manage to stay in the same session for copying & experimenting and not commit your copy (i.e. CTAS or IMPORT with autocommit off and all tests in the same transaction). Of course, that will take a toll on your DBRAM and is not viable if your tests can't be made in the same session.
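A rough sketch of that pattern (SET AUTOCOMMIT is an EXAplus client command; other clients have their own autocommit switch):

```sql
-- Rough sketch: keep the copy uncommitted so it never outlives the transaction.
SET AUTOCOMMIT OFF;                          -- EXAplus client command

CREATE TABLE t_scratch AS SELECT * FROM t1;  -- DDL is transactional in Exasol
-- ... run the experiments against t_scratch in this same session ...
ROLLBACK;                                    -- discards the scratch copy
```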
22-06-2020 11:06 AM
> I'd assume that IMPORT should work in parallel automatically.
If you import from EXA as data source type, yes.
> What surprised me was that IMPORT performed slower than CTAS on a 50M-row copy...
Of course, CTAS/INSERT also work in parallel, and they can do without any driver data-conversion overhead...
The only situation where IMPORT used to be (?) faster than INSERT was when the target table had a distribution key differing from the source table's: with IMPORT the distribution was done in-stream, while INSERT performed a two-phase approach.
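To illustrate, such a target could be set up like this (a sketch; table and column names are borrowed from the test above, and the connection is assumed to exist as sketched earlier in the thread):

```sql
-- Sketch: copy into a target whose distribution key differs from the source.
CREATE TABLE t_redistributed LIKE t1;
ALTER TABLE t_redistributed DISTRIBUTE BY some_date;  -- differs from the source
IMPORT INTO t_redistributed FROM EXA AT self_link TABLE t1;
```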
All in all, CTAS should be the fastest possible way to copy a data set. If the data set does not fully fit into memory, a loop of smaller INSERT statements may be preferable.
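A sketch of such a loop, assuming a dense integer key like the my_pk column from the test data above:

```sql
-- Sketch: copy in slices so each INSERT stays memory-friendly.
CREATE TABLE t_copy LIKE t1;
INSERT INTO t_copy SELECT * FROM t1 WHERE my_pk <= 10000000;
COMMIT;
INSERT INTO t_copy SELECT * FROM t1 WHERE my_pk > 10000000 AND my_pk <= 20000000;
COMMIT;
-- ... continue until the whole key range is covered ...
```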
21-06-2020 05:16 PM
Thanks for the details!
Tried IMPORT, but the execution durations are almost the same as with the current implementation. It seems CTAS / INSERT INTO / IMPORT have similar performance for replicating a table on the same server.