
Volume of data transferred via virtual schema

ezryder
Contributor

Hi,


If I have a connection between two Exasol instances via a virtual schema and I use that connection to create a physical copy of a data table from the source instance on the destination instance, how much data will be transferred? Ignoring the metadata transferred as part of the process, would the answer be closer to the uncompressed volume of the data (RAW_OBJECT_SIZE) or the compressed object size (MEM_OBJECT_SIZE)?

I would expect MEM_OBJECT_SIZE to be the answer but would like to check, as it will impact our ingress/egress calculations.
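
For context, I'm reading both figures from the EXA_ALL_OBJECT_SIZES system table, roughly like this (the table name is just an example):

-- compare uncompressed vs. compressed size of a single table
SELECT object_name, raw_object_size, mem_object_size
FROM exa_all_object_sizes
WHERE object_name = 'T1';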

Thanks,

Dan

6 REPLIES

mwellbro
Xpert

Hi @ezryder ,

I once had a similar question, and the answer, as I recall, was that RAW_OBJECT_SIZE is more of a "theoretical size", meant to give an indication of how large the data would be if exported "flat".
The actual transferred data should be close to what you see as MEM_OBJECT_SIZE, since that's the compressed form in which the data is actually held within Exasol, and for a simple transfer there'd be no need to decompress it.

I'll try and see if I can back up my claim in my "lab setup" at home 😉

Cheers,
Malte

 

mwellbro
Xpert

Hi again,

So I did a bit of testing, and I'm rather surprised at what seems to be happening here. For the sake of clarity, I'll walk through how I tested, so that any errors on my side might be discovered.

So here's the DDL for this case:

 

CREATE SCHEMA JUST_DATA;

CREATE TABLE JUST_DATA.t1(
  MY_PK DECIMAL(20,0)
, SOME_UNIQUE_HASH CHAR(32)
, JUST_CHARS CHAR(2000)
);

-- seed 100,000 rows (CREATE SCHEMA also opens the schema, so t1 resolves to JUST_DATA.t1)
INSERT INTO t1 SELECT level, HASH_MD5(level), TO_CHAR(level) FROM dual CONNECT BY LEVEL <= 100000;

-- add roughly 1 billion more rows via a 100,000 x 10,000 cross join of t1 with itself
INSERT INTO t1
SELECT rownum + 100000 AS idx
, HASH_MD5(rownum + 100000)
, TO_CHAR(rownum + 100000)
FROM (
     SELECT 1 FROM (
          SELECT 1 FROM t1 CROSS JOIN (SELECT rownum FROM t1 WHERE rownum <= 10000)
     )
) q;

 

This was done on two single-node AWS Exasol 7.1.4 instances (m5.large) in the same VPC/subnet (I chose to do this on AWS so it would be easier to show the traffic between the EC2 instances, instead of mangling my VirtualBox NIC adapters to get output one could read in a sensible manner).

In any case, this ends up creating the following volume of data:

[screenshot: mwellbro_0-1650486467438.png, showing the resulting data volume of JUST_DATA.t1]

Now I ran the following command:

IMPORT INTO JUST_DATA_IN_TARGET.t1 FROM EXA AT TO_FIRST_DB TABLE JUST_DATA.t1;
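
(In case anyone wants to reproduce this: TO_FIRST_DB is just an ordinary connection object pointing at the source instance, roughly like the sketch below; the address and credentials are placeholders, not the ones I actually used.)

-- connection object referenced by the IMPORT ... FROM EXA statement above
CREATE CONNECTION TO_FIRST_DB
TO '10.0.0.10:8563'                    -- placeholder: address/port of the source Exasol instance
USER 'sys' IDENTIFIED BY 'exasol';     -- placeholder credentials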

So with this I'd have expected to see the ~50 GB being transferred between the two instances; that's kind of not what happened:
[screenshot: data.PNG, showing network traffic between the two EC2 instances]

The 3 GB average transfer per minute corresponds to the following number of rows:
[screenshot: rows.PNG, showing the rows transferred so far]
If I sum up the current state and assume that I'd have to wait for 10 times the current amount (since we have 100 million rows so far and the target is roughly 1 billion) and use this as a basis (same graph, just the sum variant for the hour):
[screenshot: data_sum.PNG, showing the hourly sum of transferred data]
then we'd end up "in the ballpark" of what the RAW_SIZE indicated... I still find that hard to believe, though... maybe someone from "team exa" can shed some light on this?

Looking forward to any input.

Cheers,
Malte

littlekoi
Xpert

I've tested it quite extensively.

The best way to transfer a large amount of data is a parallel EXPORT / IMPORT using the CSV.GZ format. The number of files should be a multiple of the number of nodes; 3 to 6 files per node is a good start.

Just EXPORT it to S3 from one instance and IMPORT it from S3 to another. You may delete files immediately afterwards to reduce S3 costs to almost zero.
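
Roughly like this; the bucket, credentials and file names below are placeholders, and the .gz suffix is what makes Exasol write and read gzip-compressed files:

-- on the source instance: parallel export into several gzipped CSV files on S3
EXPORT JUST_DATA.t1 INTO CSV
AT 'https://my-bucket.s3.amazonaws.com'
USER 'aws_access_key_id' IDENTIFIED BY 'aws_secret_access_key'
FILE 't1_1.csv.gz' FILE 't1_2.csv.gz' FILE 't1_3.csv.gz';

-- on the target instance: parallel import of the same files
IMPORT INTO JUST_DATA_IN_TARGET.t1 FROM CSV
AT 'https://my-bucket.s3.amazonaws.com'
USER 'aws_access_key_id' IDENTIFIED BY 'aws_secret_access_key'
FILE 't1_1.csv.gz' FILE 't1_2.csv.gz' FILE 't1_3.csv.gz';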

The amount of transferred data will be close to MEM_SIZE, or maybe even lower if you have a lot of similar rows and short integers.

---

As far as I know, all other forms of transfer are inefficient and should be avoided.

exa-StefanR
Team Exasol

IMPORT and EXPORT operate on the standard SQL level API and have no special data access/transfer implementation -- so data is transferred uncompressed, in whatever representation the respective driver (JDBC / Exasol CLI) uses.

After all, compression in Exasol is a very context-sensitive thing, and there is currently no concept of sending compressed data in the hopes that the receiving end has the matching environment for decompression/conversion.

ezryder
Contributor

Thanks for the answers. This very comprehensively gives me the information I was looking for.

mwellbro
Xpert

"IMPORT and EXPORT operate on the standard SQL level API and have no special data access/transfer implementation -- so data is transferred uncompressed(...)" => please change that, would you ? @exa-StefanR 😉

Jokes aside, even granting that "(...)compression in Exasol is a very context-sensitive thing(...)", it should/might/could be possible to transfer the necessary "decompression info" along with the data, so that the data transfer is reduced and the target EXA could handle the "decomp/recomp"... but I guess the "SQL level API" is what's preventing this from happening.

Maybe introduce a command like REALLY_FAST_IMPORT_FROM_EXA INTO some_table AT 'cluster-goes-here' TABLE that_table?