One of EXASOL’s main differentiation criteria is its unprecedented loading performance, which is typically far superior to other DWHs. In order to achieve maximum performance, some fundamentals and best practices should be taken into consideration.
EXASOL is a shared-nothing database, running on multiple nodes. Ideally, those nodes’ individual capabilities to open connections to remote sources are best utilized by a “pull” approach on data loading: EXASOLs native bulk loader (“EXAloader”) makes this easy to do. It is triggered by a straight forward SQL command and hides most complexity arising from parallel loading from the user. For the complete syntax see the EXASOL manual section 2.2.2 ‘IMPORT’ .
IMPORT INTO table_1 FROM CSV AT 'http://192.168.1.1:8080/' USER 'agent_007' IDENTIFIED BY 'secret' FILE 'tab1_part1.csv';
Some ETL tools, such. as e.g. Talend, are able to employ EXAloader. An ETL tool that uses a standard connection (ODBC / JDBC) will not be able to manage parallelism (alone for the fact that it is usually unaware of the cluster resources), results will therefore always be suboptimal.
Sources to load from are:
IMPORT INTO table_4 FROM JDBC AT 'jdbc:exa:192.168.6.11..14:8563' USER 'agent_008' IDENTIFIED BY 'secret' STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' ' STATEMENT ‘SELECT * from tbl2’ TABLE customers;
In general, system tables should be queried with AUTOCOMMIT ON to avoid difficulties caused by readlocks on the objects whose metadata is queried.
EXAplus can return timings of jobs:
timing start; IMPORT FROM ...; timing stop;
Alternatively you can lookup the duration of the import runs from the system tables (EXA_DBA_AUDIT_SQL or EXA_*_SQL_LAST_DAY). This makes it easy to visualize it with a standard BI tool.
IMPORT INTO table_3 (col1, col2, col4) FROM ORA AT my_oracle USER 'agent_008' IDENTIFIED BY 'secret' STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' ' ERRORS INTO error_table (CURRENT_TIMESTAMP) REJECT LIMIT 10;