Handling geospatial data can be tough, especially if you are working with large, high-detail polygons. If you are experiencing unusually high query runtimes, this guide shows you how to speed up the process.
In Exasol, data is automatically distributed evenly across all nodes; by default, this distribution is random. By specifying distribution keys, you can control how the data is distributed, which can lead to enormous performance improvements.
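As a minimal sketch (the table and column names are hypothetical), distributing two tables on their shared join column places matching rows on the same node, so a join on that column can run locally:

CREATE TABLE customers (customer_id DECIMAL(18,0), name VARCHAR(100));
CREATE TABLE orders (order_id DECIMAL(18,0), customer_id DECIMAL(18,0), amount DECIMAL(18,2));

-- Distribute both tables on the join column so that rows with the same
-- customer_id land on the same node and the join needs no network transfer.
ALTER TABLE customers DISTRIBUTE BY customer_id;
ALTER TABLE orders DISTRIBUTE BY customer_id;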
As Exasol only supports the transaction isolation level "SERIALIZABLE", this article looks at transactions and potential transaction conflicts.
This article describes the difference between local and global joins, and how to convert global joins into local ones.
This article discusses the use of UNION ALL and its benefits.
Tips on how to improve join performance in Exasol.
Background

Cause and effect: since the EXASolution transaction isolation level is SERIALIZABLE and newly created transactions are automatically scheduled after finished transactions, it is possible that WAIT FOR COMMITs occur even for pure read transactions (consisting of SELECT statements only).

Explanation

How to reproduce: three different connections (with AUTOCOMMIT off) are needed to reproduce this situation.

Example 1: If a long-running transaction (Tr1) reads object A and writes object B (e.g. a long-running IMPORT statement) and a second transaction (Tr2) writes object A and commits in parallel, Tr2 is scheduled after Tr1. After Tr2 is committed, all new transactions are scheduled after it. If such a transaction wants to read object B, it has to wait for the commit of Tr1.

| Transaction 1 | Transaction 2 | Transaction 3 | Comment |
|---|---|---|---|
| select * from tab1; | | | |
| insert into tab2 values 1; | | | |
| -- transaction remains open | | | |
| | insert into WFC.tab1 values 1; | | Transaction 1 < Transaction 2 |
| | commit; | | |
| | | commit; | Starts a new transaction (Transaction 2 < Transaction 3) |
| | | select * from tab2; | This statement ends up in WAIT FOR COMMIT, waiting for Transaction 1 |

Example 2: The same situation may occur if you query system tables while the SqlLogServer is performing one of its tasks (e.g. the "DB size task" determining the database size). The following example describes this situation:

| Transaction 1 | LogServer | Transaction 3 | Comment |
|---|---|---|---|
| select * from EXA_DB_SIZE_LAST_DAY; | | | |
| insert into tab1 values 1; | | | |
| -- transaction remains open | | | |
| | -- DB size task (writes EXA_DB_SIZE_LAST_DAY) | | Transaction 1 < LogServer transaction; the task is executed every 30 minutes (0:00, 0:30, 1:00, 1:30, ...) |
| | | commit; | Starts a new transaction (LogServer transaction < Transaction 3) |
| | | select * from EXA_DB_SIZE_LAST_DAY; | This statement ends up in WAIT FOR COMMIT |

Solution

Currently, the only solution is to break up Transaction 1 into multiple transactions by performing a COMMIT or ROLLBACK after the initial read access. However, things get more complicated when the read and write operations are concentrated within a single statement (e.g. MERGE or INSERT from SELECT). In that case it has proven helpful to "outsource" the reading part by using IMPORT as a subselect to fetch the required data through a separate transaction.

Additional References
https://community.exasol.com/t5/database-features/transaction-system/ta-p/1522
https://community.exasol.com/t5/database-features/filter-on-system-tables-and-transaction-conflicts/ta-p/1232
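A minimal sketch of this IMPORT workaround, assuming a hypothetical loop-back connection SELF_CONN and the tables from the examples above (the address, credentials, and column type are placeholders):

CREATE CONNECTION self_conn TO 'localhost:8563' USER 'sys' IDENTIFIED BY '<password>';

-- The reading part of an INSERT ... SELECT is moved into an IMPORT subselect,
-- which fetches the data through a separate transaction, so the long-running
-- writer no longer forces later readers into WAIT FOR COMMIT.
INSERT INTO tab2
SELECT * FROM (
    IMPORT INTO (v DECIMAL(18,0))
    FROM EXA AT self_conn
    STATEMENT 'SELECT * FROM WFC.tab1'
);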
This article describes the difference between DELETE and REORGANIZE.
This article explains how to analyze query performance using profiling.
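A minimal sketch of how session-level profiling is typically done in Exasol (the profiled query reuses the hypothetical tables from above and is a placeholder):

ALTER SESSION SET PROFILE = 'ON';

-- placeholder: the query to analyze
SELECT COUNT(*) FROM orders o JOIN customers c ON o.customer_id = c.customer_id;

ALTER SESSION SET PROFILE = 'OFF';
FLUSH STATISTICS;  -- makes the profile visible in the statistics system tables

SELECT * FROM EXA_USER_PROFILE_LAST_DAY
WHERE session_id = CURRENT_SESSION
ORDER BY stmt_id, part_id;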
This article covers Exasol cluster enlargement (adding nodes) and what you need to know if you are seeing degraded performance.
Background

The Exasol database automatically computes and stores column statistics on demand, e.g. when joining tables for the first time. These statistics consist of distinct estimates, minimum and maximum values, balancing, and other information. They are used by the query optimizer to estimate join costs, filter sensitivities, etc.

The computation of column statistics performs a complete column scan. It can therefore be expensive, especially for large tables and for tables not having all data in memory. This means that a query triggering statistics computation may experience a significant slowdown compared to its next execution.

Column statistics are maintained as part of DML statements. They are recomputed if a significant amount of data has been changed since the last computation.

Explanation

Problem description

Due to format changes, all column statistics are invalidated during an update from EXASOL 5.0 to EXASOL 6.0.

Required user action

After the update to EXASOL 6.0, we recommend recomputing the statistics for the whole database to avoid any potential unexpected performance losses. Please note that the following command was introduced with version 6.0.4 (see EXASOL-2110):

ANALYZE DATABASE REFRESH STATISTICS;

Like other multi-table statements, ANALYZE DATABASE performs an implicit COMMIT after each table, minimizing transaction conflicts.

Time estimation

It is often useful to obtain an estimate of the duration of the ANALYZE DATABASE REFRESH STATISTICS statement. The query below delivers such an estimate (measured in seconds) when run before the update (that is, on EXASOL 5.0, while statistics are still valid):

select cast(
         zeroifnull(
           sum(raw_object_size) / 1024 / 1024 / 150 / nproc()
         ) as dec(18, 1)
       ) as COLUMN_STATISTICS_REFRESH_SECONDS
from "$EXA_COLUMN_SIZES"
where (column_schema, column_table, column_name) in (
    select column_schema, column_table, column_name
    from "$EXA_COLUMN_STATISTICS"
    where min_value_estimate is not null -- filter does not work on 6.0 before the REFRESH
);

Additional References
https://community.exasol.com/t5/database-features/exa-statistics/ta-p/1413
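In words, the estimate assumes a scan rate of roughly 150 MB/s per node and divides the total raw column size by the number of nodes (nproc()). A worked example with illustrative numbers: for 1.5 TB of raw column data on a 4-node cluster, 1.5 TB = 1,572,864 MB, and 1,572,864 / 150 / 4 ≈ 2,621 seconds, i.e. roughly 44 minutes.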
Question

What is smart table replication and how can the replication border be modified?

Answer

General information on smart table replication

Replicates are created on the fly for local joins if a table is "small" with respect to the replication threshold. A replicated table accesses data directly from other nodes' database memories and keeps a local copy in its own DBRAM. If a replicated table is modified, only the changed data is reloaded into the database memories of the other nodes. Modified tables and subselects cannot be used with smart table replication; the table replication border does not apply to those.

💡 Using large replicated tables might cause decreased performance. Queries with expensive table scans (filter expressions) or between/cross joins may fall into this category.

Soft replication border

A table will be replicated if none of the thresholds below is exceeded. The table size threshold refers to the RAW_OBJECT_SIZE, as in EXA_*_OBJECT_SIZES. The replication borders can be modified through extra database parameters in the web interface:

-soft_replicationborder_in_numrows=<numrows> [default: 100000 rows]
-soft_replicationborder_in_kb=<kb> [default: 1000000, i.e. 1 GB]
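For example, to raise the soft replication border to 200,000 rows and 2 GB (illustrative values, not a recommendation), the following extra database parameters would be set in the web interface:

-soft_replicationborder_in_numrows=200000
-soft_replicationborder_in_kb=2000000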
This article shows what happens behind the scenes when joining on a DECIMAL data type.
This article describes how to do data loading performance tests.