On a typical Exasol cluster, data is distributed across multiple nodes very much like a hash partition, but the use case is very different: While partitions are used for filtering (eliminate as many partitions as possible in queries), distribution is used for load balancing (spread the processed data across as many nodes as possible).
Table data is also split into columns and multiple data blocks per column, such that some features of partitioning are achieved automatically, but this is far from perfect, as blocks may still contain a wide range of rows that should (in some cases) be split into separate partitions.
Exasolution 5 introduced a powerful optimization that can be used for manual partitioning:
As an example, given the following view and statement:
create view union_all as ( select * from sales_2011 UNION ALL select * from sales_2012 UNION ALL select * from sales_2013 UNION ALL select * from sales_2014 ); select sum(sales_amount) as turnover from union_all where sales_date between date '2013-11-01' and date '2014-02-28';
The intent here is that
The mentioned optimization can take place if the following conditions are met (as of version 5.0.15):
-- slow variant => not all columns included SELECT * FROM ( SELECT a FROM T UNION ALL SELECT a FROM T ) LIMIT 9; -- fast variant => all columns in original order included SELECT * FROM ( SELECT * FROM T UNION ALL SELECT * T ) LIMIT 9;
If those conditions are met, the following optimization is possible if the outer select contains a literal filter (no subselects, no joins, etc) that can be propagated to a column of the union all view:
Properties of the union table wrapper:
Limitations of the union wrapper: