Automated distribution and partition key managment

Exasol’s automated index management is a great time saver.
It would be great if distribution and partition key management was likewise automated.

In the case of large tables with multiple foreign keys to other large tables it can be difficult to know which distribution key would benefit the greatest number of queries. It is straightforward to determine for a single query but difficult to assess for all the various queries that join to the table.

If SQL audit log is enabled it seems feasible for the cluster to use idle CPU time to analyze query plans for, say, the last month’s worth of queries, and determine which would have benefitted most from having particular distributions or partitions. It could then recommend (or automatically apply) the keys that would benefit the greatest number of queries.

4 Likes

Thanks @peterkioko for the idea. What you are describing is quite similar to what I’ve been seeing so far in research publications, e.g. for Redshift here. They propose a join multi-graph which could be built out of audit information. It’s a quite complex (NP-complete) optimisation problem. I’ll add that to our backlog and discuss for next roadmap iterations.

1 Like