Problem
When switching from EXAplus to DbVisualizer, you would like to
- change the shortcut for executing the current statement
- use auto completion
- create Lua scripts / UDFs
- use favorites

Solution
Change keyboard shortcuts
When switching from EXAplus, the first thing you will probably want to change is the shortcut for executing the current statement. In DbVisualizer this is set to Ctrl + . by default, whereas in EXAplus it is Ctrl + Enter. To change it, go to Tools -> Tool Properties, open the Key Bindings dialog in the tree, create a copy of the current keymap (e.g. called "exasol"), and change the binding for Main menu -> SQL Commander -> Execute Current.

Auto completion
Auto completion of table and column names works slightly differently in DbVisualizer than in EXAplus. In EXAplus you first had to type the schema name to access its tables. In DbVisualizer you can access a table or column name directly, without fully qualifying the object. Also note that if you are in the context of a schema, only the tables of that schema are shown in the auto completion. You can customize the behavior of the auto completion under Tools -> Tool Properties in the dialog SQL Commander -> Auto Completion.

Creation of Lua scripts / UDFs
In DbVisualizer, Lua scripts and UDFs should be created as an SQL block. The block starts with --/ and ends with /:

--/
CREATE LUA SCRIPT MY_CAT RETURNS TABLE AS
return query([[select * from cat]])
/

By default, keywords in DbVisualizer are displayed in uppercase, so if you open an existing Lua script, keywords are automatically transformed to uppercase and the case-sensitive Lua script no longer works. Please change this setting in Tool Properties -> Tools -> SQL Formatting to "Keep Case Asis".

Please also have a look at the official documentation: http://confluence.dbvis.com/display/UG100/Executing+Complex+Statements

Favorites versus Bookmarks
In EXAplus, frequently used SQL statements were stored in the Favorites tab. DbVisualizer has a similar feature called Bookmarks. Learn more in the official documentation: http://confluence.dbvis.com/display/UG100/Managing+Frequently+Used+SQL

EXAplus commands
EXAplus commands like "set autocommit off" do not work in DbVisualizer. DbVisualizer has its own syntax for its client-side commands, starting with @, e.g.

@set autocommit off;

See also the official documentation: http://confluence.dbvis.com/display/UG100/Using+Client-Side+Commands

Additional Notes
Known issues of DbVisualizer:
- The OPEN SCHEMA command is currently not interpreted by DbVisualizer.
- Newly created schemas are not shown in the SQL editor drop-down menu. Workaround: reconnect the connection.
- R scripts sent from a Windows OS using DbVisualizer cannot be executed because of Windows CRLF line endings. In the current version, DbVisualizer uses the platform's linefeed convention, i.e. CRLF on Windows and LF on all other platforms. There is a way to force DbVisualizer to use LF for the text it sends to the database regardless of platform:
  1) Open the file DBVIS-HOME/resources/dbvis-custom.prefs (where DBVIS-HOME is the DbVisualizer installation folder) in a text editor.
  2) Add the following on a new row: dbvis.exasol.ReplaceCRLFwithLF=true
  3) Restart DbVisualizer.
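Putting the pieces from the Solution section together, here is a minimal sketch of a DbVisualizer editor buffer that combines a client-side command, the SQL block delimiters, and a test run of the script. It only uses objects from the example above; EXECUTE SCRIPT is standard Exasol syntax for running the created script.

@set autocommit off;

--/
CREATE LUA SCRIPT MY_CAT RETURNS TABLE AS
return query([[select * from cat]])
/

EXECUTE SCRIPT MY_CAT();
commit;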
Background
Geospatial data can be stored and analyzed in the Exasol database using the GEOMETRY datatype. In this solution, we show some examples of how to import geospatial data from a CSV and from a GeoJSON file and how to use SQL functions to perform analytics and geo joins.

Spatial Reference Systems
In the system table SYS.EXA_SPATIAL_REF_SYS, you will find more than 3000 different spatial reference systems which can be used for the GEOMETRY datatype. These reference systems define points on earth, but they have different strengths, accuracies, and properties; for example, SRID 31466 can only reference locations within Germany. We will use the two SRIDs 4326 and 2163 in our example: 4326 uses degrees as its unit, 2163 uses meters. Conversions between different coordinate systems are often necessary; the ST_TRANSFORM function can be used for this.

Prerequisites
Working with Spatial Data in Exasol
The following example shows how to create a table with a GEOMETRY column and how to insert and query data. We are using GEOMETRY(4326) because the provided coordinates are in degrees. Later, in the SELECT query, we transform the geodata into SRID 2163 to get the distance between the two cities in meters:

CREATE TABLE cities(name VARCHAR(200), geo GEOMETRY(4326));
INSERT INTO cities VALUES ('Berlin', 'POINT (13.36963 52.52493)');
INSERT INTO cities VALUES ('London', 'POINT (-0.1233 51.5309)');
-- this shows the distance in degrees:
SELECT a.name, b.name, st_distance(a.geo, b.geo) FROM cities a, cities b;
-- this shows the distance in meters:
SELECT a.name, b.name, st_distance(ST_Transform(a.geo, 2163), ST_Transform(b.geo, 2163)) FROM cities a, cities b;

GEOMETRY columns can be filled with strings using the well-known text representation (WKT), e.g. 'POINT (13.36963 52.52493)'.

How to process geodata
Step 1: Importing geodata
Importing geodata from CSV files
Often, geodata is present in CSV files, or in columns of tables imported from other database systems, in the form of latitude and longitude values. On https://openflights.org/data.html#airport, you can download a CSV file containing international airport data. We are using the extended version of this file, which consists of more than 12,000 international airports and train stations. The first two fields of the file contain the id and the name of the airport; columns 7 and 8 contain its latitude and longitude coordinates:

CREATE OR REPLACE TABLE airports(
  airport_id INT,
  name VARCHAR(500),
  latitude DECIMAL(9,6),
  longitude DECIMAL(9,6)
);
IMPORT INTO airports FROM LOCAL CSV FILE 'D:\airports-extended.dat' (1, 2, 7, 8);
ALTER TABLE airports ADD COLUMN geo GEOMETRY(4326);
UPDATE airports SET geo = 'POINT (' || longitude || ' ' || latitude || ')';
SELECT * FROM airports;

We first used DECIMAL(9,6) columns to store the latitude and longitude values, and then added a GEOMETRY column to store a geodata point for each airport. Similar to the example above, we can now calculate the distance between two airports or train stations in our table using the ST_DISTANCE and ST_TRANSFORM functions:

select st_distance(ST_Transform(a.geo, 2163), ST_Transform(b.geo, 2163))
from airports a, airports b
where a.name = 'Berlin-Tegel Airport' and b.name = 'Berlin Hauptbahnhof';

Importing data from GeoJSON
GeoJSON is an often-used format for storing and exchanging geodata. On https://geojson-maps.ash.ms, you can download countries as geodata.
When you generate a custom JSON file on this site, the file consists of one JSON object, a so-called FeatureCollection, which contains multiple polygons and multi-polygons, one for each country. We import the data in three steps:
1. Load the full GeoJSON string into a VARCHAR(2000000) column of a table (note that this does not work if the GeoJSON is larger than two million characters).
2. Parse the GeoJSON using the json_table UDF script (see querying-and-converting-json-data-with-the-json-table-udf; this UDF script emits a table with one row for each geo-object).
3. Convert each GeoJSON object into a GEOMETRY value using the ST_GeomFromGeoJSON UDF script (attached to this solution).

After these steps, we have all countries from the GeoJSON file in a GEOMETRY column. An alternative approach is to develop a UDF script that loads the GeoJSON from a server (e.g. using the Python package requests or paramiko) and parses and iterates over the GeoJSON objects within the script. For each object, a WKT string is emitted, which can later be converted into a GEOMETRY value. That way, there is no limit on the number of characters. For our approach, the whole GeoJSON must not be larger than 2,000,000 characters:

create or replace table geo_import(v varchar(2000000));
import into geo_import from local csv file 'D:\custom.geo.json'
  column separator = '0x01' column delimiter = '0x02';
-- dummy separators / delimiters to import a whole line as one column value

-- json_table (can be found in https://community.exasol.com/t5/database-features/querying-and-converting-json-data-with-the-json-table-udf/ta-p/1800) emits a row for each country with two columns: name and geojson
create or replace view geojson as
  select json_table(v, '$.features[*].properties.name', '$.features[*].geometry')
  emits (name varchar(2000000), geojson varchar(2000000))
  from geo_import;

-- ST_GeomFromGeoJSON is attached to https://community.exasol.com/t5/database-features/how-to-import-the-whole-world-into-exasol-geospatial-data-geo/ta-p/1669
create or replace table countries as
  select name, cast(ST_GeomFromGeoJSON(geojson) as geometry(4326)) as geo
  from geojson;

select * from countries;

Step 2: Geo-Joins and Geo-Indexes
The following query performs a geo-join between the two tables countries and airports. It finds all airports that are within the country Italy:

select a.* from countries c join airports a on st_contains(c.geo, a.geo) where c.name = 'Italy';

You can use profiling to see that a geo-index is created automatically and that this geo-index is used to execute the geo-join:

alter session set profile = 'ON';
alter session set query_cache = 'OFF';
select a.* from countries c join airports a on st_contains(c.geo, a.geo) where c.name = 'Italy';
flush statistics;
select * from exa_statistics.exa_user_profile_last_day
where session_id = current_session and command_name = 'SELECT'
preferring high stmt_id;

Like other indexes in Exasol, the geo-index is persisted so that it can be used for future queries, it is maintained automatically when the table data changes, and it is automatically dropped when it is not used for five weeks or longer. Note that geo-indexes are available since Exasol 6.1.

Step 3: Visualizing Geometry Data
The SQL client DBeaver is able to show geospatial data on a map. On a result tab, right-click on a GEOMETRY column and choose View/Format -> View as Geometry. Then click on the Panels button on the right to show the Value panel with the map.
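Building on the countries and airports tables created above, a geo-join can also feed an ordinary aggregation. The following sketch uses only objects defined in this article and counts the airports per country:

select c.name, count(*) as airport_count
from countries c
join airports a on st_contains(c.geo, a.geo)
group by c.name
order by airport_count desc;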
This article explains query performance using profiling.
This article is about using the IMPORT command over JDBC to connect back to your Exasol database and wrapping the Lua scripts you want to execute in parallel in the STATEMENT clause.
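The article itself contains the full pattern; as a rough, hedged sketch of the idea only (connection name, credentials, schema and script names are placeholders, and the wrapped script is assumed to return a result set):

-- placeholder connection pointing back at the same Exasol database via JDBC
CREATE CONNECTION exa_self TO 'jdbc:exa:localhost:8563' USER 'sys' IDENTIFIED BY 'exasol';

-- IMPORT opens its own session for the wrapped statement, so several such
-- subselects allow the wrapped scripts to run in parallel sessions
SELECT * FROM (
  IMPORT INTO (result_msg VARCHAR(2000000))
  FROM JDBC AT exa_self
  STATEMENT 'EXECUTE SCRIPT etl.load_part_1()'
);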
Background
By default, an Exasol virtual machine (VM) is configured to use a host-only network. This configuration allows you to access the database and the EXAoperation management interface locally from the host machine, but it prevents the VM from using publicly available hosts and services on the internet. This how-to provides information about configuring Exasol to access the internet, which enables you to:
- use DNS to access publicly reachable servers for making backups or in database IMPORT/EXPORT statements
- use LDAP servers of your choice for database or EXAoperation accounts
- use standard repositories for the installation of UDF packages
- use the Exasol version check (only if this feature has not been disabled)

Prerequisites
If not already done, import the Exasol OVA file into the VM tool of your choice (in VirtualBox: File -> Import Appliance). Accept the "End User License Agreement" and the "Privacy Statement" and wait until the image has been successfully loaded.

How to enable internet access for Exasol Community Edition
Configuration of the VM network
Step 1: A new VM has now been created. Change its network settings to use NAT (in VirtualBox: right-click on the VM -> Settings -> Network -> Adapter 1 -> Attached to NAT).
Step 2: Define port forwardings to guest ports 8563 (database) and 443 (EXAoperation management interface) (in VirtualBox: Adapter 1 -> Port Forwarding -> Add rule -> Host Port 8563 and Guest Port 8563 -> Add rule -> Host Port 4443 and Guest Port 443). The port forwarding rules enable you to use the VM from the physical host machine.
Step 3: Start the VM and wait until Exasol is up and running.

Configuration of the DNS server
Step 1: Browse to EXAoperation and log in.
Step 2: Go to Network. In the System tab, click "Edit".
Step 3: Add one or two DNS servers reachable from the VM (e.g. "8.8.8.8" for an environment that can reach internet DNS servers) and click "Apply".
Step 4: Log out.

Additional Notes
Troubleshooting: Firewalls and/or company proxy servers may block or restrict traffic to internet hosts.
This article talks about analytic functions for windowing with links to additional analytics.
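For quick orientation on what "windowing" means here, a small hedged example (schema, table, and column names are invented for illustration):

-- running total and per-customer ranking with analytic (window) functions
SELECT customer_id,
       order_date,
       amount,
       SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS amount_rank
FROM sales.orders;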
How to analyze disk I/O and concurrency using the *_last_day table data
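As a hedged starting point for this kind of analysis (the statistics tables live in the EXA_STATISTICS schema; the exact column set can vary between Exasol versions, so verify the names against your system):

-- disk throughput per measurement interval
SELECT measure_time, hdd_read, hdd_write
FROM exa_statistics.exa_monitor_last_day
ORDER BY measure_time DESC;

-- concurrency: logged-in users and running queries per interval
SELECT measure_time, users, queries
FROM exa_statistics.exa_usage_last_day
ORDER BY measure_time DESC;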
This article describes what you can do if you cannot contact the LDAP server during user creation
Background
Cause and effect: Since Exasol's transaction isolation level is SERIALIZABLE and newly created transactions are automatically scheduled after finished transactions, WAIT FOR COMMIT situations can occur even for pure read transactions (consisting of SELECT statements only).

Explanation
How to reproduce: Three different connections (with AUTOCOMMIT off) are needed to reproduce this situation.

Example 1: A long-running transaction (Tr1) reads object A and writes object B (e.g. a long-running IMPORT statement), and a second transaction (Tr2) writes object A and commits in parallel, so Tr2 is scheduled after Tr1. After Tr2 is committed, all new transactions are scheduled after it. If such a transaction wants to read object B, it has to wait for the commit of Tr1.

1. Transaction 1: select * from tab1;
2. Transaction 1: insert into tab2 values 1;  (transaction remains open)
3. Transaction 2: insert into WFC.tab1 values 1;  -- now Transaction 1 < Transaction 2
4. Transaction 2: commit;
5. Transaction 3: commit;  -- starts a new transaction (Transaction 2 < Transaction 3)
6. Transaction 3: select * from tab2;  -- this statement ends up in WAIT FOR COMMIT, waiting for Transaction 1

Example 2: The same situation may occur if you query system tables while the SqlLogServer is performing one of its tasks (e.g. the "DB size task" determining the database size). The following sequence describes this situation:

1. Transaction 1: select * from EXA_DB_SIZE_LAST_DAY;
2. Transaction 1: insert into tab1 values 1;  (transaction remains open)
3. LogServer: DB size task (writes EXA_DB_SIZE_LAST_DAY)  -- Transaction 1 < LogServer transaction; the task is executed every 30 minutes (0:00, 0:30, 1:00, 1:30, ...)
4. Transaction 3: commit;  -- starts a new transaction (LogServer transaction < Transaction 3)
5. Transaction 3: select * from EXA_DB_SIZE_LAST_DAY;  -- this statement ends up in WAIT FOR COMMIT

Solution
Currently, the only solution is to break up Transaction 1 into multiple transactions by performing a COMMIT or ROLLBACK after the initial read access. However, things get more complicated when the read and write operations are concentrated within a single statement (i.e. MERGE or INSERT from SELECT). In the latter case, it has proven helpful to "outsource" the reading part by using IMPORT as a subselect to fetch the required data through a separate transaction.

Additional References
https://community.exasol.com/t5/database-features/transaction-system/ta-p/1522
https://community.exasol.com/t5/database-features/filter-on-system-tables-and-transaction-conflicts/ta-p/1232
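As a hedged illustration of the "outsource the read via IMPORT" workaround mentioned in the Solution section above (connection name, credentials, and table names are placeholders):

-- loopback connection into the same database; the IMPORT opens its own session
-- and therefore performs the read in a separate transaction
CREATE CONNECTION exa_loopback TO '127.0.0.1:8563' USER 'sys' IDENTIFIED BY 'exasol';

INSERT INTO target_schema.target_tab
SELECT * FROM (
  IMPORT FROM EXA AT exa_loopback
  STATEMENT 'SELECT * FROM source_schema.source_tab'
);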
This article shows what happens behind the scenes when JOINing on a DECIMAL datatype.
This article includes a script to help you generate DDLs for one table
Question
Is the IDENTITY column guaranteed to be unique?

Answer
This article deals with IDENTITY columns and whether they are guaranteed to be unique. Even though your use case scenario may not apply here, uniqueness is NOT guaranteed, because you can UPDATE the IDENTITY column. The way around this is to make the IDENTITY column a primary key, which is probably not doable for most production database repositories.

The relevant part of https://docs.exasol.com/sql_references/data_types/identitycolumns.htm is: "Identity columns cannot be considered as a constraint, that is, identity columns do not guarantee unique values. However, the values are unique as long as they are inserted implicitly and are not manually changed."

This means that for a series of concurrent inserts where the IDENTITY is not included in the inserted values, the FIFO rule applies: the first insert commits and generates the IDENTITY, then the next insert is free to insert. The emphasis is on "inserted implicitly" - the IDENTITY value is not specified but system-generated. If you do an insert with the IDENTITY value coded in the insert statement, it will insert and can become a duplicate. Remember, the IDENTITY column cannot be considered a constraint.

There are a couple of suggestions to get closer to guaranteed uniqueness:
1. LOCK the table you are inserting into, which means no one else can insert until you either commit or rollback. This can be achieved with: DELETE FROM <tableName> WHERE FALSE;
2. Do the insert.
3. Release the lock with commit or rollback.

Another suggestion is to create a function that inserts max(ROW_NUMBER()) + 1 into the IDENTITY column. It is shown at the bottom of this simple example SQL:

drop schema exa_29444 CASCADE;
commit;
create schema exa_29444;
open schema exa_29444;

-- Build a test table with an IDENTITY column.
create or replace table identity_test (id int identity, name varchar(20));

-- optional TRUE constraint for the IDENTITY column
--alter table identity_test add constraint it_pk PRIMARY KEY ("ID") enable;

-- Load the table with data using an implicit insert on IDENTITY column "ID"
insert into identity_test (name) values ('Zach'),('Cole'),('Daniel');
commit;

-- Present the results
SELECT * FROM IDENTITY_TEST;

-- Template to generate the next sequential ID using the ROW_NUMBER analytic function,
-- if you wish to generate the IDENTITY column value for the insert (explicit insert)
WITH ROWZ (RowNumber) AS (select ROW_NUMBER() over (ORDER BY ID ASC) RowNumber FROM IDENTITY_TEST)
SELECT MAX(RowNumber) + 1 from ROWZ;
commit;

Additional References
https://docs.exasol.com/sql_references/data_types/identitycolumns.htm
https://docs.exasol.com/sql/alter_table(column).htm?Highlight=identity
https://docs.exasol.com/sql_references/metadata/metadata_system_tables.htm?Highlight=identity
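Tying together the three locking steps from the answer above, a minimal sketch using the IDENTITY_TEST table (run with autocommit disabled so the lock is held until the final COMMIT):

-- step 1: acquire a write lock on the table without touching any rows
DELETE FROM identity_test WHERE FALSE;

-- step 2: implicit insert - the IDENTITY value for "ID" is generated by the system
INSERT INTO identity_test (name) VALUES ('Eve');

-- step 3: release the lock
COMMIT;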
This article describes the difference between DELETE and REORGANIZE
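For orientation, a minimal hedged sketch of the two operations being compared (schema and table names are placeholders):

-- DELETE only marks rows as deleted; the space is not reclaimed immediately
DELETE FROM my_schema.big_table WHERE event_date < '2015-01-01';

-- REORGANIZE rewrites the table and physically removes the deleted rows
REORGANIZE TABLE my_schema.big_table;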
This article addresses ODBC and special characters.
This article shows you how you can create functions to work with the Bing Maps Tile System
This article describes how transaction conflicts occur with complex read/write scenarios, and how you can solve them
Background
Exasol's UDF scripts make it possible to run machine-learning algorithms directly in the database. Training a random-forest model and using this model to make predictions on a dataset in the Exasol database can be done in three steps:
1. During a learning phase, the model is created based on a training dataset (e.g. from a table within the Exasol database).
2. The model is put into BucketFS to make it accessible within UDF scripts.
3. A UDF script is used to make predictions.

Prerequisites
As an example, we create two tables RF.TRAIN and RF.TEST in the Exasol database and import the data from the files train.csv and test.csv. They both contain information about thousands of wines, e.g. the amount of sugar, the pH value, the acidity, and so on. The training dataset contains the taste of the wine (bad / normal / good). The goal is to train a model and classify the unknown wines in the test dataset. The SQL commands to create the tables and the two CSV files are attached to this solution. We need Exasol version 6.2 for the Python 3 example.

How to do Machine Learning in R and Python
Step 1: Learning Phase
1a. Learning Phase in R
In R, a random-forest model can be created by first reading the data from the Exasol database using the Exasol R SDK. It can be installed directly from the GitHub repository:

install.packages('randomForest')
install.packages('devtools')
devtools::install_github("EXASOL/r-exasol")

library(exasol)
library(randomForest)

con <- dbConnect("exa", exahost = "localhost:8563", uid = "sys", pwd = "exasol")
train <- dbGetQuery(con, "SELECT * FROM RF.TRAIN")

The dataframe train contains the data from the Exasol table RF.TRAIN. head(train) shows the first rows of this table, and barplot(table(train$TASTE)) shows the taste distribution as a bar chart, e.g. in RStudio. Next, the random-forest model can be created and stored into a file:

rf <- randomForest(as.factor(TASTE) ~ ., data = train, ntree=1000)
saveRDS(rf, "D:/rf.rds", version=2)

1b. Learning Phase in Python 3
Since Exasol version 6.2, the language Python 3 is included with many built-in packages like pandas, scikit-learn and many more. In a local Python 3 environment, the random-forest model can be built analogously to the R example above. First, the training data is loaded from the Exasol database as a dataframe using the Python package pyexasol:

import pyexasol
import pandas as pd
import numpy as np
import pickle
from sklearn.ensemble import RandomForestClassifier

C = pyexasol.connect(dsn='localhost:8563', user='sys', password='exasol')
train = C.export_to_pandas("SELECT * FROM RF.TRAIN")

The pandas dataframe train can be inspected with train.head(5) or train.describe(). Next, the labels (bad/normal/good) from the taste column need to be converted into integers (-1/0/1):

train.TASTE[train.TASTE == 'bad'] = -1
train.TASTE[train.TASTE == 'normal'] = 0
train.TASTE[train.TASTE == 'good'] = 1
labels = np.array(train['TASTE'])
labels = labels.astype('int')
features = train.drop('TASTE', axis=1)

The array labels just contains the numeric labels for all wines, e.g. [1, 0, 0, -1, 1, 1, ...]. features looks like the original train dataframe, but it does not contain the taste column.
Next, the random-forest model can be trained and written into a file:

clf = RandomForestClassifier(n_estimators=100, max_depth=2)
clf.fit(features, labels)
pickle.dump(clf, open('D:/clf.dat', 'wb'))

Step 2: Putting the model into BucketFS
The easiest way to work with Exasol's BucketFS is the BucketFS Explorer. After logging in with the EXAoperation credentials, a BucketFS service and a bucket can be created. Into this new bucket, we move the model file (rf.rds for R, clf.dat for Python 3).

Step 3: Creating a UDF script for prediction
3a. Creating an R UDF script for prediction
When the bucket is not publicly accessible but read-protected, a connection object has to be created in the Exasol database to provide the read password to UDF scripts:

CREATE CONNECTION my_bucket_access TO 'bucketfs:bucketfs1/udf' IDENTIFIED BY '12345';

In the R UDF script, the random-forest model is read from BucketFS. After that, the input set of the UDF script is converted into a dataframe which is accepted by the predict function of the randomForest package. We load batches of 1000 rows each to make 1000 predictions in each iteration.

--/
create or replace r set script rf.predict(...) emits (wine_id INT, taste VARCHAR(6)) as
library(randomForest)

run <- function(ctx) {
  rf <- readRDS("/buckets/bucketfs1/udf/rf.rds")
  ## load the first batch of 1000 rows in the input set
  ctx$next_row(1000)
  repeat {
    wine_ids <- ctx[[1]]()
    ## create a dataframe from all input columns (except the first one, wine_id)
    numCols <- exa$meta$input_column_count
    df <- data.frame(ctx[[2]]())
    for (i in 3:numCols) {
      df <- cbind(df, ctx[[i]]())
    }
    colnames(df) <- c("FIXED_ACIDITY", "VOLATILE_ACIDITY", "CITRIC_ACID", "RESIDUAL_SUGAR", "CHLORIDES", "FREE_SULFUR_DIOXIDE", "TOTAL_SULFUR_DIOXIDE", "DENSITY", "PH", "SULPHATES", "ALCOHOL")
    prediction <- predict(rf, newdata=df)
    ctx$emit(wine_ids, as.character(prediction))
    ## load the next batch of 1000 rows in the input set
    if (!(ctx$next_row(1000))) {
      break
    }
  }
}
/

select rf.predict(wine_id, fixed_acidity, volatile_acidity, citric_acid, residual_sugar, chlorides, free_sulfur_dioxide, total_sulfur_dioxide, density, pH, sulphates, alcohol)
from RF.test group by iproc();

Due to the GROUP BY iproc(), the UDF script is called once per Exasol node with the data that is stored locally on that node, which enables parallel and fast predictions.

3b. Creating a Python UDF script for prediction
Using the scripting language PYTHON3 - built into Exasol since version 6.2 - the input set can be accessed batch-wise as a pandas dataframe with ctx.get_dataframe:

--/
CREATE OR REPLACE PYTHON3 SET SCRIPT rf.predict_wine_py(...) emits (wine_id INT, taste VARCHAR(6)) as
import pickle
import pandas as pd

clf = pickle.load(open('/buckets/bucketfs1/udf/clf.dat', 'rb'))
clf.n_jobs = 1

def run(ctx):
    BATCH_ROWS = 1000
    while True:
        df = ctx.get_dataframe(num_rows=BATCH_ROWS)
        if df is None:
            break
        wine_ids = df['0']
        features = df.drop('0', axis=1)
        res_df = pd.DataFrame(columns=['WINE_ID', 'TASTE'])
        res_df['WINE_ID'] = wine_ids
        res_df['TASTE'] = clf.predict(features)
        res_df.TASTE[res_df.TASTE == -1] = 'bad'
        res_df.TASTE[res_df.TASTE == 0] = 'normal'
        res_df.TASTE[res_df.TASTE == 1] = 'good'
        ctx.emit(res_df)
/

select rf.predict_wine_py(wine_id, fixed_acidity, volatile_acidity, citric_acid, residual_sugar, chlorides, free_sulfur_dioxide, total_sulfur_dioxide, density, pH, sulphates, alcohol)
from RF.test group by iproc();
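As a possible follow-up (not part of the original solution), the predictions can be persisted for further analysis. The sketch below only reuses the Python UDF and the RF.TEST table from above:

-- store the predictions in a table ...
CREATE OR REPLACE TABLE rf.predictions AS
SELECT rf.predict_wine_py(wine_id, fixed_acidity, volatile_acidity, citric_acid, residual_sugar,
                          chlorides, free_sulfur_dioxide, total_sulfur_dioxide, density, pH,
                          sulphates, alcohol)
FROM rf.test GROUP BY iproc();

-- ... and look at the predicted taste distribution
SELECT taste, COUNT(*) AS cnt FROM rf.predictions GROUP BY taste;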
This article shows you how to send an email from a Lua Script
Background
This solution is currently work in progress, but published anyway:
- A workaround for aggregation with KEEP FIRST/LAST is given in this solution. We will try to add a workaround for analytical functions with KEEP FIRST/LAST.
- The joins in our examples do not take care of potential NULL values. For NULL values, a join condition using NVL or similar should be used.
- No warranty for correctness.

How to transform the following select into something supported in Exasol?
Note: The following transformation is an advanced topic. Understanding how to rewrite the query will need a considerable amount of time.

SELECT <aggregate_function>(<aggregate_args>) KEEP (DENSE_RANK FIRST/LAST ORDER BY <order_expr>),
       <other_select_list_exprs>
FROM <from_clause>
WHERE <where_clause>
GROUP BY <group_by_cols>
HAVING <having_clause>

DENSE_RANK KEEP FIRST/LAST calculates the <aggregate_function> only on those values of <aggregate_args> that have the highest/lowest values in <order_expr>. In order to rewrite the statement we need to calculate:

Step 1: <other_select_list_exprs> with the original GROUP BY.
Step 2: <aggregate_function>(<aggregate_args>) for a GROUP BY <group_by_cols>, <order_expr>. This computes the aggregates for each value of <order_expr>, so among others - the rest will be discarded - the computation for the FIRST and LAST value.
Step 3: The minimum/maximum value of <order_expr> for the original GROUP BY. This is needed to determine which subgroup to select from the finer aggregates.

The values of the original select from (1) are then enriched with values from (2) by a join on the GROUP BY columns of the original select together with the attribute that selects exactly the <aggregate_function> value of the subaggregation for the FIRST/LAST value.
So basically, the rewriting looks like this:

WITH original_aggregation AS (
  SELECT <other_select_list_exprs>,
         min/max(<order_expr>) AS first_last_selector,
         <group_by_cols>
  FROM <from_clause>
  WHERE <where_clause>
  GROUP BY <group_by_cols>
  HAVING <having_clause>
),
subaggregate_for_first_last AS (
  SELECT <aggregate_function>(<aggregate_args>) AS afk,
         <group_by_cols>,
         <order_expr>
  FROM <from_clause>
  WHERE <where_clause>
  GROUP BY <group_by_cols>, <order_expr>
  HAVING <having_clause> -- not needed and not accurate, but filters anyway
)
-- Now the join
SELECT sub_agg.afk, orig_agg.<other_select_list_exprs>
FROM original_aggregation AS orig_agg
LEFT JOIN subaggregate_for_first_last AS sub_agg
  ON orig_agg.<group_by_cols> = sub_agg.<group_by_cols>
  AND orig_agg.first_last_selector = sub_agg.<order_expr>

To make the computation faster, we don't use two separate selects for the two aggregations, but combine them in a single select with different GROUPING SETS and distinguish them by the different values of the GROUPING function (see the manual for more details):

WITH all_aggregations AS (
  SELECT <aggregate_function>(<aggregate_args>) AS afk,
         <other_select_list_exprs>,
         min/max(<order_expr>) AS first_last_selector,
         <group_by_cols>,
         <order_expr>,
         GROUPING(<group_by_cols>, <order_expr>) AS group_selector
  FROM <from_clause>
  WHERE <where_clause>
  GROUP BY GROUPING SETS ((<group_by_cols>), (<group_by_cols>, <order_expr>))
  HAVING <having_clause>
),
original_aggregation AS (
  SELECT <other_select_list_exprs>, first_last_selector, <group_by_cols>
  FROM all_aggregations
  WHERE group_selector = <GROUPING value for the aggregate on <group_by_cols> only>
),
subaggregate_for_first_last AS (
  SELECT afk, <group_by_cols>, <order_expr>
  FROM all_aggregations
  WHERE group_selector = <GROUPING value for the aggregate on <group_by_cols> and <order_expr>>
)
-- Now the join
SELECT sub_agg.afk, orig_agg.<other_select_list_exprs>
FROM original_aggregation AS orig_agg
LEFT JOIN subaggregate_for_first_last AS sub_agg
  ON orig_agg.<group_by_cols> = sub_agg.<group_by_cols>
  AND orig_agg.first_last_selector = sub_agg.<order_expr>

Example of a simple transformation

-- Sample data table
CREATE TABLE T (i int, gc int, j int, k int);
INSERT INTO T VALUES (1, 1, 3, 1), (2, 1, 5, 1), (3, 1, 7, 2), (4, 1, 9, 3), (5, 1, 11, 3), (6, 2, 7, 1), (7, 2, 3, 2);

-- This select
SELECT max(i) AS col,
       sum(j) KEEP (DENSE_RANK FIRST ORDER BY k) kdrf
FROM t
GROUP BY gc;
-- with result
-- col  kdrf
-- 5    8    (= 3+5, values 7, 9 and 11 are not aggregated)
-- 7    7    (= 7, value 3 is not aggregated)

-- is transformed to
WITH all_aggregations AS (
  SELECT sum(j) AS afk,
         max(i) AS col,
         min(k) AS first_last_selector,
         gc,
         k,
         GROUPING(gc, k) AS group_selector
  FROM t
  GROUP BY GROUPING SETS ((gc), (gc, k))
),
original_aggregation AS (
  SELECT col, first_last_selector, gc
  FROM all_aggregations
  WHERE group_selector = 1
),
subaggregate_for_first_last AS (
  SELECT afk, gc, k
  FROM all_aggregations
  WHERE group_selector = 0
)
-- Now the join
SELECT orig_agg.col, sub_agg.afk AS kdrf
FROM original_aggregation AS orig_agg
LEFT JOIN subaggregate_for_first_last AS sub_agg
  ON orig_agg.gc = sub_agg.gc
  AND orig_agg.first_last_selector = sub_agg.k;

Additional Notes
What to do with an aggregated select (no GROUP BY)
The case that KEEP DENSE_RANK FIRST/LAST is used in an aggregated select without GROUP BY is a simpler special case of the case with GROUP BY. In that case we add a GROUP BY GROUPING SETS ((), (<order_expr>)) and join only on first_last_selector = <order_expr>.
Essentially everything is the same in that case, only we can omit <group_by_cols> wherever they were used.

Transformation for multiple FIRST/LAST functions
Note that in the case of a single KEEP DENSE_RANK FIRST/LAST function, the value of group_selector in the WITH-clause subaggregate_for_first_last is 0. However, we can do the same transformation for multiple KEEP DENSE_RANK FIRST/LAST functions. In the general case we need a separate subaggregation for each KEEP DENSE_RANK FIRST/LAST function - which means another GROUPING SET (<group_by_cols>, <new_order_expr>), another WITH-clause with selection of the correct GROUPING value, and another LEFT JOIN on <group_by_cols> and the first_last_selector for that <new_order_expr>.

A few tweaks can be applied to the general case in some situations:
- If multiple KEEP DENSE_RANK FIRST/LAST functions share the same <order_expr>, their <aggregate_function> values can be computed in the same subaggregate select (and the same GROUPING SET), but they might still need a separate LEFT JOIN each.
- If they furthermore have the same value of FIRST/LAST and the same ORDER BY direction, a single LEFT JOIN on this select is enough to get all the correct values.

Example of a transformation with several FIRST/LAST

Matrix for first_last_selector
The following expressions need to be used as first_last_selector, depending on FIRST/LAST and the ORDER BY clause (comparing COUNT(<order_expr>) with COUNT(*) detects whether NULL values are present in <order_expr>):
- ORDER BY ... ASC [NULLS LAST]:
  FIRST: min(<order_expr>)
  LAST:  CASE WHEN COUNT(<order_expr>) != COUNT(*) THEN NULL ELSE max(<order_expr>) END
- ORDER BY ... DESC [NULLS FIRST]:
  FIRST: CASE WHEN COUNT(<order_expr>) != COUNT(*) THEN NULL ELSE max(<order_expr>) END
  LAST:  min(<order_expr>)
- ORDER BY ... ASC NULLS FIRST:
  FIRST: CASE WHEN COUNT(<order_expr>) != COUNT(*) THEN NULL ELSE min(<order_expr>) END
  LAST:  max(<order_expr>)
- ORDER BY ... DESC NULLS LAST:
  FIRST: max(<order_expr>)
  LAST:  CASE WHEN COUNT(<order_expr>) != COUNT(*) THEN NULL ELSE min(<order_expr>) END

Multiple <order_by_expr>s
If there are multiple <order_expr>s for a single KEEP DENSE_RANK FIRST/LAST function, things get more complicated:
- We can try to combine them into a single value by concatenating them (with padding, so that the order of the concatenated values is the same as the order of the original values). This works as long as all columns have the same ORDER BY direction.
- Otherwise we might need to add a subselect around the <from_clause> containing <from>.* and RANK() OVER (ORDER BY <order_by_expr>s) AS rnk. Then we can use MIN(CASE WHEN rnk=1 THEN <order_by_expr> ELSE NULL END) for each <order_by_expr> to get multiple first_last_selectors for the LEFT JOIN. Of course, this would need to be done for each KEEP DENSE_RANK FIRST/LAST function with multiple <order_by_expr>s.

Additional References
https://www.exasol.com/support/browse/IDEA-65
Problem
We want to replace all NULL values within a certain table with the default value specified for the corresponding column - and we don't want to create the necessary update statements manually.

Solution
A small metadata-driven procedure script (Lua) that issues the required update statements (see attachment).

Notes on transactions
The script:
- performs a rollback after the metadata request, to avoid a read-write conflict scenario.
- performs all updates within a single transaction.
- will not abort when an update on one of the columns fails.
- performs a commit when all columns have been handled, regardless of any errors encountered.

Notes on column selection
The script includes all columns that do have a DEFAULT value set. It excludes all columns with a NOT NULL constraint (ignoring the actual state of the constraint); obviously, such a column cannot contain any NULL values that need updating.

Notes on row selection
Due to Exasol's memory management and data processing, the script handles each column separately. This minimizes both the amount of memory required for processing and the amount of data blocks being written. The script does not contain any delta functionality; it will process all rows of the table each time it is called.

Installation
Just create the script in any schema you like (CREATE SCRIPT permission required). It does not have any dependencies.

create or replace /* procedure */ script REPLACE_NULL(schema_name, table_name) ...

Usage
When calling the script, it expects two parameters, a schema name and a table name:

execute script REPLACE_NULL('my schema', 'my table');

💡 Both schema and table name are expected as string and will be treated case-sensitively.

Example
open schema SR9000;
-- Rows affected: 0

create table Invoice(invoice_id int, invoice_date date default date '2017-01-01');
-- Rows affected: 0

insert into Invoice values (1, null), (2, null), (3, '2017-02-01');
-- Rows affected: 3

execute script REPLACE_NULL('SR9000', 'Invoice');
-- [43000] "No columns found for "SR9000"."Invoice"" caught in script "SR9000"."REPLACE_NULL" at line 23 (Session: 1585944483210400591)

...yes, we created the table using a regular identifier, so its name ended up as uppercase...

execute script REPLACE_NULL('SR9000', 'INVOICE');

This returns:
COLUMN_NAME   COLUMN_TYPE  COLUMN_DEFAULT                      UPDATE_RESULT
INVOICE_DATE  DATE         TO_DATE('2017-01-01','YYYY-MM-DD')  2 rows updated
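The script itself is in the attachment. As a hedged illustration only, the per-column statement it issues is assumed to have roughly this shape for the example above, using the COLUMN_DEFAULT expression reported in the script's output:

UPDATE "SR9000"."INVOICE"
SET "INVOICE_DATE" = TO_DATE('2017-01-01','YYYY-MM-DD')
WHERE "INVOICE_DATE" IS NULL;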