Background

Exasol's UDF scripts make it possible to run machine-learning algorithms directly in the database. Training a random-forest model and using it to make predictions on a dataset in the Exasol database can be done in three steps:

1. During a learning phase, the model is created based on a training dataset (e.g. from a table within the Exasol database).
2. The model is put into BucketFS to make it accessible within UDF scripts.
3. A UDF script is used to make predictions.

Prerequisites

As an example, we create two tables RF.TRAIN and RF.TEST in the Exasol database and import the data from the files train.csv and test.csv. Both contain information about thousands of wines, e.g. the amount of sugar, the pH value, the acidity, and so on. The training dataset additionally contains the taste of the wine (bad / normal / good). The goal is to train a model and classify the unknown wines in the test dataset. The SQL commands to create the tables and the two CSV files are attached to this solution. Exasol version 6.2 is required for the Python 3 example.

How to do Machine Learning in R and Python?

Step 1: Learning Phase

1a. Learning Phase in R

In R, a random-forest model can be created by first reading the data from the Exasol database using the Exasol R SDK, which can be installed directly from the GitHub repository:

```r
install.packages('randomForest')
install.packages('devtools')
devtools::install_github("EXASOL/r-exasol")

library(exasol)
library(randomForest)

con <- dbConnect("exa", exahost = "localhost:8563", uid = "sys", pwd = "exasol")
train <- dbGetQuery(con, "SELECT * FROM RF.TRAIN")
```

The dataframe train contains the data from the Exasol table RF.TRAIN. head(train) shows the first rows of this table, barplot(table(train$TASTE)) shows the taste distribution as a bar chart, e.g. in RStudio.

Next, the random-forest model can be created and stored into a file:

```r
rf <- randomForest(as.factor(TASTE) ~ ., data = train, ntree = 1000)
saveRDS(rf, "D:/rf.rds", version = 2)
```

1b. Learning Phase in Python 3

Since Exasol version 6.2, the language Python 3 is included with many built-in packages such as pandas and scikit-learn. In a local Python 3 environment, the random-forest model can be built analogously to the R example above. First, the training data is loaded from the Exasol database as a dataframe using the Python package pyexasol:

```python
import pyexasol
import pandas as pd
import numpy as np
import pickle
from sklearn.ensemble import RandomForestClassifier

C = pyexasol.connect(dsn='localhost:8563', user='sys', password='exasol')
train = C.export_to_pandas("SELECT * FROM RF.TRAIN")
```

The pandas dataframe train can be inspected with train.head(5) or train.describe(). Next, the labels (bad/normal/good) from the taste column need to be converted into integers (-1/0/1):

```python
train.TASTE[train.TASTE == 'bad'] = -1
train.TASTE[train.TASTE == 'normal'] = 0
train.TASTE[train.TASTE == 'good'] = 1

labels = np.array(train['TASTE'])
labels = labels.astype('int')
features = train.drop('TASTE', axis=1)
```

The array labels just contains the numeric labels for all wines, e.g. [1, 0, 0, -1, 1, 1, ...]. features looks like the original train dataframe, but without the taste column.
Next, the random-forest model can be trained and written into a file:

```python
clf = RandomForestClassifier(n_estimators=100, max_depth=2)
clf.fit(features, labels)
pickle.dump(clf, open('D:/clf.dat', 'wb'))
```

Step 2: Putting the model into BucketFS

The easiest way to work with Exasol's BucketFS is the BucketFS Explorer. After logging in with the EXAoperation credentials, a BucketFS service and a bucket can be created. Into this new bucket, we move the model file (rf.rds for R, clf.dat for Python 3).

Step 3: Creating a UDF script for prediction

3a. Creating an R UDF script for prediction

When the bucket is not publicly accessible but read-protected, a connection object has to be created in the Exasol database to provide the read password to UDF scripts:

```sql
CREATE CONNECTION my_bucket_access TO 'bucketfs:bucketfs1/udf' IDENTIFIED BY '12345';
```

In the R UDF script, the random-forest model is read from BucketFS. After that, the input set of the UDF script is converted into a dataframe which is accepted by the predict function of the randomForest package. We load batches of 1000 rows each to make 1000 predictions in each iteration.

```sql
--/
create or replace r set script rf.predict(...) emits (wine_id INT, taste VARCHAR(6)) as
library(randomForest)

run <- function(ctx) {
    rf <- readRDS("/buckets/bucketfs1/udf/rf.rds")
    ## load the first batch of 1000 rows in the input set
    ctx$next_row(1000)
    repeat {
        wine_ids <- ctx[[1]]()
        ## create a dataframe from all input columns (except the first one (wine_id))
        numCols <- exa$meta$input_column_count
        df <- data.frame(ctx[[2]]())
        for (i in 3:numCols) {
            df <- cbind(df, ctx[[i]]())
        }
        colnames(df) <- c("FIXED_ACIDITY", "VOLATILE_ACIDITY", "CITRIC_ACID", "RESIDUAL_SUGAR",
                          "CHLORIDES", "FREE_SULFUR_DIOXIDE", "TOTAL_SULFUR_DIOXIDE", "DENSITY",
                          "PH", "SULPHATES", "ALCOHOL")
        prediction <- predict(rf, newdata=df)
        ctx$emit(wine_ids, as.character(prediction))
        ## load the next batch of 1000 rows in the input set
        if (!(ctx$next_row(1000))) {
            break
        }
    }
}
/

select rf.predict(wine_id, fixed_acidity, volatile_acidity, citric_acid, residual_sugar,
                  chlorides, free_sulfur_dioxide, total_sulfur_dioxide, density, pH,
                  sulphates, alcohol)
from RF.TEST
group by iproc();
```

Due to the GROUP BY iproc(), the UDF script is called once per Exasol node with the data that is locally stored on that node, enabling parallel and fast predictions.

3b. Creating a Python UDF script for prediction

Using the scripting language PYTHON3 - built into Exasol since version 6.2 - the input set can be accessed batch-wise as a pandas dataframe with ctx.get_dataframe.

```sql
--/
CREATE OR REPLACE PYTHON3 SET SCRIPT rf.predict_wine_py(...) emits (wine_id INT, taste VARCHAR(6)) as
import pickle
import pandas as pd

clf = pickle.load(open('/buckets/bucketfs1/udf/clf.dat', 'rb'))
clf.n_jobs = 1

def run(ctx):
    BATCH_ROWS = 1000
    while True:
        df = ctx.get_dataframe(num_rows=BATCH_ROWS)
        if df is None:
            break
        wine_ids = df['0']
        features = df.drop('0', axis=1)
        res_df = pd.DataFrame(columns=['WINE_ID', 'TASTE'])
        res_df['WINE_ID'] = wine_ids
        res_df['TASTE'] = clf.predict(features)
        res_df.TASTE[res_df.TASTE == -1] = 'bad'
        res_df.TASTE[res_df.TASTE == 0] = 'normal'
        res_df.TASTE[res_df.TASTE == 1] = 'good'
        ctx.emit(res_df)
/

select rf.predict_wine_py(wine_id, fixed_acidity, volatile_acidity, citric_acid, residual_sugar,
                          chlorides, free_sulfur_dioxide, total_sulfur_dioxide, density, pH,
                          sulphates, alcohol)
from RF.TEST
group by iproc();
```
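If the predictions should be persisted instead of just being returned to the client, the UDF call can be wrapped into a CREATE TABLE AS statement. The following is only a sketch; the target table RF.PREDICTIONS is a made-up name and not part of the attached example:

```sql
-- hypothetical target table for the prediction results
CREATE TABLE RF.PREDICTIONS AS
SELECT rf.predict_wine_py(wine_id, fixed_acidity, volatile_acidity, citric_acid, residual_sugar,
                          chlorides, free_sulfur_dioxide, total_sulfur_dioxide, density, pH,
                          sulphates, alcohol)
FROM RF.TEST
GROUP BY iproc();
```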
This article shows you how you can create functions to work with the Bing Maps Tile System
Background

This solution is currently work in progress, but published anyway:

- A workaround for aggregation with KEEP FIRST/LAST is given in this solution. We will try to add a workaround for analytical functions with KEEP FIRST/LAST.
- The joins in our examples do not take care of potential NULL values. For NULL values, a join condition using NVL or similar should be used.
- No warranty for correctness.

How to transform the following select to something supported in EXASOL?

Note: The following transformation is an advanced topic. Understanding how to rewrite the query will take a considerable amount of time.

```sql
SELECT <aggregate_function>(<aggregate_args>)
           KEEP (DENSE_RANK FIRST/LAST ORDER BY <order_expr>),
       <other_select_list_exprs>
FROM   <from_clause>
WHERE  <where_clause>
GROUP BY <group_by_cols>
HAVING <having_clause>
```

KEEP DENSE_RANK FIRST/LAST calculates the <aggregate_function> only on those values of <aggregate_args> that have the highest/lowest values in <order_expr>. In order to rewrite the statement, we need to calculate:

Step 1: <other_select_list_exprs> with the original GROUP BY.
Step 2: <aggregate_function>(<aggregate_args>) for a GROUP BY <group_by_cols>, <order_expr>. This computes the aggregates for each value of <order_expr>, so among others - which will be discarded - there is the computation for the FIRST and LAST value.
Step 3: The minimum/maximum value of <order_expr> for the original GROUP BY. This is needed to determine which subgroup to select from the finer aggregates.

The values for the original select from (1) will then be enriched with values from (2) by a join on the GROUP BY columns of the original select together with the attribute for selecting exactly the <aggregate_function> value of the subaggregation for the FIRST/LAST value.
So basically, the rewriting looks like this:

```sql
WITH original_aggregation AS (
    SELECT <other_select_list_exprs>,
           min/max(<order_expr>) AS first_last_selector,
           <group_by_cols>
    FROM   <from_clause>
    WHERE  <where_clause>
    GROUP BY <group_by_cols>
    HAVING <having_clause>
),
subaggregate_for_first_last AS (
    SELECT <aggregate_function>(<aggregate_args>) AS afk,
           <group_by_cols>,
           <order_expr>
    FROM   <from_clause>
    WHERE  <where_clause>
    GROUP BY <group_by_cols>, <order_expr>
    HAVING <having_clause> -- not needed and not accurate, but filters anyway
)
-- Now the join
SELECT sub_agg.afk,
       orig_agg.<other_select_list_expr>
FROM original_aggregation AS orig_agg
LEFT JOIN subaggregate_for_first_last AS sub_agg
       ON orig_agg.<group_by_cols> = sub_agg.<group_by_cols>
      AND orig_agg.first_last_selector = sub_agg.<order_expr>
```

To make the computation faster, we don't use a separate select for (1) and (2), but combine the calculation in a single select with different GROUPING SETs and distinguish them by the different values of the GROUPING function (see manual for more details):

```sql
WITH all_aggregations AS (
    SELECT <aggregate_function>(<aggregate_args>) AS afk,
           <other_select_list_exprs>,
           min/max(<order_expr>) AS first_last_selector,
           <group_by_cols>,
           <order_expr>,
           GROUPING(<group_by_cols>, <order_expr>) AS group_selector
    FROM   <from_clause>
    WHERE  <where_clause>
    GROUP BY GROUPING SETS ((<group_by_cols>), (<group_by_cols>, <order_expr>))
    HAVING <having_clause>
),
original_aggregation AS (
    SELECT <other_select_list_exprs>,
           first_last_selector,
           <group_by_cols>
    FROM all_aggregations
    WHERE group_selector = <GROUPING value for aggregate on <group_by_cols> only>
),
subaggregate_for_first_last AS (
    SELECT afk,
           <group_by_cols>,
           <order_expr>
    FROM all_aggregations
    WHERE group_selector = <GROUPING value for aggregate on <group_by_cols> and <order_expr>>
)
-- Now the join
SELECT sub_agg.afk,
       orig_agg.<other_select_list_expr>
FROM original_aggregation AS orig_agg
LEFT JOIN subaggregate_for_first_last AS sub_agg
       ON orig_agg.<group_by_cols> = sub_agg.<group_by_cols>
      AND orig_agg.first_last_selector = sub_agg.<order_expr>
```

Example of a simple transformation

```sql
-- Sample data table
CREATE TABLE T (i int, gc int, j int, k int);
INSERT INTO T VALUES (1, 1, 3, 1), (2, 1, 5, 1), (3, 1, 7, 2), (4, 1, 9, 3),
                     (5, 1, 11, 3), (6, 2, 7, 1), (7, 2, 3, 2);

-- This select
SELECT max(i) AS col,
       sum(j) KEEP (DENSE_RANK FIRST ORDER BY k) kdrf
FROM t
GROUP BY gc;

-- With result
-- col  kdrf
--   5     8   (=3+5, values 7, 9 and 11 are not aggregated)
--   7     7   (=7, value 3 is not aggregated)

-- Is transformed to
WITH all_aggregations AS (
    SELECT sum(j) AS afk,
           max(i) AS col,
           min(k) AS first_last_selector,
           gc,
           k,
           GROUPING(gc, k) AS group_selector
    FROM t
    GROUP BY GROUPING SETS ((gc), (gc, k))
),
original_aggregation AS (
    SELECT col, first_last_selector, gc
    FROM all_aggregations
    WHERE group_selector = 1
),
subaggregate_for_first_last AS (
    SELECT afk, gc, k
    FROM all_aggregations
    WHERE group_selector = 0
)
-- Now the join
SELECT orig_agg.col,
       sub_agg.afk AS kdrf
FROM original_aggregation AS orig_agg
LEFT JOIN subaggregate_for_first_last AS sub_agg
       ON orig_agg.gc = sub_agg.gc
      AND orig_agg.first_last_selector = sub_agg.k;
```

Additional Notes

What to do with an aggregated select (no GROUP BY)

The case that KEEP DENSE_RANK FIRST/LAST is used in an aggregated select without GROUP BY is a simpler special case of the case with GROUP BY. In that case we add a GROUP BY GROUPING SETS ((), (<order_expr>)) and join only on first_last_selector = <order_expr>.
Essentially everything is the same in that case, only we can omit <group_by_cols> wherever they were used.

Transformation for multiple FIRST/LAST functions

Note that in the case of a single KEEP DENSE_RANK FIRST/LAST function, the value of the group_selector in the WITH clause subaggregate_for_first_last is 0. However, we can do the same transformation for multiple KEEP DENSE_RANK FIRST/LAST functions. In the general case we need a separate subaggregation for each KEEP DENSE_RANK FIRST/LAST function - which means another GROUPING SET (<group_by_cols>, <new_order_expr>), another WITH clause with selection of the correct GROUPING, and another LEFT JOIN on <group_by_cols> and <first_last_selector> for that <new_order_expr>. A few tweaks can be applied to the general case in some situations:

- If multiple KEEP DENSE_RANK FIRST/LAST functions share the same <order_expr>, their <aggregate_function> values can be computed in the same subaggregate select (and the same GROUPING SET), but they might still need a separate LEFT JOIN each.
- If they further have the same value of FIRST/LAST and the same ORDER BY direction, a single LEFT JOIN on this select is enough to get all the correct values.

Example of a transformation with several FIRST/LAST

Matrix for first_last_selector

The following expressions need to be used as first_last_selector, depending on FIRST/LAST and the ORDER BY clause:

| ORDER BY | FIRST | LAST |
|---|---|---|
| ASC [NULLS LAST] | min(<order_expr>) | CASE WHEN COUNT(<order_expr>) != COUNT(*) THEN NULL ELSE max(<order_expr>) END |
| DESC [NULLS FIRST] | CASE WHEN COUNT(<order_expr>) != COUNT(*) THEN NULL ELSE max(<order_expr>) END | min(<order_expr>) |
| ASC NULLS FIRST | CASE WHEN COUNT(<order_expr>) != COUNT(*) THEN NULL ELSE min(<order_expr>) END | max(<order_expr>) |
| DESC NULLS LAST | max(<order_expr>) | CASE WHEN COUNT(<order_expr>) != COUNT(*) THEN NULL ELSE min(<order_expr>) END |

Multiple <order_by_expr>s

If there are multiple <order_expr>s for a single KEEP DENSE_RANK FIRST/LAST function, things get more complicated:

- In that case we can try to combine them all in a single value by concatenating them (and padding, such that the order of the values is the same as for the concatenated values). This will work as long as all columns have the same ORDER BY direction.
- Otherwise we might need to add a subselect around the <from_clause> containing <from>.* and RANK() OVER (ORDER BY <order_by_expr>s) AS rnk. Then we can use MIN(CASE WHEN rnk=1 THEN <order_by_expr> ELSE NULL END) for each <order_by_expr> to get multiple first_last_selectors for the LEFT JOIN. Of course, this would need to be done for each KEEP DENSE_RANK FIRST/LAST function with multiple <order_by_expr>s.

Additional References

https://www.exasol.com/support/browse/IDEA-65
Problem

We want to replace all NULL values within a certain table with the default value specified for the corresponding column. And we don't want to create the necessary update statements manually.

Solution

A small metadata-driven procedure script (Lua) that issues the required update statements. (See attachment)

Notes...

...on transactions

The script
- performs a rollback after the metadata request, to avoid a read-write conflict scenario.
- performs all updates within a single transaction.
- will not abort when an update on one of the columns fails.
- performs a commit when all columns have been handled, regardless of any errors encountered.

...on column selection

The script includes all columns that do have a DEFAULT value set. It excludes all columns with a NOT NULL constraint (ignoring the actual state of the constraint). Obviously, such a column cannot contain any NULL values that need updating.

...on row selection

Due to Exasol's memory management and data processing, the script handles each column separately. This minimizes both the amount of memory required for processing and the amount of data blocks being written. The script does not contain any delta functionality; it will process all rows of the table each time it is called.

Installation

Just create the script in any schema you like (CREATE SCRIPT permission required). It does not have any dependencies.

```sql
create or replace /* procedure */ script REPLACE_NULL(schema_name, table_name) ...
```

Usage

When calling the script, it expects two parameters: a schema name and a table name:

```sql
execute script REPLACE_NULL('my schema', 'my table');
```

💡 Both schema and table name are expected as string and will be case-sensitive.

Example

```sql
open schema SR9000;
-- Rows affected: 0

create table Invoice(
    invoice_id   int,
    invoice_date date default date '2017-01-01'
);
-- Rows affected: 0

insert into Invoice values (1, null), (2, null), (3, '2017-02-01');
-- Rows affected: 3

execute script REPLACE_NULL('SR9000', 'Invoice');
-- [43000] "No columns found for "SR9000"."Invoice"" caught in script "SR9000"."REPLACE_NULL" at line 23 (Session: 1585944483210400591)
```

... yes. We created the table using a regular identifier, so it ended up as uppercase...

```sql
execute script REPLACE_NULL('SR9000', 'INVOICE');
```

This returns:

| COLUMN_NAME | COLUMN_TYPE | COLUMN_DEFAULT | UPDATE_RESULT |
|---|---|---|---|
| INVOICE_DATE | DATE | TO_DATE('2017-01-01','YYYY-MM-DD') | 2 rows updated |
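The attached script itself is not reproduced in this article. As an illustration only, a minimal Lua sketch of such a metadata-driven script could look like the one below; the exact column selection, result layout, and error handling of the attached version may differ:

```sql
--/
create or replace /* procedure */ script REPLACE_NULL(schema_name, table_name) as
    -- collect all columns of the table that have a DEFAULT value and allow NULLs
    local res = query([[
        SELECT column_name, column_type, column_default
        FROM   exa_all_columns
        WHERE  column_schema = :s
        AND    column_table  = :t
        AND    column_default IS NOT NULL
        AND    column_is_nullable
    ]], {s = schema_name, t = table_name})
    -- release the read locks acquired by the metadata request
    query("ROLLBACK")
    if #res == 0 then
        error('No columns found for "'..schema_name..'"."'..table_name..'"')
    end
    local summary = {}
    for i = 1, #res do
        -- update each column separately to keep the memory footprint small;
        -- pquery() lets the loop continue if one column fails
        local ok, upd = pquery('UPDATE "'..schema_name..'"."'..table_name..'" SET "'
                               ..res[i].COLUMN_NAME..'" = '..res[i].COLUMN_DEFAULT..
                               ' WHERE "'..res[i].COLUMN_NAME..'" IS NULL')
        summary[#summary+1] = {res[i].COLUMN_NAME, res[i].COLUMN_TYPE, res[i].COLUMN_DEFAULT,
                               ok and upd.rows_affected..' rows updated' or 'FAILED'}
    end
    -- single commit at the end, regardless of individual errors
    query("COMMIT")
    return summary, "column_name varchar(128), column_type varchar(40), " ..
                    "column_default varchar(2000), update_result varchar(200)"
/
```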
This article describes how the schema EXA_STATISTICS is laid out, including the various classes of statistics available
Problem

Sometimes databases grow too much on data. It is possible to set size limits for Exasol databases to prevent this. The limit can be set for a database instance using database parameters.

Solution

Configuration setup

You can configure a local database limit by setting the following database parameters:

| Parameter | Default | Description |
|---|---|---|
| rawSizeLimitInMiB | 0 (no limit) | Raw data size limit of the database in MiB. |
| memSizeLimitInMiB | 0 (no limit) | Memory data size limit of the database in MiB. |
| sizeLimitWarningPercent | 85 | Warning trigger margin. When the stated % of the limit is reached, a warning message is logged in EXAoperation. |

You can either choose a raw data or a memory data limit, but not both. These parameters are set in the Extra DB Parameters field of the database instance screen.

Limit checking

When the Exasol database reaches 105% of the given size limit, no more data inserts are permitted. This impacts the following SQL actions:

- IMPORT
- INSERT
- CREATE TABLE AS
- MERGE
- SELECT INTO

All such statements are rejected with a SQL exception like the following:

[R0010] Statement currently not allowed because latest database raw size of 54.1 GiB exceeded local raw size limit of 50.0 GiB (108.2%). Please increase your limit or delete some data and FLUSH STATISTICS.

To get the database back to normal operation, you have to reduce its data size below 100% of the limit. You can achieve this by either restarting the database with no or a larger limit, or by deleting or dropping table data. Whenever your data size passes the critical limit margins at 85%, 105% and 100%, you get an EXAoperation warning, error or notice message respectively.

Additional References

For more information on managing data insert restrictions and license limits, please consult this article.
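For example, to enforce a raw data size limit of 50 GiB (the limit shown in the error message above), the Extra DB Parameters field could contain the following line; the concrete value is only an illustration:

```
-rawSizeLimitInMiB=51200
```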
We present a Lua script to display your current session information (and get you started using Lua as a tool).
This article shows you how to calculate the backup duration from the database
IMPORT of a timestamp column fails, because 0000-00-00 00:00:00, which is a valid timestamp in MySQL, is invalid as a timestamp in Exasol.
Background

The Exasol database automatically computes and stores column statistics on demand, e.g. when joining tables for the first time. Those statistics consist of distinct estimates, minimum and maximum values, balancing, and other information. They are used by our query optimizer for the estimation of join costs, filter sensitivities, etc.

The computation of column statistics performs a complete column scan. Therefore it might be expensive, especially for large tables and/or tables not having all data in memory. This means that a query doing statistics computation may experience a significant slowdown compared to the next execution. Column statistics are maintained as part of DML statements. They are recomputed if a significant amount of data has been changed since the last computation.

Explanation

Problem Description

Due to format changes, all column statistics are invalidated during an update from EXASOL 5.0 to EXASOL 6.0.

Required User Action

After the update to EXASOL 6.0, we recommend recomputing the statistics for the whole database to avoid any potential unexpected performance losses. Please note that the following command is introduced with version 6.0.4 (see EXASOL-2110).

```sql
ANALYZE DATABASE REFRESH STATISTICS;
```

Like other multi-table statements, ANALYZE DATABASE does an implicit COMMIT after each table, minimizing transaction conflicts.

Time Estimation

It is often useful to obtain an estimate of the duration of the ANALYZE DATABASE REFRESH STATISTICS statement. The query below delivers such an estimate (measured in seconds) when run before the update (that is, on EXASOL 5.0, while statistics are still valid):

```sql
select cast(
         zeroifnull(
           sum(raw_object_size) / 1024 / 1024 / 150 / nproc()
         ) as dec(18, 1)
       ) as COLUMN_STATISTICS_REFRESH_SECONDS
from "$EXA_COLUMN_SIZES"
where (column_schema, column_table, column_name) in (
    select column_schema, column_table, column_name
    from "$EXA_COLUMN_STATISTICS"
    where -- filter does not work on 6.0 before the REFRESH
          min_value_estimate is not null
);
```

Additional References

https://community.exasol.com/t5/database-features/exa-statistics/ta-p/1413
This article gives an example of how Exasol's Skyline Feature can be used
Question

What is smart table replication and how can the replication border be modified?

Answer

General information on smart table replication

Replicates are used "on the fly" for local joins if a table is "small" with regard to the threshold. A replicated table accesses data directly from other nodes' database memories and keeps a local copy in its own DBRAM. If a replicated table is modified, only the changed data is reloaded into the database memories of the other nodes. Modified tables and subselects cannot be used with smart table replication; the table replication border does not apply to those.

💡 Using large replicated tables might cause decreased performance. Queries with expensive table scans (filter expressions) or between/cross joins may fall into this category.

Soft replication border

A table will be replicated if none of the thresholds below is reached. The table size threshold refers to the RAW_OBJECT_SIZE as in EXA_*_OBJECT_SIZES. The replication borders can be modified through extra database parameters in the web interface:

-soft_replicationborder_in_numrows=<numrows>  [default is 100000 rows]
-soft_replicationborder_in_kb=<kb>            [default is 1000000 -> 1 GB]
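As a quick check of which tables currently stay below the default size border (and are therefore replication candidates as far as the size criterion is concerned), the raw object sizes can be queried from the system tables. This is only a sketch: it covers the size threshold, not the row-count threshold, and assumes the default border of 1000000 kB:

```sql
-- tables below the default soft replication border of 1000000 kB raw size
select object_name,
       round(raw_object_size / 1024) as raw_size_kb
from   exa_all_object_sizes
where  object_type = 'TABLE'
  and  raw_object_size < 1000000 * 1024
order by raw_object_size desc;
```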
Exasol cluster enlargement (adding nodes) and what you need to know if you are seeing impeded performance.
Question

How is an ODBC data type mapped to an EXASOL data type? How is an EXASOL data type mapped to an ODBC data type?

Answer

ODBC to EXASOL:

| ODBC data type | EXASOL data type |
|---|---|
| SQL_BIGINT | DECIMAL(36,0) |
| SQL_BINARY | - |
| SQL_CHAR | CHAR |
| SQL_DECIMAL(p,s) | DECIMAL(p,s) |
| SQL_DOUBLE | DOUBLE PRECISION |
| SQL_FLOAT | DOUBLE PRECISION |
| SQL_INTEGER | DECIMAL(18,0) |
| SQL_INTERVAL_YEAR | - |
| SQL_INTERVAL_MONTH | - |
| SQL_INTERVAL_YEAR_TO_MONTH | VARCHAR(n) |
| SQL_INTERVAL_DAY | - |
| SQL_INTERVAL_HOUR | - |
| SQL_INTERVAL_MINUTE | - |
| SQL_INTERVAL_SECOND | - |
| SQL_INTERVAL_DAY_TO_HOUR | - |
| SQL_INTERVAL_DAY_TO_MINUTE | - |
| SQL_INTERVAL_DAY_TO_SECOND | VARCHAR(n) |
| SQL_INTERVAL_HOUR_TO_MINUTE | - |
| SQL_INTERVAL_HOUR_TO_SECOND | - |
| SQL_INTERVAL_MINUTE_TO_SECOND | - |
| SQL_LONGVARBINARY | - |
| SQL_LONGVARCHAR | VARCHAR(n) |
| SQL_NUMERIC(p[,s]) | DECIMAL(p,s) |
| SQL_REAL | DOUBLE PRECISION |
| SQL_SMALLINT | DECIMAL(9,0) |
| SQL_TYPE_TIME | - |
| SQL_TINYINT | DECIMAL(3,0) |
| SQL_TYPE_DATE | DATE |
| SQL_TIMESTAMP | TIMESTAMP |
| SQL_VARBINARY | - |
| SQL_VARCHAR | VARCHAR(n) |
| SQL_WCHAR | VARCHAR(n) |
| SQL_WVARCHAR | VARCHAR(n) |
| SQL_WLONGVARCHAR | VARCHAR(n) |
| SQL_BIT | BOOLEAN |

EXASOL to ODBC:

| EXASOL data type | ODBC data type |
|---|---|
| DECIMAL(p,s) | SQL_DECIMAL(p,s) |
| CHAR(n) | SQL_CHAR(n) |
| VARCHAR(n) | SQL_VARCHAR(n) |
| DOUBLE PRECISION | SQL_DOUBLE |
| BOOLEAN | SQL_BIT |
| DATE | SQL_TYPE_DATE |
| TIMESTAMP | SQL_TYPE_TIMESTAMP |
| TIMESTAMP WITH LOCAL TIME ZONE | SQL_TYPE_TIMESTAMP |
| INTERVAL YEAR TO MONTH | SQL_CHAR(n) |
| INTERVAL DAY TO SECOND | SQL_CHAR(n) |
| GEOMETRY | SQL_VARCHAR(2000000) |
| HASHTYPE (*) | SQL_VARCHAR (*) |

(*) Exasol Version 7.0.0
Background

What does a preprocessor script do? Well, it preprocesses. To be more specific, an active preprocessor script is hooked in before Exasol's SQL parser. This allows it to intercept and modify any SQL text sent to the database (or executed within a procedure script) before passing it on to the actual parser.

What can I do with it?

Those scripts are written in the Lua language, so basically you can do anything with the preprocessor; here are a few facts and limitations, though:

- Exasol's Lua library is stripped and can not be extended with binary libraries for security reasons.
- Preprocessor scripts do not take parameters; the "current" SQL text can be retrieved through a function call.
- Preprocessor scripts can execute statements using (p)query.
- Preprocessor scripts do not return any values; they "return" the modified SQL text through another function call.
- While preprocessor scripts are often enabled on system level, any user can disable this in his or her session (see step (1) below).
- Preprocessor scripts are executed in the caller's context and privileges. Also, if a user can EXECUTE the script (which is a necessity), he/she can also READ it. Security by obscurity won't work.

Typical Use Cases

- Compatibility layer for a frontend that produces SQL not suitable for Exasol
- Macro magic: expanding predefined keywords server-side
- "Human knows more" optimizations of queries and filters
- Row-Level Security (💡 re-read the last two points above)

Syntax and Semantics

Please see the Exasol User Manual (Section 3.8) for details.

Prerequisites

As a preprocessor script is a schema object, you will need to find or create a schema to create the script in:

```sql
create schema if not exists PREPROCESSOR;
```

Preconditions: CREATE SCHEMA privilege or pre-existing schema

How to work with a Preprocessor Script?

Step 1: Safety

"CREATE SCRIPT" statements are also preprocessed. As the preprocessor script you are going to (re-)deploy is very likely to contain the keywords it should react on, it is advisable to disable the preprocessor before deployment:

```sql
alter session set sql_preprocessor_script = null;
```

Step 2: Deploy

Create the preprocessor script. The syntax "around" it may depend on the SQL client you are using:

```sql
--/
create or replace Lua script MY_PREPROCESSOR() as
    ...
    ...sqlparsing.getsqltext() ...
    ...sqlparsing.setsqltext(...) ...
    return
/
```

Preconditions: CREATE SCRIPT privilege; ownership of the schema or CREATE ANY SCRIPT privilege

Step 3: Activate locally

Now activate the preprocessor for your local session:

```sql
alter session set sql_preprocessor_script = PREPROCESSOR.MY_PREPROCESSOR;
```

Step 4: TEST IT!

Run a few statements to verify success. This is best done with Auditing or Profiling enabled, so you can see the resulting SQL texts. When things go very wrong, go back to step (1) - this is the only SQL statement not passed through the preprocessor...

Step 5: Activate globally

Now that things went well, we can activate the script for other users (new sessions):

```sql
alter system set sql_preprocessor_script = PREPROCESSOR.MY_PREPROCESSOR;
```

Preconditions: ALTER SYSTEM privilege

Step 6: No wait, we forgot something important!

We just locked out (more or less) everyone else from the database: they don't have EXECUTE permissions on the script!

```sql
grant EXECUTE on PREPROCESSOR.MY_PREPROCESSOR to public;
```

Preconditions: ownership of the schema or GRANT ANY OBJECT PRIVILEGE privilege
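For illustration, a minimal end-to-end preprocessor script could look like the sketch below. The keyword NOW_DATE is made up for this example, and a production script should use the sqlparsing helper functions (e.g. tokenize, normalize, find) instead of a plain string replacement, so that string literals and comments are not rewritten:

```sql
--/
create or replace Lua script PREPROCESSOR.MY_PREPROCESSOR() as
    -- fetch the SQL text of the statement that is about to be parsed
    local sqltext = sqlparsing.getsqltext()
    -- toy rewrite: expand the made-up keyword NOW_DATE to CURRENT_DATE
    -- (case-sensitive plain replacement; a robust script would tokenize first)
    local rewritten = string.gsub(sqltext, 'NOW_DATE', 'CURRENT_DATE')
    -- hand the modified text over to the SQL parser
    sqlparsing.setsqltext(rewritten)
/
```

After activating the script as described above, a statement like SELECT NOW_DATE would reach the parser as SELECT CURRENT_DATE.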
Additional Notes

Best Practice: As step (2) replaces the script, all privileges on it are lost in that step. To avoid this problem, the EXECUTE privilege should be put on schema level:

```sql
grant EXECUTE on SCHEMA PREPROCESSOR to public;
```

Just make sure you don't put anything dangerous/secret into that schema.

Additional References

- Preprocessor Scripts Documentation
- List of functions in Exasol
- https://community.exasol.com/t5/database-features/using-the-sql-preprocessor-to-support-postgresql-mysql-functions/ta-p/1041
This solution describes how to determine all granted system and object privileges per user. A license for "Graph Search" is required to run the attached query.
Problem

As documented (see https://docs.exasol.com/sql/select.htm), the CONNECT BY cycle detection works differently than Oracle's implementation. We recommend changing your logic accordingly; however, if you want to reduce the Exasol results so that they match the Oracle results, you can do so using this guide.

Diagnosis

Example for cycle detection

Here is a small example to explain the difference between Exasol and Oracle. Given this table:

```sql
-- table structure inferred from the queries below
create table employees(name varchar(10), id int, manages int);

insert into employees values ('B', 1, 2);
insert into employees values ('E', 3, 4);
insert into employees values ('C', 4, 2);
insert into employees values ('C', 4, 5);
insert into employees values ('A', 2, 99);
insert into employees values ('A', 2, 3);
insert into employees values ('D', 100, 1);
insert into employees values ('F', 99, NULL);
```

You want to get the chain of command for Peter's manager. Peter is directly managed by a 3rd-line manager who is beneath the CEO. However, in this company the 3rd-line manager is also managed by the 1st-line manager, which generates a cycle. When you run this query in Exasol, you get the following result:

```sql
SELECT sys_connect_by_path(name, '-') MANAGER_PATH,
       CONNECT_BY_ISCYCLE ISCYCLE,
       CONNECT_BY_ISLEAF ISLEAF,
       level l
from employees
start with manages=99
connect by nocycle prior id = manages
ORDER BY level;
```

| MANAGER_PATH | ISCYCLE | ISLEAF | L |
|---|---|---|---|
| -A | 0 | 0 | 1 |
| -A-B | 0 | 0 | 2 |
| -A-C | 0 | 0 | 2 |
| -A-B-D | 0 | 1 | 3 |
| -A-C-E | 0 | 0 | 3 |
| -A-C-E-A | 0 | 0 | 4 |
| -A-C-E-A-B | 0 | 0 | 5 |
| -A-C-E-A-C | 1 | 1 | 5 |
| -A-C-E-A-B-D | 0 | 1 | 6 |

The reason is that Exasol detects cycles in the current row. As the row with id 1 appears for the second time in the path, the cycle is detected, see path -A-C-E-A-C.

Oracle does not detect cycles in the current row, but checks whether the child row has a cycle. For the same query the result is:

| MANAGER_PATH | ISCYCLE | ISLEAF | L |
|---|---|---|---|
| -A | 0 | 0 | 1 |
| -A-B | 0 | 0 | 2 |
| -A-C | 0 | 0 | 2 |
| -A-B-D | 0 | 1 | 3 |
| -A-C-E | 0 | 0 | 3 |

The cycle is detected in row .5.1.3, because row 3 has a child (row 4) which has a child that is also an ancestor (row 1). This is kind of a double look-ahead.

Solution

This solution describes how to emulate Oracle's behaviour in Exasol. If you want the same behaviour in Exasol, you have to backtrack two levels from the cycle and remove everything that is a child of the row that Oracle marks as a cycle. For doing this, you can use the following pattern. Your query has these parts:

```sql
SELECT <original SELECT list>
FROM <original TABLE OR VIEW/subselect>
<original CONNECT BY clause>
<original ORDER BY clause>
```

Apply this pattern:
```sql
WITH base_table AS (
    SELECT ROW_NUMBER() OVER (ORDER BY rowid) AS row_num,
           <original SELECT list>
    FROM <original TABLE OR VIEW/subselect>
),
add_cols_for_cycle_detection AS (
    SELECT <original SELECT list>,
           -- extra columns for Oracle-like cycle detection
           sys_connect_by_path(row_num, '.') AS cd_scbp,
           row_num,
           PRIOR row_num cd_prior_row_num,
           CONNECT_BY_ISCYCLE AS cd_cycle
    FROM base_table
    <original CONNECT BY clause>
),
parent_of_cycle AS (
    SELECT cd_prior_row_num
    FROM add_cols_for_cycle_detection
    WHERE cd_cycle=1
),
ora_cycle_start AS (
    SELECT cd_scbp, cd_prior_row_num
    FROM add_cols_for_cycle_detection
    WHERE row_num IN (SELECT cd_prior_row_num FROM parent_of_cycle)
),
ora_like_cb AS (
    SELECT *
    FROM add_cols_for_cycle_detection
    WHERE NOT EXISTS (
        SELECT 1
        FROM ora_cycle_start
        WHERE cd_scbp=SUBSTRING(add_cols_for_cycle_detection.cd_scbp,0,len(cd_scbp))
    )
)
SELECT <original SELECT list>
FROM ora_like_cb
<original ORDER BY clause>;
```

Applied to our example query, the resulting query is:

```sql
WITH base_table AS (
    SELECT ROW_NUMBER() OVER (ORDER BY rowid) AS row_num,
           name, id, manages
    FROM employees
),
add_cols_for_cycle_detection AS (
    SELECT sys_connect_by_path(name, '-') MANAGER_PATH,
           CONNECT_BY_ISCYCLE ISCYCLE,
           CONNECT_BY_ISLEAF ISLEAF,
           LEVEL l,
           -- extra columns for Oracle-like cycle detection
           sys_connect_by_path(row_num, '.') AS cd_scbp,
           row_num,
           PRIOR row_num cd_prior_row_num,
           CONNECT_BY_ISCYCLE AS cd_cycle
    FROM base_table
    start with manages=99
    connect by nocycle prior id = manages
),
parent_of_cycle AS (
    SELECT cd_prior_row_num
    FROM add_cols_for_cycle_detection
    WHERE cd_cycle=1
),
ora_cycle_start AS (
    SELECT cd_scbp, cd_prior_row_num
    FROM add_cols_for_cycle_detection
    WHERE row_num IN (SELECT cd_prior_row_num FROM parent_of_cycle)
),
ora_like_cb AS (
    SELECT *
    FROM add_cols_for_cycle_detection
    WHERE NOT EXISTS (
        SELECT 1
        FROM ora_cycle_start
        WHERE cd_scbp=SUBSTRING(add_cols_for_cycle_detection.cd_scbp,0,len(cd_scbp))
    )
)
SELECT MANAGER_PATH, ISCYCLE, ISLEAF, L
FROM ora_like_cb
ORDER BY ISLEAF;
```

The result is the same as in Oracle:

| MANAGER_PATH | ISCYCLE | ISLEAF | L |
|---|---|---|---|
| -A | 0 | 0 | 1 |
| -A-B | 0 | 0 | 2 |
| -A-C | 0 | 0 | 2 |
| -A-C-E | 0 | 0 | 3 |
| -A-B-D | 0 | 1 | 3 |

If you also want CONNECT_BY_ISCYCLE to work like in Oracle, you have to extend the pattern by another CTE ora_cycle:

```sql
WITH base_table AS (
    SELECT ROW_NUMBER() OVER (ORDER BY rowid) AS row_num,
           name, id, manages
    FROM employees
),
add_cols_for_cycle_detection AS (
    SELECT sys_connect_by_path(name, '-') MANAGER_PATH,
           CONNECT_BY_ISCYCLE ISCYCLE,
           CONNECT_BY_ISLEAF ISLEAF,
           LEVEL l,
           -- extra columns for Oracle-like cycle detection
           sys_connect_by_path(row_num, '.') AS cd_scbp,
           row_num,
           PRIOR row_num cd_prior_row_num,
           CONNECT_BY_ISCYCLE AS cd_cycle
    FROM base_table
    start with manages=99
    connect by nocycle prior id = manages
),
parent_of_cycle AS (
    SELECT cd_prior_row_num
    FROM add_cols_for_cycle_detection
    WHERE cd_cycle=1
),
ora_cycle_start AS (
    SELECT cd_scbp, cd_prior_row_num
    FROM add_cols_for_cycle_detection
    WHERE row_num IN (SELECT cd_prior_row_num FROM parent_of_cycle)
),
ora_like_cb AS (
    SELECT *
    FROM add_cols_for_cycle_detection
    WHERE NOT EXISTS (
        SELECT 1
        FROM ora_cycle_start
        WHERE cd_scbp=SUBSTRING(add_cols_for_cycle_detection.cd_scbp,0,len(cd_scbp))
    )
),
ora_cycle AS (
    SELECT ora_like_cb.*,
           decode(ora_cycle_start.cd_prior_row_num, NULL, 0, 1) AS cyc
    FROM ora_like_cb
    LEFT JOIN ora_cycle_start
           ON ora_like_cb.row_num=ora_cycle_start.cd_prior_row_num
)
SELECT MANAGER_PATH, ISCYCLE, ISLEAF, l, cyc
FROM ora_cycle;
```

Result:

| MANAGER_PATH | ISCYCLE | ISLEAF | L | CYC |
|---|---|---|---|---|
| -A | 0 | 0 | 1 | 0 |
| -A-B | 0 | 0 | 2 | 0 |
| -A-C | 0 | 0 | 2 | 0 |
| -A-B-D | 0 | 1 | 3 | 0 |
| -A-C-E | 0 | 0 | 3 | 1 |
This solution describes a method on how to retrieve type information without the need for a syntactically correct and complete CREATE statement.
This article describes the contents of the SYS schema
This article describes the purpose of the Exasol Logserver