Background
Below is a UDF script in R that executes a scoring model inside EXASOL. While the example uses the flights dataset from R and applies a randomForest model, it may be used as a template for other R models. It has already proven itself in a real case where a neural-network-based scoring process on a multi-billion-row dataset was transferred from an R server into EXASOL, reducing the time needed from approximately one hour to a few minutes.

How to deploy an R scoring model in EXASOL via UDF
Typical points for adaptation are:

Step 1: The library needed for scoring must be present (here: library(randomForest)).

Step 2: The column names in the data.frame used in the UDF must match those of the data frame used for building the model. Typically these can be extracted from the model, but for other kinds of models the location needs to be adapted:

if (length(m) == 0) {
  ...
  # get the predictors' names
  colnames <<- names(m$forest$xlevels)
}

Here the model is loaded from an EXABucket (Redis in EXASOL 5) if not already present in the environment. Depending on the model size, loading the model is the main factor influencing the time needed for the scoring process. The character vector 'colnames' is later used to construct the data.frame; it must contain the column names.

Step 3: Later, when the in-database scoring is started by the SELECT, the columns need to be referenced in the same order as in the model-building data frame (due to Step 2). The GROUP BY expression ROWNUM / 100000 can be a lever for performance fine tuning: the more rows per UDF call, the faster the scoring, but if an instance gets too many rows, it runs out of memory.

R (client side) - download code for:
EXASOL 6: r_client_v6.R (as shown here)
EXASOL 5: r_client_v5.R

r_client_v6.R

# build a model for testing
# see www.github.com/exasol/r-exasol
library(exasol)
library(nycflights13)
con <- dbConnect("exa", dsn="exa")
dbWriteTable(con, "R.FLIGHTS", flights)
df <- dbGetQuery(con, "select * from R.FLIGHTS limit 10000")
dbDisconnect(con)

library(randomForest)
set.seed(852)
df <- df[complete.cases(df),]
model1 <- randomForest(as.factor(carrier) ~ dep_time + dep_delay, data=head(df, n=1000))

# like this, you could do the prediction on the client side:
# predict(model1, df[,c("dep_time", "dep_delay")])

# but instead, we want to predict in-database, thus we have to
# upload the model to an EXABucket via httr
library(httr)
PUT(
  # EXABucket URL
  url = "http://192.168.42.129:2101/bucket1/model1",
  body = serialize(model1, ascii = FALSE, connection = NULL),
  # EXABucket: authenticate with write_user__name / write_user__password
  config = authenticate("w", "writepw")
)

EXASOL (server side) - download code for:
EXASOL 6: flights_v6.sql (as shown here)
EXASOL 5: flights_v5.sql

flights_v6.sql

-- deploying the model
open schema R;

create or replace r set script pred_randomForest(...) emits (predict varchar(200000)) as
library(randomForest)
m <- list()
colnames <- vector(mode = "character")

run <- function(ctx) {
  if (length(m) == 0) {
    # only load the model if not already present. Must be done in run() as ctx is needed.
    f <- file(ctx[[1]](), open = "rb", raw = TRUE)
    m <<- unserialize(f)
    # get the predictors' names
    colnames <<- names(m$forest$xlevels)
  }
  # load the entire chunk
  ctx$next_row(NA)
  n <- exa$meta$input_column_count
  s <- paste0("model_key", "=ctx[[1]]()")
  for (i in 2:n) {
    # construct a data.frame from the chunk & colnames
    name <- colnames[i - 1]
    s <- paste0(s, ", ", name, "=ctx[[", i, "]]()")
  }
  eval(parse(text = paste0("inp_df <- data.frame(", s, ")")))
  # predict & emit
  ctx$emit(predict(m, inp_df[, 2:n]))
}
/

create connection sys_bucketfs to 'bucketfs:bfsdefault/bucket1' identified by 'readpw';

select pred_randomForest('/buckets/bfsdefault/bucket1/model1',   -- EXABucketFS path
                         "dep_time", "dep_delay")                -- predictors from FLIGHTS
from R.FLIGHTS
group by iproc(),                                                -- the node number, for data locality
         cast(rownum/100000 as int);                             -- restrict the number of rows (limit is R's max. vector size)

Additional references:
https://community.exasol.com/t5/database-features/how-to-use-exasolution-r-sdk/ta-p/1224
https://community.exasol.com/t5/environment-management/how-to-create-an-exabucketfs-service-and-bucket/ta-p/1515
Background
This solution is currently work in progress, but is published anyway.
- A workaround for aggregation with KEEP FIRST/LAST is given in this solution. We will try to add a workaround for analytic functions with KEEP FIRST/LAST.
- The joins in our examples do not take care of potential NULL values. For NULL values, a join condition using NVL or similar should be used.
- No warranty for correctness.

How to transform the following select to something supported in EXASOL?
Note: The following transformation is an advanced topic. Understanding how to rewrite the query will take a considerable amount of time.

SELECT <aggregate_function>(<aggregate_args>) KEEP (DENSE_RANK FIRST/LAST ORDER BY <order_expr>),
       <other_select_list_exprs>
FROM <from_clause>
WHERE <where_clause>
GROUP BY <group_by_cols>
HAVING <having_clause>

DENSE_RANK KEEP FIRST/LAST calculates the <aggregate_function> only on those values of <aggregate_args> that have the lowest/highest values in <order_expr>.

In order to rewrite the statement we need to calculate:

Step 1: <other_select_list_exprs> with the original GROUP BY.

Step 2: <aggregate_function>(<aggregate_args>) for a GROUP BY <group_by_cols>, <order_expr>. This computes the aggregates for each value of <order_expr>, so among others - which will be discarded - there is the computation for the FIRST and LAST value.

Step 3: The minimum/maximum value of <order_expr> for the original GROUP BY. This is needed to determine which subgroup to select from the finer aggregates.

The values of the original select from Step 1 are then enriched with values from Step 2 by a join on the GROUP BY columns of the original select, together with the attribute from Step 3 that selects exactly the <aggregate_function> value of the subaggregation for the FIRST/LAST value.
So basically, the rewriting looks like this:

WITH original_aggregation AS (
  SELECT <other_select_list_exprs>,
         min/max(<order_expr>) AS first_last_selector,
         <group_by_cols>
  FROM <from_clause>
  WHERE <where_clause>
  GROUP BY <group_by_cols>
  HAVING <having_clause>
), subaggregate_for_first_last AS (
  SELECT <aggregate_function>(<aggregate_args>) AS afk,
         <group_by_cols>,
         <order_expr>
  FROM <from_clause>
  WHERE <where_clause>
  GROUP BY <group_by_cols>, <order_expr>
  HAVING <having_clause> -- not needed and not accurate, but filters anyway
)
-- Now the join
SELECT sub_agg.afk, orig_agg.<other_select_list_exprs>
FROM original_aggregation AS orig_agg
LEFT JOIN subaggregate_for_first_last AS sub_agg
  ON orig_agg.<group_by_cols> = sub_agg.<group_by_cols>
 AND orig_agg.first_last_selector = sub_agg.<order_expr>

To speed up the computation, we don't use a separate select for Step 1 and Step 2, but combine the calculation in a single select with different GROUPING SETS and distinguish them by the different values of the GROUPING function (see the manual for more details):

WITH all_aggregations AS (
  SELECT <aggregate_function>(<aggregate_args>) AS afk,
         <other_select_list_exprs>,
         min/max(<order_expr>) AS first_last_selector,
         <group_by_cols>,
         <order_expr>,
         GROUPING(<group_by_cols>, <order_expr>) AS group_selector
  FROM <from_clause>
  WHERE <where_clause>
  GROUP BY GROUPING SETS ((<group_by_cols>), (<group_by_cols>, <order_expr>))
  HAVING <having_clause>
), original_aggregation AS (
  SELECT <other_select_list_exprs>, first_last_selector, <group_by_cols>
  FROM all_aggregations
  WHERE group_selector = <GROUPING value for the aggregate on <group_by_cols> only>
), subaggregate_for_first_last AS (
  SELECT afk, <group_by_cols>, <order_expr>
  FROM all_aggregations
  WHERE group_selector = <GROUPING value for the aggregate on <group_by_cols> and <order_expr>>
)
-- Now the join
SELECT sub_agg.afk, orig_agg.<other_select_list_exprs>
FROM original_aggregation AS orig_agg
LEFT JOIN subaggregate_for_first_last AS sub_agg
  ON orig_agg.<group_by_cols> = sub_agg.<group_by_cols>
 AND orig_agg.first_last_selector = sub_agg.<order_expr>

Example of a simple transformation

-- Sample data table
CREATE TABLE T (i int, gc int, j int, k int);
INSERT INTO T VALUES (1, 1, 3, 1), (2, 1, 5, 1), (3, 1, 7, 2), (4, 1, 9, 3), (5, 1, 11, 3), (6, 2, 7, 1), (7, 2, 3, 2);

-- This select
SELECT max(i) AS col, sum(j) KEEP (DENSE_RANK FIRST ORDER BY k) kdrf FROM t GROUP BY gc;
-- With result
-- col  kdrf
-- 5    8     (=3+5, values 7, 9 and 11 are not aggregated)
-- 7    7     (=7, value 3 is not aggregated)

-- Is transformed to
WITH all_aggregations AS (
  SELECT sum(j) AS afk, max(i) AS col, min(k) AS first_last_selector, gc, k,
         GROUPING(gc, k) AS group_selector
  FROM t
  GROUP BY GROUPING SETS ((gc), (gc, k))
), original_aggregation AS (
  SELECT col, first_last_selector, gc
  FROM all_aggregations
  WHERE group_selector = 1
), subaggregate_for_first_last AS (
  SELECT afk, gc, k
  FROM all_aggregations
  WHERE group_selector = 0
)
-- Now the join
SELECT orig_agg.col, sub_agg.afk AS kdrf
FROM original_aggregation AS orig_agg
LEFT JOIN subaggregate_for_first_last AS sub_agg
  ON orig_agg.gc = sub_agg.gc
 AND orig_agg.first_last_selector = sub_agg.k;

Additional Notes

What to do with an aggregated select (no GROUP BY)
The case that KEEP DENSE_RANK FIRST/LAST is used in an aggregated select without GROUP BY is a simpler special case of the case with GROUP BY. In that case we add a GROUP BY GROUPING SETS ((), (<order_expr>)) and join only on first_last_selector = <order_expr>, as sketched below.
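As an illustration (our own sketch, reusing the sample table T from the example above), the query SELECT sum(j) KEEP (DENSE_RANK FIRST ORDER BY k) FROM t; (i.e. without GROUP BY) could be rewritten like this. GROUPING(k) is 1 for the grand-total grouping set () and 0 for the (k) set:

WITH all_aggregations AS (
  SELECT sum(j) AS afk,
         min(k) AS first_last_selector,
         k,
         GROUPING(k) AS group_selector
  FROM t
  GROUP BY GROUPING SETS ((), (k))
), original_aggregation AS (
  SELECT first_last_selector FROM all_aggregations WHERE group_selector = 1
), subaggregate_for_first_last AS (
  SELECT afk, k FROM all_aggregations WHERE group_selector = 0
)
SELECT sub_agg.afk AS kdrf
FROM original_aggregation AS orig_agg
LEFT JOIN subaggregate_for_first_last AS sub_agg
  ON orig_agg.first_last_selector = sub_agg.k;
-- expected result: kdrf = 15 (= 3+5+7, the sum of j over all rows with the minimal k = 1)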
Essentially everything is the same in that case, only we can omit <group_by_cols> wherever they were used.

Transformation for multiple FIRST/LAST functions
Note that in the case of a single KEEP DENSE_RANK FIRST/LAST function, the value of the group_selector in the WITH clause subaggregate_for_first_last is 0. However, we can do the same transformation for multiple KEEP DENSE_RANK FIRST/LAST functions. In the general case we need a separate subaggregation for each KEEP DENSE_RANK FIRST/LAST function - which means another GROUPING SET (<group_by_cols>, <new_order_expr>), another WITH clause with the selection of the correct GROUPING value, and another LEFT JOIN on <group_by_cols> and the first_last_selector for that <new_order_expr>. A few tweaks can be applied to the general case in some situations:
- If multiple KEEP DENSE_RANK FIRST/LAST functions share the same <order_expr>, their <aggregate_function> values can be computed in the same subaggregate select (and the same GROUPING SET), but might still need a separate LEFT JOIN for each.
- If they furthermore have the same value of FIRST/LAST and the same ORDER BY direction, a single LEFT JOIN on this select is enough to get all the correct values.

Example of a transformation with several FIRST/LAST

Matrix for first_last_selector
The following expressions need to be used as first_last_selector, depending on FIRST/LAST and the ORDER BY clause. The CASE expressions return NULL when <order_expr> contains NULL values (COUNT(<order_expr>) != COUNT(*)):

ORDER BY ... ASC [NULLS LAST]:
  FIRST: min(<order_expr>)
  LAST:  CASE WHEN COUNT(<order_expr>) != COUNT(*) THEN NULL ELSE max(<order_expr>) END
ORDER BY ... DESC [NULLS FIRST]:
  FIRST: CASE WHEN COUNT(<order_expr>) != COUNT(*) THEN NULL ELSE max(<order_expr>) END
  LAST:  min(<order_expr>)
ORDER BY ... ASC NULLS FIRST:
  FIRST: CASE WHEN COUNT(<order_expr>) != COUNT(*) THEN NULL ELSE min(<order_expr>) END
  LAST:  max(<order_expr>)
ORDER BY ... DESC NULLS LAST:
  FIRST: max(<order_expr>)
  LAST:  CASE WHEN COUNT(<order_expr>) != COUNT(*) THEN NULL ELSE min(<order_expr>) END

Multiple <order_expr>s
If there are multiple <order_expr>s for a single KEEP DENSE_RANK FIRST/LAST function, things get more complicated:
- In that case we can try to combine them all into a single value by concatenating them (with padding, such that the order of the concatenated values is the same as the order of the original values). This will work as long as all columns have the same ORDER BY direction.
- Otherwise we might need to add a subselect around the <from_clause> containing <from>.* and RANK() OVER (ORDER BY <order_expr>s) AS rnk. Then we can use MIN(CASE WHEN rnk=1 THEN <order_expr> ELSE NULL END) for each <order_expr> to get multiple first_last_selectors for the LEFT JOIN. Of course, this would need to be done for each KEEP DENSE_RANK FIRST/LAST function with multiple <order_expr>s.

Additional References
https://www.exasol.com/support/browse/IDEA-65
Problem
Sometimes databases grow too large. It is possible to set size limits for Exasol databases to prevent this. The limit can be set for a database instance using database parameters.

Solution
Configuration setup
You can configure a local database size limit by setting the following database parameters:
- rawSizeLimitInMiB (default 0 = no limit): raw data size limit of the database in MiB.
- memSizeLimitInMiB (default 0 = no limit): memory data size limit of the database in MiB.
- sizeLimitWarningPercent (default 85): warning trigger margin. When the stated percentage of the limit is reached, a warning message is logged in EXAoperation.

You can either choose a raw data or a memory data limit, but not both. These parameters are set in the Extra DB Parameters field of the database instance screen.

Limit checking
When the Exasol database reaches 105% of the given size limit, no more data inserts are permitted. This impacts the following SQL statements:
- IMPORT
- INSERT
- CREATE TABLE AS
- MERGE
- SELECT INTO

All such statements are rejected with an SQL exception like the following:

[R0010] Statement currently not allowed because latest database raw size of 54.1 GiB exceeded local raw size limit of 50.0 GiB (108.2%). Please increase your limit or delete some data and FLUSH STATISTICS.

To get the database back to normal operation, you have to reduce its data size below 100% of the limit. You can achieve this either by restarting the database with no or a larger limit, or by deleting or dropping table data. Whenever your data size passes the critical limit margins of 85%, 105% and 100%, you get an EXAoperation warning, error or notice message, respectively.

Additional References
For more information on managing data insert restrictions and license limits, please consult this article.
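To make the configuration described above concrete, here is a hedged example of what could go into the Extra DB Parameters field (the values are purely illustrative; pick limits that fit your license and data volume):

-rawSizeLimitInMiB=51200
-sizeLimitWarningPercent=80

With these illustrative values, a warning would be logged at roughly 40 GiB of raw data (80% of 50 GiB), and inserts would be rejected once the raw data size exceeds roughly 52.5 GiB (105% of the limit).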
We present a Lua script to display your current session information (and to get you started using Lua as a tool).
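For a quick look without any script, similar information can also be read directly from the system views; a minimal sketch (it assumes the EXA_ALL_SESSIONS view and the CURRENT_SESSION function; verify the column names in your version):

SELECT session_id, user_name, status, command_name, login_time, client
FROM exa_all_sessions
WHERE session_id = CURRENT_SESSION;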
This article shows you how to calculate the backup duration from the database.
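One possible way to approximate this from SQL (our own sketch; it assumes that EXA_SYSTEM_EVENTS records BACKUP_START and BACKUP_END event types, so verify the exact event names in your version before relying on it):

WITH ev AS (
  SELECT measure_time,
         event_type,
         LEAD(measure_time) OVER (ORDER BY measure_time) AS next_time,
         LEAD(event_type)   OVER (ORDER BY measure_time) AS next_type
  FROM exa_system_events
  WHERE event_type IN ('BACKUP_START', 'BACKUP_END')
)
SELECT measure_time                              AS backup_start,
       next_time                                 AS backup_end,
       seconds_between(next_time, measure_time)  AS duration_seconds
FROM ev
WHERE event_type = 'BACKUP_START'
  AND next_type  = 'BACKUP_END';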
Background
The Exasol database automatically computes and stores column statistics on demand, e.g. when joining tables for the first time. Those statistics consist of distinct-value estimates, minimum and maximum values, balancing, and other information. They are used by our query optimizer for the estimation of join costs, filter sensitivities, etc.
The computation of column statistics performs a complete column scan. Therefore it might be expensive, especially for large tables and/or tables not having all data in memory. This means that a query doing statistics computation may experience a significant slowdown compared to the next execution.
Column statistics are maintained as part of DML statements. They are recomputed if a significant amount of data has been changed since the last computation.

Explanation
Problem Description
Due to format changes, all column statistics are invalidated during an update from EXASOL 5.0 to EXASOL 6.0.

Required User Action
After the update to EXASOL 6.0, we recommend recomputing the statistics for the whole database to avoid any potential unexpected performance losses. Please note that the following command was introduced with version 6.0.4 (see EXASOL-2110).

ANALYZE DATABASE REFRESH STATISTICS;

Like other multi-table statements, ANALYZE DATABASE does an implicit COMMIT after each table, minimizing transaction conflicts.

Time Estimation
It is often useful to obtain an estimate of the duration of the ANALYZE DATABASE REFRESH STATISTICS statement. The query below delivers such an estimate (measured in seconds) when run before the update (that is, on EXASOL 5.0, while statistics are still valid):

select cast(
         zeroifnull(sum(raw_object_size) / 1024 / 1024 / 150 / nproc())
       as dec(18, 1)) as COLUMN_STATISTICS_REFRESH_SECONDS
from "$EXA_COLUMN_SIZES"
where (column_schema, column_table, column_name) in (
  select column_schema, column_table, column_name
  from "$EXA_COLUMN_STATISTICS"
  where -- filter does not work on 6.0 before the REFRESH
    min_value_estimate is not null
);

Additional References
https://community.exasol.com/t5/database-features/exa-statistics/ta-p/1413
This article gives an example of how Exasol's Skyline feature can be used.
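To give a flavour of what Skyline looks like (a minimal sketch with a hypothetical CARS table; see the full article for a worked example), the PREFERRING clause returns the Pareto-optimal rows instead of forcing you to weight the criteria into a single score:

-- hypothetical table: CARS(model, price, power)
SELECT model, price, power
FROM cars
PREFERRING LOW price PLUS HIGH power;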
Background
This article describes the automatic process to move data after node failures.

Symptoms
A typical data volume of a database has a redundancy of 2. With this configuration, the data on each volume node has one redundancy copy on another volume node. Thus, the failure of one volume node can be handled without data loss.
A typical database configuration consists of some active database nodes and one or more reserve nodes. Thus, in combination with a data volume having a redundancy of 2 or higher, the database can be restarted automatically after a node failure.
A typical database is configured with the same active nodes as its data volume to ensure data locality. After a volume node failure (and despite any succeeding automatic database restart), the data volume loses the redundancy of one node.

Explanation
To automatically recover the missing data redundancy in the above-mentioned scenario, a process exists to handle this situation. For this process to work, the user is required to define a Restore Delay in the database configuration. Its default value is 10 minutes, which means: look at all database and volume nodes 10 minutes after an automatic database restart and try to move data from the offline volume node to the newly utilized database reserve node.
Because a data move operation is very expensive, it should be avoided if possible. Thus, this process only moves data under the following circumstances:
- The timeout of the restore delay after a database startup has been reached: in the case of multiple database restarts, e.g. due to hardware failures, data move operations are not started too early.
- The database has been restarted automatically: if a database has been started manually, the current configuration is accepted as is.
- Exactly one volume node is offline: if all volume nodes are online, the database nodes should be moved instead to match the volume nodes (this requires a user-triggered database restart). If more than one volume node is offline, the operation will not be started either (and a monitoring error message will be logged).
- Data can be accessed locally by the database after moving it: if data from the offline volume node can only be moved to a database node which is not responsible for the management of the appropriate volume node data, the operation will not be started either (and a monitoring error message will be logged).
IMPORT of a timestamp column fails because 0000-00-00 00:00:00, which is a valid timestamp in MySQL, is invalid as a timestamp in Exasol.
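One common workaround (a hedged sketch with hypothetical connection, table and column names) is to convert the zero timestamps to NULL inside the source statement, so Exasol never receives the invalid value; alternatively, the MySQL JDBC driver option zeroDateTimeBehavior=convertToNull achieves a similar effect at the driver level.

IMPORT INTO target_schema.target_table (id, created_at)
FROM JDBC AT my_mysql_connection
STATEMENT 'SELECT id, NULLIF(created_at, ''0000-00-00 00:00:00'') FROM source_db.source_table';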
Background
This article describes a way to connect Spotfire to Exasol.

Prerequisites
Download the EXASOL JDBC driver and extract the exajdbc.jar.

How to connect Spotfire to Exasol
Step 1 Copy the exajdbc.jar to the Spotfire library directory; see the following link for where this directory is located: https://community.tibco.com/wiki/tibco-spotfirer-jdbc-data-access-connectivity-details#toc-2
Step 2 Restart the Spotfire Server so the EXASOL JDBC driver is loaded.
Step 3 Start the Spotfire Server configuration tool.
Step 4 Go to the Configuration tab and create a new Data Source Template.
Step 5 In the Add data source template dialog, enter EXASOL as the name and paste the following template:

<jdbc-type-settings>
  <type-name>EXASOL</type-name>
  <driver>com.exasol.jdbc.EXADriver</driver>
  <connection-url-pattern>jdbc:exa:&lt;host&gt;:&lt;port&gt;;clientname=Spotfire;</connection-url-pattern>
  <ping-command>SELECT 1</ping-command>
  <supports-catalogs>false</supports-catalogs>
  <supports-schemas>true</supports-schemas>
  <java-to-sql-type-conversions>
    <type-mapping>
      <from max-length="2000000">String</from>
      <to>VARCHAR($$value$$)</to>
    </type-mapping>
    <type-mapping>
      <from>Integer</from>
      <to>DECIMAL(18,0)</to>
    </type-mapping>
    <type-mapping>
      <from>Long</from>
      <to>DECIMAL(36,0)</to>
    </type-mapping>
    <type-mapping>
      <from>Float</from>
      <to>REAL</to>
    </type-mapping>
    <type-mapping>
      <from>Double</from>
      <to>DOUBLE PRECISION</to>
    </type-mapping>
    <type-mapping>
      <from>Date</from>
      <to>DATE</to>
    </type-mapping>
    <type-mapping>
      <from>DateTime</from>
      <to>TIMESTAMP</to>
    </type-mapping>
  </java-to-sql-type-conversions>
</jdbc-type-settings>

Step 6 Save the config, close the configuration tool and restart the server.
Step 7 Check the logs to see whether the XML config for EXASOL was validated correctly.
Step 8 Use the Information Designer tool in the Spotfire Analyst to create a new Data Source.
Problem
It is currently not possible to include only specified schemas in, or to exclude some schemas from, the backup process. The backup "knows" nothing about schemas or other database objects.

Solution
It is, however, possible to restore only some of the data by starting the backup in virtual access mode and transferring the requested data manually using IMPORT/EXPORT commands.

Additional References
Please consider Virtual Access on Database Backup:
https://docs.exasol.com/administration/on-premise/backup_restore/virtual_access_on_backup.htm
https://community.exasol.com/t5/database-features/virtual-access-on-database-backups/ta-p/813
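For illustration of the solution above: once the backup instance is running in virtual access mode, individual tables can be pulled from it into the production database. A minimal sketch with hypothetical host, credentials, schema and table names:

IMPORT INTO prod_schema.customers
FROM EXA AT '10.0.0.42:8563' USER 'sys' IDENTIFIED BY '<password>'
TABLE backup_schema.customers;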
Question
Scenario: merge data from a (small) source table into a (big) target table. The source contains only a few rows (below 10,000, mostly about 1,000). The target contains some million records.

Solution 1:
delete from target where id in (select id from source);
insert into target select * from source;

Solution 2:
merge into target t using source s on (t.id = s.id)
when matched then update ...
when not matched then insert;

Questions:
- Is the MERGE command more effective than a DELETE plus INSERT?
- Is there a need to delete duplicate rows from the source in advance?

Answer
Testing with a variety of source rowsets against a target with about 6 million rows showed a slight time advantage for the MERGE command. Overall, fewer internal steps are performed in the MERGE compared to DELETE/INSERT. Furthermore, the subselect in the DELETE command is materialized in a temporary table and then replicated over all nodes, because the number of rows is below the replication border. This causes additional load on the network. The suggestion is therefore: use the MERGE command (a fully spelled-out example is sketched below).
Cleaning the source of duplicates is mandatory, since the ON condition within the MERGE statement needs a unique source rowset. Otherwise the exception "Unable to get a stable set of rows in the source tables" will be thrown.
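For completeness, here is what the abbreviated MERGE from Solution 2 could look like when written out (our own sketch with a hypothetical column val; adapt the SET and VALUES lists to your table):

MERGE INTO target t
USING source s ON (t.id = s.id)
WHEN MATCHED THEN UPDATE SET t.val = s.val
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.val);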
Question
What is smart table replication and how can the replication border be modified?

Answer
General information on smart table replication
Replicates are used "on the fly" for local joins if a table is "small" with regard to the threshold. A replicated table accesses data directly from other nodes' database memories and keeps a local copy in its own DBRAM. If a replicated table is modified, only the changed data is reloaded into the database memories of the other nodes. Modified tables and subselects cannot be used with smart table replication; the table replication border does not apply to those.

💡 Using large replicated tables might cause decreased performance. Queries with expensive table scans (filter expressions) or between/cross joins may fall into this category.

Soft replication border
A table will be replicated if none of the thresholds below is reached. The table size threshold refers to the RAW_OBJECT_SIZE, as in EXA_*_OBJECT_SIZES. The replication borders can be modified through extra database parameters in the web interface:
-soft_replicationborder_in_numrows=<numrows> [default is 100000 rows]
-soft_replicationborder_in_kb=<kb> [default is 1000000, i.e. about 1 GB]
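To see which tables are candidates for replication under the default size border, something like the following can be used (a sketch based on the defaults above; it assumes the ROOT_NAME, OBJECT_TYPE and RAW_OBJECT_SIZE columns of EXA_ALL_OBJECT_SIZES, so verify the column names in your version; note that the row-count border applies as well):

SELECT root_name AS object_schema, object_name, raw_object_size
FROM exa_all_object_sizes
WHERE object_type = 'TABLE'
  AND raw_object_size <= 1000000 * 1024  -- default soft_replicationborder_in_kb
ORDER BY raw_object_size DESC;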
Sometimes it is useful to enforce the materialization of a subselect (or view or CTE) by adding an ORDER BY FALSE to it. Those cases include:
- Late applied filters (see the attached PDF)
- Replacing a global join by a local join (by enforcing the materialization of a subselect that is smaller than the replication border [see replication-border-in-exasol-6-1] with local filters)
- Manual precalculation of multiple usages of subselects, views, or CTEs

Please be aware that materializations can create a lot of temporary data if they are large, which might result in block swapping and decreased throughput.
The attached documents show when a subselect, view or CTE needs to be materialized (01_Materializations.pdf) and a use case to improve performance (02_OptimizationExampleMaterialization.pdf). Those documents require some knowledge about profiling.
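A minimal sketch of the technique with hypothetical tables (the ORDER BY FALSE has no ordering effect; it merely forces the optimizer to materialize the subselect before the join):

SELECT o.order_id, f.val
FROM (
  SELECT id, val
  FROM facts
  WHERE region = 'EU'
  ORDER BY FALSE  -- forces materialization of this filtered subselect
) f
JOIN orders o ON o.fact_id = f.id;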
This article covers Exasol cluster enlargement (adding nodes) and what you need to know if you see impeded performance afterwards.
Problem
As documented (see https://docs.exasol.com/sql/select.htm), the CONNECT BY cycle detection works differently from Oracle's implementation. We recommend changing your logic accordingly; however, if you want to reduce the Exasol results so that they match the Oracle results, you can do so using this guide.

Diagnosis
Example for cycle detection
Here is a small example to explain the difference between Exasol and Oracle. Given this table:

-- assumed table definition:
create table employees (name varchar(10), id int, manages int);
insert into employees values ('B', 1, 2);
insert into employees values ('E', 3, 4);
insert into employees values ('C', 4, 2);
insert into employees values ('C', 4, 5);
insert into employees values ('A', 2, 99);
insert into employees values ('A', 2, 3);
insert into employees values ('D', 100, 1);
insert into employees values ('F', 99, NULL);

You want to get the chain of command for Peter's manager. Peter is directly managed by a 3rd-line manager who is beneath the CEO. However, in this company the 3rd-line manager is also managed by the 1st-line manager, which creates a cycle. When you run this query in Exasol, you get the following result:

SELECT sys_connect_by_path(name, '-') MANAGER_PATH,
       CONNECT_BY_ISCYCLE ISCYCLE,
       CONNECT_BY_ISLEAF ISLEAF,
       level l
FROM employees
START WITH manages = 99
CONNECT BY NOCYCLE PRIOR id = manages
ORDER BY level;

MANAGER_PATH   ISCYCLE  ISLEAF  L
-A             0        0       1
-A-B           0        0       2
-A-C           0        0       2
-A-B-D         0        1       3
-A-C-E         0        0       3
-A-C-E-A       0        0       4
-A-C-E-A-B     0        0       5
-A-C-E-A-C     1        1       5
-A-C-E-A-B-D   0        1       6

The reason is that Exasol detects cycles in the current row. As the same row appears for the second time in the path, the cycle is detected; see path -A-C-E-A-C.
Oracle does not detect cycles in the current row, but checks whether the child row causes a cycle. For the same query the result is:

MANAGER_PATH   ISCYCLE  ISLEAF  L
-A             0        0       1
-A-B           0        0       2
-A-C           0        0       2
-A-B-D         0        1       3
-A-C-E         0        0       3

The cycle is detected in row .5.1.3, because row 3 has a child (row 4) which has a child that is also an ancestor (row 1). This is kind of a double look-ahead.

Solution
This solution describes how to emulate Oracle's behaviour in Exasol.
If you want the same behaviour in Exasol, you have to backtrack two levels from the cycle and remove everything that is a child of the row that Oracle marks as a cycle. For doing this you can use the following pattern. Your query has these parts:

SELECT <original SELECT list>
FROM <original TABLE OR VIEW/subselect>
<original CONNECT BY clause>
<original ORDER BY clause>

Apply this pattern:

WITH base_table AS (
  SELECT ROW_NUMBER() OVER (ORDER BY rowid) AS row_num,
         <original SELECT list>
  FROM <original TABLE OR VIEW/subselect>
), add_cols_for_cycle_detection AS (
  SELECT <original SELECT list>,
         -- extra columns for Oracle-like cycle detection
         sys_connect_by_path(row_num, '.') AS cd_scbp,
         row_num,
         PRIOR row_num cd_prior_row_num,
         CONNECT_BY_ISCYCLE AS cd_cycle
  FROM base_table
  <original CONNECT BY clause>
), parent_of_cycle AS (
  SELECT cd_prior_row_num
  FROM add_cols_for_cycle_detection
  WHERE cd_cycle = 1
), ora_cycle_start AS (
  SELECT cd_scbp, cd_prior_row_num
  FROM add_cols_for_cycle_detection
  WHERE row_num IN (SELECT cd_prior_row_num FROM parent_of_cycle)
), ora_like_cb AS (
  SELECT *
  FROM add_cols_for_cycle_detection
  WHERE NOT EXISTS (
    SELECT 1
    FROM ora_cycle_start
    WHERE cd_scbp = SUBSTRING(add_cols_for_cycle_detection.cd_scbp, 0, len(cd_scbp))
  )
)
SELECT <original SELECT list>
FROM ora_like_cb
<original ORDER BY clause>;

Applied to our example query, the resulting query is:

WITH base_table AS (
  SELECT ROW_NUMBER() OVER (ORDER BY rowid) AS row_num, name, id, manages
  FROM employees
), add_cols_for_cycle_detection AS (
  SELECT sys_connect_by_path(name, '-') MANAGER_PATH,
         CONNECT_BY_ISCYCLE ISCYCLE,
         CONNECT_BY_ISLEAF ISLEAF,
         LEVEL l,
         -- extra columns for Oracle-like cycle detection
         sys_connect_by_path(row_num, '.') AS cd_scbp,
         row_num,
         PRIOR row_num cd_prior_row_num,
         CONNECT_BY_ISCYCLE AS cd_cycle
  FROM base_table
  START WITH manages = 99
  CONNECT BY NOCYCLE PRIOR id = manages
), parent_of_cycle AS (
  SELECT cd_prior_row_num
  FROM add_cols_for_cycle_detection
  WHERE cd_cycle = 1
), ora_cycle_start AS (
  SELECT cd_scbp, cd_prior_row_num
  FROM add_cols_for_cycle_detection
  WHERE row_num IN (SELECT cd_prior_row_num FROM parent_of_cycle)
), ora_like_cb AS (
  SELECT *
  FROM add_cols_for_cycle_detection
  WHERE NOT EXISTS (
    SELECT 1
    FROM ora_cycle_start
    WHERE cd_scbp = SUBSTRING(add_cols_for_cycle_detection.cd_scbp, 0, len(cd_scbp))
  )
)
SELECT MANAGER_PATH, ISCYCLE, ISLEAF, L
FROM ora_like_cb
ORDER BY ISLEAF;

The result is the same as in Oracle:

MANAGER_PATH   ISCYCLE  ISLEAF  L
-A             0        0       1
-A-B           0        0       2
-A-C           0        0       2
-A-C-E         0        0       3
-A-B-D         0        1       3

If you also want CONNECT_BY_ISCYCLE to work like in Oracle, you have to extend the pattern by another CTE, ora_cycle:

WITH base_table AS (
  SELECT ROW_NUMBER() OVER (ORDER BY rowid) AS row_num, name, id, manages
  FROM employees
), add_cols_for_cycle_detection AS (
  SELECT sys_connect_by_path(name, '-') MANAGER_PATH,
         CONNECT_BY_ISCYCLE ISCYCLE,
         CONNECT_BY_ISLEAF ISLEAF,
         LEVEL l,
         -- extra columns for Oracle-like cycle detection
         sys_connect_by_path(row_num, '.') AS cd_scbp,
         row_num,
         PRIOR row_num cd_prior_row_num,
         CONNECT_BY_ISCYCLE AS cd_cycle
  FROM base_table
  START WITH manages = 99
  CONNECT BY NOCYCLE PRIOR id = manages
), parent_of_cycle AS (
  SELECT cd_prior_row_num
  FROM add_cols_for_cycle_detection
  WHERE cd_cycle = 1
), ora_cycle_start AS (
  SELECT cd_scbp, cd_prior_row_num
  FROM add_cols_for_cycle_detection
  WHERE row_num IN (SELECT cd_prior_row_num FROM parent_of_cycle)
), ora_like_cb AS (
  SELECT *
  FROM add_cols_for_cycle_detection
  WHERE NOT EXISTS (
    SELECT 1
    FROM ora_cycle_start
    WHERE cd_scbp = SUBSTRING(add_cols_for_cycle_detection.cd_scbp, 0, len(cd_scbp))
  )
), ora_cycle AS (
  SELECT ora_like_cb.*,
         decode(ora_cycle_start.cd_prior_row_num, NULL, 0, 1) AS cyc
  FROM ora_like_cb
  LEFT JOIN ora_cycle_start
    ON ora_like_cb.row_num = ora_cycle_start.cd_prior_row_num
)
SELECT MANAGER_PATH, ISCYCLE, ISLEAF, l, cyc
FROM ora_cycle;

Result:

MANAGER_PATH   ISCYCLE  ISLEAF  L  CYC
-A             0        0       1  0
-A-B           0        0       2  0
-A-C           0        0       2  0
-A-B-D         0        1       3  0
-A-C-E         0        0       3  1
This solution describes how to determine all granted system and object privileges per user. A license for "Graph Search" is required to run the attached query.
Background
What does a preprocessor script do? Well, it preprocesses. To be more specific, an active preprocessor script is hooked in before Exasol's SQL parser. This allows it to intercept and modify any SQL text sent to the database (or executed within a procedure script) before passing it on to the actual parser.

What can I do with it?
Those scripts are written in the Lua language, so basically you can do anything with the preprocessor; here are a few facts and limitations, though:
- Exasol's Lua library is stripped down and cannot be extended with binary libraries, for security reasons.
- Preprocessor scripts do not take parameters; the "current" SQL text can be retrieved through a function call.
- Preprocessor scripts can execute statements using (p)query.
- Preprocessor scripts do not return any values; they "return" the modified SQL text through another function call.
- While preprocessor scripts are often enabled on system level, any user can disable this in his or her session (see Step 1 below).
- Preprocessor scripts are executed in the caller's context and privileges. Also, if a user can EXECUTE the script (which is a necessity), he or she can also READ it. Security by obscurity won't work.

Typical Use Cases
- Compatibility layer for a frontend that produces SQL not suitable for Exasol
- Macro magic: expanding predefined keywords server-side
- "Human knows more" optimizations of queries and filters
- Row-level security (💡 re-read the last two points above)

Syntax and Semantics
Please see the Exasol User Manual (Section 3.8) for details.

Prerequisites
As a preprocessor script is a schema object, you will need to find or create a schema to create the script in:

create schema if not exists PREPROCESSOR;

Preconditions: CREATE SCHEMA privilege or a pre-existing schema

How to work with a Preprocessor Script
Step 1: Safety
"CREATE SCRIPT" statements are also preprocessed. As the preprocessor script you are going to (re-)deploy is very likely to contain the keywords it should react on, it is advisable to disable the preprocessor before deployment:

alter session set sql_preprocessor_script = null;

Step 2: Deploy
Create the preprocessor script. The syntax "around" it may depend on the SQL client you are using:

--/
create or replace Lua script MY_PREPROCESSOR() as
  ...
  ...sqlparsing.getsqltext()...
  ...sqlparsing.setsqltext(...)...
  return
/

Preconditions: CREATE SCRIPT privilege and ownership of the schema, or CREATE ANY SCRIPT privilege

Step 3: Activate locally
Now activate the preprocessor for your local session:

alter session set sql_preprocessor_script = PREPROCESSOR.MY_PREPROCESSOR;

Step 4: TEST IT!
Run a few statements to verify success. This is best done with Auditing or Profiling enabled, so you can see the resulting SQL texts. When things go very wrong, go back to Step 1 – the ALTER SESSION statement used there is the only SQL statement not passed through the preprocessor...

Step 5: Activate globally
Now that things went well, we can activate the script for other users (new sessions):

alter system set sql_preprocessor_script = PREPROCESSOR.MY_PREPROCESSOR;

Preconditions: ALTER SYSTEM privilege

Step 6: No wait, we forgot something important!
We just locked out (more or less) everyone else from the database: they don't have EXECUTE permissions on the script!

grant EXECUTE on PREPROCESSOR.MY_PREPROCESSOR to public;

Preconditions: ownership of the schema or GRANT ANY OBJECT PRIVILEGE privilege

Additional Notes
Best Practice: As Step 2 replaces the script, all privileges on it are lost in that step.
To avoid this problem, the EXECUTE privilege should be granted on schema level:

grant EXECUTE on SCHEMA PREPROCESSOR to public;

Just make sure you don't put anything dangerous or secret into that schema.

Additional References
Preprocessor Scripts Documentation
List of functions in Exasol
https://community.exasol.com/t5/database-features/using-the-sql-preprocessor-to-support-postgresql-mysql-functions/ta-p/1041
This solution describes a method to retrieve type information without the need for a syntactically correct and complete CREATE statement.
This article describes the contents of the SYS schema.
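For a first overview, the system views of the SYS schema can be listed directly from the catalog; a minimal sketch (it assumes the EXA_SYSCAT view and its SCHEMA_NAME, OBJECT_NAME and OBJECT_COMMENT columns; verify against your version):

SELECT schema_name, object_name, object_comment
FROM sys.exa_syscat
ORDER BY object_name;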