To reproduce certain problems, Exasol support may ask you to send DDL statements for the required tables and views. At this point you have two choices:
Start drilling around to find each and every required view, table, and so on. This can be a lot of work and may end in several communication round trips if you overlook a dependency or two.
Skip all that and just send us the full DDL for all schemas in your database instance, letting us sort out what is needed and what is not.
Neither option is optimal.
The attachment of this article contains a procedure script (Lua) that can create DDL statements for the recursive dependencies of a view. The DDL statements are presented as a single-column result set and are ready for copy/paste into a text editor (or EXAplus) for saving.
execute script meta.create_DDL( 'DUT' , 'TRUNK' );
--DDL created by user SYS at 2017-11-14 09:44:59.554000
-- table dependencies --
CREATE SCHEMA "DUT";
CREATE TABLE "DUT"."TAB1"(
"I" DECIMAL(18,0) IDENTITY NOT NULL,
-- SYSTEM TABLE: SYS.EXA_METADATA
-- function dependencies --
function func( param decimal(3) ) returns decimal(3)
return sqrt(param) * (select max(i) from dut.tab1);
-- script dependencies --
CREATE LUA SCALAR SCRIPT "LUA_SCALAR" () RETURNS DECIMAL(18,0) AS
-- view dependencies --
--> level 2
-- SYSTEM VIEW: SYS.CAT
--> level 1
CREATE VIEW "DUT"."BRANCH"
as ( select * from exa_metadata, cat );
-- final query/view:
CREATE VIEW "DUT"."TRUNK"
select * from dut.tab1, dut.branch
where func(j) > lua_scalar()
This script is work in progress and has only seen minimal testing so far.
Things known not to work:
Virtual Schemas – It is unlikely we would be able to clone your remote system anyway.
Connections and IMPORT inside views – Like virtual schemas, it probably does not make much sense.
Dependencies inside scripts – This is branded 'dynamic SQL', and our engine cannot determine those dependencies without actually executing the script with specific input.
SELECT dependencies inside functions – Just don't do that. As with scripts, these dependencies do not show up in the system tables.
If your model contains any of the above and it turns out to be relevant for reproduction of a problem, you might have to revert to "Skip all that" above. The "Copy Database" script in create-ddl-for-the-entire-database may be of use then.
SOL: Metadata Backup
How it works
The file contains several scripts which will copy the DDL for all objects in the database. It also contains the necessary scripts to restore the metadata. If the config file is set up correctly, "backup.sh" performs the following tasks:
Creates the specified directory. This is where all of the files will be stored
Connects to the database using an Exaplus profile (must be created beforehand)
Once connected to the database, the script creates a schema "BACKUP_SCRIPTS" and 2 scripts which are used in the backup process.
EXPORT statements are generated using the database script "BACKUP_SYS" for several system tables which can be referenced later on. The CSV files are saved in the './ddl/' path.
A database "restore_sys.sql" script is created and saved in the './ddl/' path that includes all the commands necessary to restore the system tables into a new "SYS_OLD" schema.
The script executes the database script "METADATA_BACKUP" and creates DDL for all database objects, including schemas, tables, views, users, and roles. Limitations of the script are listed at the end.
Specifically, this script will read data from system tables and create the necessary CREATE statements which can be executed at a later time.
The script creates a new schema, stores the data into a table in this new schema, and then exports the table contents into an SQL file to prevent formatting errors. After the export, the schema is dropped.
The DDL and CSVs of the old tables are compressed and saved as a .tar.gz file in the './backups' directory, or in a different location if "EXTERNAL_DIR" is set in the "config" file.
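The packaging step described above can be sketched in plain shell. This is purely illustrative: in the real backup.sh, exaplus and the database scripts fill './ddl' with DDL and CSV exports, so a dummy file stands in for that output here, and all paths and file names are assumptions based on the description above.

```shell
# Sketch of the final packaging step of backup.sh (illustrative only)
DDL_DIR=./ddl
BACKUP_DIR=./backups
mkdir -p "$DDL_DIR" "$BACKUP_DIR"

# In the real script, exaplus writes the DDL and CSV exports into ./ddl;
# a dummy restore_sys.sql stands in here so the sketch is self-contained.
echo 'CREATE SCHEMA "SYS_OLD";' > "$DDL_DIR/restore_sys.sql"

# Compress everything into a timestamped archive, matching the naming
# pattern ddl-backup-DB_NAME-YYYY-MM-DD-HH-Mi-SS.tar.gz used later on
STAMP=$(date +%Y-%m-%d-%H-%M-%S)
tar czf "$BACKUP_DIR/ddl-backup-DB_NAME-$STAMP.tar.gz" "$DDL_DIR"
```

The resulting archive is exactly what the restore procedure further below expects to unpack.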
Creating DDL based on system tables is not perfect and has some limitations and imperfections. To safeguard against incorrect DDL creation, the script also saves the system tables which are used in the script. If a DDL statement is not created perfectly or if you discover imperfections or errors, you can query the system tables directly and create your own DDL.
invalid (outdated) views and their DDL will not be created
views and functions for which the schema or object itself was renamed still contain the original object names and may cause an error on DDL execution
comments at the end of view text may cause problems
GRANTOR of all privileges will be the user that runs the SQL code returned by the script
passwords of all users will be 'Start123', except for connections where passwords will be left empty
functions with dependencies will not be created in the appropriate order
UDF scripts with delimited identifiers (input or output) will cause an error on DDL execution
If authenticating using Kerberos (< 6.0.8), please alter the script BI_METADATA_BACKUPV2 lines 155 and 156
Comments containing apostrophes (single-quote) will cause an error on DDL execution
Note: This solution has been updated to include new permissions/rights in version 6.1
The database is already created and can be connected to from the system you are running the scripts on
Database user, which you will connect to the database with, has the following system privileges:
SELECT ANY DICTIONARY
How to create a Metadata Backup Concept?
Save the attached tar.gz file to a directory of your choice. The only requirements are that the system is Linux-based and that Exaplus command line is installed.
Unzip the file:
tar xf metabackup_vYYYYMMDD.tar.gz
Create an Exaplus profile with all of the connection details, including database user, password, and connection string. Details on the parameters for Exaplus can be found in the user manual. Example:
/usr/opt/EXASuite-6/EXASolution-6.0.10/bin/Console/exaplus -u [YOUR_USER] -p [YOUR_PASSWORD] -c [DB IP Address]:8563 -wp metadata_backup_profile
Change directory to the newly created folder:
Edit config file with the help of following information:
EXAPLUS = Path to EXAPLUS, excluding the exaplus binary itself. For the example above, it would be '/usr/opt/EXASuite-6/EXASolution-6.0.10/bin/Console'
PROFILENAME = The name of the profile created in the previous step (e.g. metadata_backup_profile)
EXTERNAL_DIR = The path where the metadata backup will be stored if specified. Can be an external filesystem. Should be mounted or available beforehand
SYSPATH = The path where metabackup_vYYYYMMDD.tar.gz was extracted to
DB_NAME = The Database Name
EXAPLUS_TIMEOUT = Timeout for Exaplus (default 300 seconds). If you want to prevent long-running queries, set the timeout accordingly. Please note, for very large databases, it might take over 5 minutes to run all of the scripts, so please set the timeout higher.
EXAPLUS_RECONNECT = Reconnect tries for Exaplus if the connection fails. Default value is set to '1'
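Putting the parameters above together, a config file might look like the sketch below. Every value is a placeholder chosen for illustration, not a default shipped with the scripts; adjust each one to your environment.

```shell
# Hypothetical backup config -- all values below are examples only
EXAPLUS=/usr/opt/EXASuite-6/EXASolution-6.0.10/bin/Console
PROFILENAME=metadata_backup_profile
EXTERNAL_DIR=/mnt/backup_share
SYSPATH=/home/exasol/metabackup
DB_NAME=exadb
EXAPLUS_TIMEOUT=600
EXAPLUS_RECONNECT=1
```

Note the timeout is raised above the 300-second default here, following the advice above for larger databases.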
Make .SH files executable
chmod 755 *.sh
Run the backup script: ./backup.sh or bash backup.sh
SOL: Metadata Restore
This script will import the CSVs created in the backup and run all of the CREATE statements.
The script opens an Exaplus session, creates a schema called 'SYS_OLD' containing the same system tables that were created in the backup, and then imports the CSVs into these tables.
All of the CREATE statements are executed, which restores the 'structure' of the database; however, all tables are empty.
Note: During execution, the owner of all objects is the user running the script. At the end of the script, the owner of each schema is changed to match the correct owner.
To monitor errors, profiling is enabled by default. You can search through EXA_DBA_PROFILE_LAST_DAY to find commands which were rolled back.
If the database is not empty, some objects may fail to be created if they already exist in the database. Objects will not be overwritten.
Limitations of the create DDL script may cause errors during the CREATE statements. Please check the restore log, profiling or auditing to identify statements which were not created successfully.
If restoring to a database running a different version than the database from the backup, the IMPORT of old sys tables may fail due to different columns.
Exaplus command line
The database is already created and can be connected to from the system you are running the scripts on
It is recommended to start the database with auditing ENABLED
Database user will need extensive CREATE privileges. It is recommended that the user running the scripts has DBA privileges as the following commands will be carried out:
CREATE SCHEMA, TABLE, VIEW, SCRIPT, CONNECTION, etc.
How to apply a Metadata Restore?
The restore script can be run from the same system you ran the backup on or a different system. If running the restore on a new system, please follow steps 1-4 found in the Backup instructions to set up the files. NOTE: When setting up your exaplus profile, please enter the information for the database you are restoring to.
Unpack the backup tar into the directory of your choice
tar xf ddl-backup-DB_NAME-YYYY-MM-DD-HH-Mi-SS.tar.gz
Edit the config file with the following information:
SYSPATH = The path where metabackup.tar.gz was extracted to
DB_NAME = The Database Name
BACKUP_RESTORE_PATH = The path that you unpacked the backup file to (should end with '/ddls')
RESTORE_SYS_SQL_PATH = The path containing the restore script
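A restore config might then look like the following sketch; as with the backup config, every value is a hypothetical placeholder to be replaced with your own paths.

```shell
# Hypothetical restore config -- all values below are examples only
SYSPATH=/home/exasol/metabackup
DB_NAME=exadb
# per the note above, this path should end with '/ddls'
BACKUP_RESTORE_PATH=/home/exasol/restore/ddls
RESTORE_SYS_SQL_PATH=/home/exasol/restore/ddls
```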
Run the restore script: ./restore.sh or bash restore.sh
The Exasol R SDK offers functionality to interact with the Exasol database from R programs. It extends RODBC, offers fast data transfer between Exasol and R ( exa.readData() and exa.writeData() ), and makes it convenient to run parts of your R code in parallel in Exasol using R UDF functions behind the scenes ( exa.createScript() ).
Example (Digit recognition)
We will use a well-known machine learning algorithm for classifying handwritten digits. It uses the randomForest library, which builds an ensemble of decision trees from the training data. Given a training set and the actual result for each element, the library returns a decision forest that can be applied to other similar images, the test data. The data is stored pixel-wise (one value for each pixel). This How-To focuses on how to use the EXASolution R SDK, not on the details of the algorithm or R. Further information on the algorithm and the data can be found here. The example was chosen to show how easy it is to read or write data from or to EXASolution and how easy it is to deploy a given algorithm to an Exasol UDF.
The Exasol R SDK and randomForest need to be installed on the system used. The attached files can be used to set up your Exasol database for the example:
(Creates a schema, a table for train data set and a table for test data set)
Place the CSV files ( train.csv ) in an appropriate place and change the path to those files in 02_IMPORT.sql
(Imports the train and test data set)
How to use the EXASolution R SDK
R implementation (reading the data sets from an EXASolution database)
The attached file 03_RandomForest.r shows how to use exa.readData() and exa.writeData() to interact with the database in R. As of Exasol 6.2.x, we needed to modify one line in 03_RandomForest.r to account for R version differences. Specifically, we add a parameter, version = 2, to the save command. You are welcome to skip this modification and revisit it if you run into issues.
save(rf, file = "<path to rf.dat>")
save(rf, version = 2, file = "<path to rf.dat>")
Additional installation notes for 03_RandomForest.r cover installing the needed packages. Continuing on, the general procedure is:
Step 1. Connect to EXASolution and read the train and test data
This requires an ODBC DSN (named "solo" in this example)
Step 2. Read train and test data from the database and prepare them for further processing
Test data was enriched by an image id
Train data was enriched by a label showing the actual handwritten digit
Step 3. Use randomForest() to create a decision tree and print prediction
In this example, the decision forest is kept and tested on the test data
Extract the predictions for test data and print them
Step 4. Write decision tree to Exasol
We will use a bucket called binary to store the decision forest
Detailed information on EXABucketFS and buckets can be found in https://community.exasol.com/t5/environment-management/how-to-create-an-exabucketfs-service-and-bucket/ta-p/1515
This item does not apply to Exasol 6+ and is only left in for archival purposes. Important deprecation notice: In version 5 it is not possible to store binary data in the EXASOL database. One possibility is to serialize the decision forest and store it in a table. Since the VARCHAR type in EXASOL is limited to 2 million characters, the resulting string must not exceed 2 million characters.
curl -v -X PUT -T rf.dat http://w:<write_pw>@<ip of one cluster node>:<port>/binary/rf.dat
Step 5. Deploying a UDF
The other file ( 05_PREDICTINDB.sql ) deploys the functionality as a UDF in EXASOL. We create a variadic UDF receiving an arbitrary number of input parameters (all pixels). The output is the predicted digit.
Load forest from the bucket
Create a dataframe for all input columns (pixels)
Use predict() and emit the predictions
Step 6. Using UDF within EXASolution
Now it is possible to use the deployed function within EXASolution like any other built-in function:
from rf.test cross join rfmodels.models;
What data can you read using subconnections?
What data can you insert using subconnections?
How to establish parallel connections?
What about transactions in parallel mode?
You can use subconnections to read or insert data in parallel from and into the EXASOL server. This will increase performance considerably.
Using this interface you can read or write data in parallel from different physical hosts, from different processes, or from different threads within one process.
What data can you read using subconnections? You need a statement that produces a large result set. The result set can then be retrieved in parts over the parallel connections. For small results this interface has no advantage.
What data can you insert using subconnections? As when reading data, using this interface only makes sense if you want to insert large amounts of data. You can simply insert rows into a table in EXASOL using a prepared insert statement. The data coming through the parallel connections will be put together by the server into the right table.
How to establish parallel connections? First you need a normal JDBC connection. On this connection you can use the method EnterParallel() to start operating with subconnections. You will receive connection strings for all the parallel connections you can now start. Start the subconnections with auto commit off. After this you can start reading or sending data, much as with a normal connection.
Attention: You can specify the maximum number of subconnections in EnterParallel(). This number may be reduced by the server because only one subconnection is allowed per database node. You have to establish the subconnections using all connection strings received from GetWorkerHosts(). Subconnections can only be used after all connections have been established.
What about transactions in parallel mode? Start the subconnections with auto commit off. Commits should be made only on the main connection, after the subconnections have inserted all data and closed their prepared statements.
A Java example is attached. In the example, a main connection reads the connection strings for the subconnections from the server. For each subconnection a thread is started that inserts a few rows. Commit is executed on the main connection. Then other threads read the data from the table.