This article describes how to convert Exasol's GEOMETRY types to GeoJSON.
View full article
This article gives suggestions on how to minimize transaction conflicts.
View full article
This article gives general information about Exasol's indices.
View full article
Estimating work duration when doing reorganizations within Exasol.
View full article
An SQL statement that adds locking information to your session system tables using the EXA_SQL_LAST_DAY statistics.
View full article
This article discusses the use of UNION ALL and its benefits.
View full article
This article gives an example of using the Preprocessor function to automatically handle unsupported functions in Exasol.
View full article
Question

With Exasol's automated disk and memory management, it is not always easy to answer questions like: How much data is stored in the database? How much disk space do I use? How much memory is in use? How much disk space is left? And how much more data can I actually store?

Answer

This article addresses a few of those questions by giving an overview of the relations between the different values one can read from Exasol's system tables or the web management interface (EXAoperation).

Overview

[Diagram: general relationships between the terms used when talking about data sizes in Exasol, such as RAW/MEM object sizes, DB RAM, and storage volumes]

Notes:
- The ratio between RAW_OBJECT_SIZE and MEM_OBJECT_SIZE is what Exasol commonly defines as the compression ratio.
- The distinction between RAW and MEM sizes also applies to system data: everything is compressed in RAM and mapped to/from disk.
- Usually there is no need to fit all of the data into DB RAM. Systems with a large amount of passive data may perform well with a DB_RAM of only 10% of the overall MEM_OBJECT_SIZE.
- A typical "first guess" estimate for the required DB_RAM is about 10% of the expected RAW_OBJECT_SIZE.
- While deleted blocks in storage are reused by later operations, as a general rule, storage volumes do not shrink unless forced to do so.
- In DB_RAM, the ratio between active data and temp data is flexible, but temp is hard-limited at 80%.

Information Location

Object level:
- RAW_OBJECT_SIZE: system table EXA_DBA_OBJECT_SIZES (theoretical value, never actually required)
- MEM_OBJECT_SIZE: system table EXA_DBA_OBJECT_SIZES
- Index size: AUXILIARY_SIZE in system table EXA_DBA_INDICES
- Statistics + audit size: STATISTICS_SIZE in system table EXA_STATISTICS_OBJECT_SIZES
- HDD_READ / HDD_WRITE: N/A

Database level:
- RAW_OBJECT_SIZE: system table EXA_DB_SIZE_LAST_DAY
- MEM_OBJECT_SIZE: system table EXA_DB_SIZE_LAST_DAY
- Index size: system table EXA_DB_SIZE_LAST_DAY
- Statistics + audit size: system table EXA_DB_SIZE_LAST_DAY
- HDD_READ / HDD_WRITE: system table EXA_MONITOR_LAST_DAY
- DB_RAM size: system table EXA_SYSTEM_EVENTS (you can assume that the database will always "use" all of that RAM and does not yield to others)
- RECOMMENDED_DB_RAM_SIZE: system table EXA_DB_SIZE_LAST_DAY
- TEMP_DB_RAM_SIZE: system table EXA_MONITOR_LAST_DAY (== DB_RAM:temp + Temporary:used)
- DB_RAM:active data: N/A
- DB_RAM:temp: see node level below
- Persistent:size: STORAGE_SIZE in EXA_DB_SIZE_LAST_DAY
- Persistent:committed: see node level below (~= MEM_OBJECT_SIZE + AUXILIARY_SIZE + STATISTICS_SIZE)
- Persistent:deleted: see node level below (size = committed * (100 / USE))

Node level

Exasol 6.0.0 introduces a new system table for more detailed storage usage: EXA_VOLUME_USAGE. Please refer to the user manual or the column comments for details on the columns.
- Persistent:size: sum(VOLUME_SIZE) (size is always synchronized across nodes)
- Persistent:committed data: sum(COMMIT_DATA)
- Temporary:used: sum(SWAP_DATA)
- Temporary:unused: sum(UNUSED_DATA)

Storage level:
- Persistent volume size: EXAStorage -> (Labels: <dbname>_persistent) -> Size (== Persistent:size from the database level)
- Temporary volume size: EXAStorage -> (Labels: <dbname>_temporary) -> Size
- Exists slave segment?: EXAStorage -> persistent -> Redundancy (gives the total number of copies; 1 == no redundancy)
- Free disk space: EXAStorage -> Space on Disks -> sum(sum:free)
- Free database (disk) space: see how-to-monitor-free-database-disk-space
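To get a quick database-level overview, the statistics table EXA_DB_SIZE_LAST_DAY listed above can be queried directly. The following query is only a minimal sketch: it assumes the column names mentioned in this article (RAW_OBJECT_SIZE, MEM_OBJECT_SIZE, AUXILIARY_SIZE, STATISTICS_SIZE, RECOMMENDED_DB_RAM_SIZE, STORAGE_SIZE) plus an INTERVAL_START timestamp column; verify the exact column names and units against the documentation of your Exasol version.

-- Latest measured database sizes and the resulting compression ratio (sketch)
SELECT interval_start,
       raw_object_size,
       mem_object_size,
       raw_object_size / NULLIF(mem_object_size, 0) AS compression_ratio,
       auxiliary_size AS index_size,
       statistics_size,
       recommended_db_ram_size,
       storage_size
  FROM exa_statistics.exa_db_size_last_day
 ORDER BY interval_start DESC
 LIMIT 1;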
View full article
Problem

JSON data that is stored in Exasol tables can be accessed through UDFs. JSON documents are accessed through path expressions. To understand what path expressions are, let us have a look at the following JSON document:

{"name": "Bob", "age": 37, "address": {"street": "Example Street 5", "city": "Berlin"}, "phone": [{"type": "home", "number": "030555555"}, {"type": "mobile", "number": "017777777"}], "email": ["bob@example.com", "bobberlin@example.com"]}

The JSON document contains five fields: "name" (a string), "age" (an integer), "address" (a document), "phone" (an array containing documents) and "email" (an array containing strings).

Path expressions start with a dollar sign ($) representing the root document. The dot operator (.) selects a field of a document; a star in square brackets ([*]) selects and unnests all elements of an array. The following path expression finds all phone numbers: $.phone[*].number

This expression is evaluated step by step as follows (path step: result):
- $: the whole document shown above
- $.phone: [{"type": "home", "number": "030555555"}, {"type": "mobile", "number": "017777777"}]
- $.phone[*]: {"type": "home", "number": "030555555"} and {"type": "mobile", "number": "017777777"}
- $.phone[*].number: "030555555" and "017777777"

Solution

This solution presents a generic Python UDF json_table to access field values in JSON documents through path expressions. The json_table function has the following form:

select json_table(<json string or column>, <path expression>, <path expression>, ...) emits (<column_name> <data_type>, <column_name> <data_type>, ...)

The JSON_TABLE UDF attached to this solution takes a VARCHAR containing JSON data as its first parameter and one or more path expressions:

create or replace python scalar script json_table(...) emits(...) as

The function can be called in a SELECT query. The EMITS clause has to be used to define the output column names and their data types.

SELECT json_table('{"name": "Bob", "age": 37, "address": {"street": "Example Street 5", "city": "Berlin"}, "phone": [{"type": "home", "number": "030555555"}, {"type": "mobile", "number": "017777777"}], "email": ["bob@example.com", "bobberlin@example.com"]}', '$.phone[*].number') EMITS (phone VARCHAR(50));

When the JSON data is stored in a table, the first parameter of JSON_TABLE contains the column name:

CREATE TABLE example_table (column_a INT, json_data VARCHAR(2000000));
-- INSERT INTO example_table VALUES (1, '{"name": "Bob", …'); (same document as above)
SELECT json_table(json_data, '$.phone[*].number') EMITS (phone VARCHAR(50)) FROM example_table;

It is possible to use both the json_table UDF and normal columns of the table within the SELECT clause:

SELECT column_a, json_table(json_data, '$.phone[*].number') EMITS (phone VARCHAR(50)) FROM example_table;

When a row in the input table contains n phone numbers within the JSON column, there will be n output rows for that tuple.
The value of column_a is constant for all those rows:

COLUMN_A  PHONE
1         030555555
1         017777777

The following list shows some more valid path expressions (path: result):
- $.name: "Bob"
- $.address: {"street": "Example Street 5", "city": "Berlin"}
- $.address.city: "Berlin"
- $.email: ["bob@example.com", "bobberlin@example.com"]
- $.email[*]: "bob@example.com" and "bobberlin@example.com"

This query converts the JSON data into column values:

SELECT json_table(json_data, '$.name', '$.age', '$.address.city') EMITS (name VARCHAR(500), age INT, city VARCHAR(500)) FROM example_table;

NAME  AGE  CITY
Bob   37   Berlin

When unnesting an array, the values from the other levels stay the same for every array element:

SELECT json_table(json_data, '$.name', '$.age', '$.address.city', '$.email[*]') EMITS (name VARCHAR(500), age INT, city VARCHAR(500), email VARCHAR(500)) FROM example_table;

NAME  AGE  CITY    EMAIL
Bob   37   Berlin  bob@example.com
Bob   37   Berlin  bobberlin@example.com

The result of unnesting more than one array is the cross product of those arrays:

SELECT json_table(json_data, '$.name', '$.age', '$.address.city', '$.email[*]', '$.phone[*].type', '$.phone[*].number') EMITS (name VARCHAR(500), age INT, city VARCHAR(500), email VARCHAR(500), phone_type VARCHAR(50), phone_number VARCHAR(50)) FROM example_table;

NAME  AGE  CITY    EMAIL                  PHONE_TYPE  PHONE_NUMBER
Bob   37   Berlin  bob@example.com        home        030555555
Bob   37   Berlin  bob@example.com        mobile      017777777
Bob   37   Berlin  bobberlin@example.com  home        030555555
Bob   37   Berlin  bobberlin@example.com  mobile      017777777

Details and limitations:
- When the JSON input is not a single document but an array, the elements of the array can be accessed via $[*] (see the example after the references below).
- It is recommended to use the correct data types in the EMITS clause. Otherwise, casting is done, which can lead to type-conversion errors.
- When accessing non-atomic values, i.e. arrays without unnesting them, or documents, they are returned as a VARCHAR containing the result in JSON format.
- Accessing multi-dimensional arrays is not supported. This means that at most one [*] can be used in a path expression.

Additional References
- https://docs.exasol.com/advanced_analytics/accessing_json_data_udfs.htm
- https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/json_value.htm
- https://community.exasol.com/t5/data-science/parsing-json-data-with-python/ta-p/1247
- https://community.exasol.com/t5/discussion-forum/sneakpreview-on-version-7-json-functions/m-p/1710
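As a small illustration of the first limitation above, the following query is a sketch (assuming the same json_table UDF) that unnests a JSON array given directly as the root document; the array literal is just an example value:

SELECT json_table('["bob@example.com", "bobberlin@example.com"]', '$[*]') EMITS (email VARCHAR(100));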
View full article
Background

Geospatial data can be stored and analyzed in the Exasol database using the GEOMETRY data type. In this solution, we show some examples of how to import geospatial data from a CSV file and from a GeoJSON file and how to use SQL functions to perform analytics and geo-joins.

Spatial Reference Systems

In the system table SYS.EXA_SPATIAL_REF_SYS, you will find more than 3000 different spatial reference systems which can be used for the GEOMETRY data type. These reference systems all define points on earth, but they have different strengths, accuracies, and properties; for example, SRID 31466 can only reference locations within Germany. We will use the two SRIDs 4326 and 2163 in our example: 4326 uses degrees as its unit, 2163 uses meters. Often, conversions are necessary between different coordinate systems. The ST_TRANSFORM function can be used for this.

Prerequisites

Working with Spatial Data in Exasol

The following example shows how to create a table with a geometry column and how to insert and query data. We are using GEOMETRY(4326) because the provided coordinates are in degrees. Later, in the SELECT query, we transform the geodata into SRID 2163 to see the distance between the two cities in meters:

CREATE TABLE cities(name VARCHAR(200), geo GEOMETRY(4326));
INSERT INTO cities VALUES ('Berlin', 'POINT (13.36963 52.52493)');
INSERT INTO cities VALUES ('London', 'POINT (-0.1233 51.5309)');
-- this shows the distance in degrees:
SELECT a.name, b.name, st_distance(a.geo, b.geo) FROM cities a, cities b;
-- this shows the distance in meters:
SELECT a.name, b.name, st_distance(ST_Transform(a.geo, 2163), ST_Transform(b.geo, 2163)) FROM cities a, cities b;

GEOMETRY columns can be filled with strings using the well-known text representation (WKT), e.g. 'POINT (13.36963 52.52493)'.

How to process geodata

Step 1: Importing geodata

Importing geodata from CSV files

Often, geodata is present in CSV files or in columns of tables that are imported from other database systems in the form of latitude and longitude values. On https://openflights.org/data.html#airport, you can download a CSV file containing international airport data. We are using the extended version of this file, which contains more than 12,000 international airports and train stations. The first two fields of the file hold the id and the name of the airport; in columns 7 and 8, we find its latitude and longitude coordinates:

CREATE OR REPLACE TABLE airports(
  airport_id INT,
  name VARCHAR(500),
  latitude DECIMAL(9,6),
  longitude DECIMAL(9,6)
);
IMPORT INTO airports FROM LOCAL CSV FILE 'D:\airports-extended.dat' (1, 2, 7, 8);
ALTER TABLE airports ADD COLUMN geo GEOMETRY(4326);
UPDATE airports SET geo = 'POINT (' || longitude || ' ' || latitude || ')';
SELECT * FROM airports;

We first used DECIMAL(9,6) columns to store the latitude and longitude values and then added a GEOMETRY column to store a geodata point for each airport. Similar to the example above, we can now calculate the distance between two airports or train stations in our table using the ST_DISTANCE and ST_TRANSFORM functions:

select st_distance(ST_Transform(a.geo, 2163), ST_Transform(b.geo, 2163))
from airports a, airports b
where a.name = 'Berlin-Tegel Airport' and b.name = 'Berlin Hauptbahnhof';

Importing data from GeoJSON

GeoJSON is an often-used format for storing and exchanging geodata. On https://geojson-maps.ash.ms, you can download countries as geodata.
When you generate a custom JSON file on this site, the file consists of one JSON object, a so-called FeatureCollection, which contains multiple polygons and multi-polygons, one for each country. We import the data in three steps:
1. Loading the full GeoJSON string into a VARCHAR(2000000) column in a table (mind that this does not work if the GeoJSON is larger than two million characters).
2. Parsing the GeoJSON using the json_table UDF script (see querying-and-converting-json-data-with-the-json-table-udf); this UDF script emits a table with one row for each geo-object.
3. Converting each GeoJSON object into a GEOMETRY value using the ST_GeomFromGeoJSON UDF script (attached to this solution).

After these steps, we have all countries from the GeoJSON file in a GEOMETRY column. An alternative approach is developing a UDF script that loads the GeoJSON from a server (e.g. by using the Python package requests or paramiko) and that parses and iterates over the GeoJSON objects within the script. For each object, a WKT string is emitted, which can later be converted into a GEOMETRY value. That way, there is no limit on the number of characters. For our approach, the whole GeoJSON must not be larger than 2,000,000 characters:

create or replace table geo_import(v varchar(2000000));
-- dummy separator/delimiter to import each whole line as one column value
import into geo_import from local csv file 'D:\custom.geo.json'
  column separator = '0x01' column delimiter = '0x02';

-- json_table (see https://community.exasol.com/t5/database-features/querying-and-converting-json-data-with-the-json-table-udf/ta-p/1800)
-- emits a row for each country with the two columns name and geojson
create or replace view geojson as
  select json_table(v, '$.features[*].properties.name', '$.features[*].geometry')
  emits (name varchar(2000000), geojson varchar(2000000))
  from geo_import;

-- ST_GeomFromGeoJSON is attached to https://community.exasol.com/t5/database-features/how-to-import-the-whole-world-into-exasol-geospatial-data-geo/ta-p/1669
create or replace table countries as
  select name, cast(ST_GeomFromGeoJSON(geojson) as geometry(4326)) as geo
  from geojson;

select * from countries;

Step 2: Geo-Joins and Geo-Indexes

The following query performs a geo-join between the two tables countries and airports. It finds all airports that are within the country Italy:

select a.* from countries c join airports a on st_contains(c.geo, a.geo) where c.name = 'Italy';

You can use profiling to see that a geo-index is created automatically and that this geo-index is used to execute the geo-join:

alter session set profile = 'ON';
alter session set query_cache = 'OFF';
select a.* from countries c join airports a on st_contains(c.geo, a.geo) where c.name = 'Italy';
flush statistics;
select * from exa_statistics.exa_user_profile_last_day
  where session_id = current_session and command_name = 'SELECT'
  preferring high stmt_id;

Like other indexes in Exasol, the geo-index is persisted so that it can be used for future queries, it is maintained automatically when the table data changes, and it is automatically dropped when it is not used for five weeks or longer. Mind that geo-indexes are a new feature as of Exasol 6.1.

Step 3: Visualizing Geometry Data

The SQL client DBeaver is able to show geospatial data on a map. On a result tab, right-click on a GEOMETRY column and choose View/Format > View as Geometry. Then click on the Panels button on the right to show the Value panel with the map.
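As a final example combining the pieces above, the following query is a sketch based on the countries and airports tables created in the previous steps; it uses the same geo-join to count the airports per country:

select c.name, count(*) as airport_count
from countries c
join airports a on st_contains(c.geo, a.geo)
group by c.name
order by airport_count desc;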
View full article
This article explains how to analyze query performance using profiling.
View full article
This article is about using the IMPORT command over JDBC to connect to your Exasol database and wrapping the Lua scripts you want to execute in parallel in the STATEMENT clause.
View full article
Background

By default, an Exasol virtual machine (VM) is configured to use a host-only network. This configuration allows you to access the database and the EXAoperation management interface locally from the host machine, but it prevents the VM from reaching publicly available hosts and services on the internet. This how-to provides information about configuring Exasol to access the internet and enables users to:
- use DNS to access publicly reachable servers for making backups or in database IMPORT/EXPORT statements
- use LDAP servers of your choice for database or EXAoperation accounts
- use standard repositories for the installation of UDF packages
- use the Exasol version check (only if this feature has not been disabled)

Prerequisites

If not already done, import the Exasol OVA file into the VM tool of your choice (in VirtualBox: File -> Import Appliance). Accept the "End User License Agreement" and the "Privacy Statement" and wait until the image has been successfully loaded.

How to enable internet access for Exasol Community Edition

Configuration of the VM network

Step 1: A new VM has now been created. Change its network settings to use NAT (in VirtualBox: right-click on the VM -> Settings -> Network -> Adapter 1 -> Attached to NAT).

Step 2: Define port forwardings to guest ports 8563 (database) and 443 (EXAoperation management interface) (in VirtualBox: Adapter 1 -> Port Forwarding -> Add rule -> Host Port 8563 and Guest Port 8563 -> Add rule -> Host Port 4443 and Guest Port 443). The port forwarding rules enable you to use the VM from the physical host machine.

Step 3: Start the VM and wait until Exasol is up and running.

Configuration of the DNS server

Step 1: Browse to EXAoperation and log in.

Step 2: Go to Network. In the System tab, click on "Edit".

Step 3: Add one or two DNS servers reachable from the VM (e.g. "8.8.8.8" for an environment that can reach internet DNS servers) and click "Apply".

Step 4: Log out.

Additional Notes

Troubleshooting: Firewalls and/or company proxy servers may block or restrict traffic to internet hosts.
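Once the DNS servers are configured, you can verify name resolution and outbound connectivity directly from the database, for example with an IMPORT from a publicly reachable HTTP(S) server. The statements below are only a sketch: the URL, file name, and table are placeholders, and the goal is merely to confirm that the host can be resolved and contacted (a parse error on the downloaded content would still prove that DNS and connectivity work).

-- hypothetical connectivity test: load an arbitrary remote file as raw lines
CREATE OR REPLACE TABLE connectivity_check (line VARCHAR(2000000));
IMPORT INTO connectivity_check
  FROM CSV AT 'https://example.com' FILE 'somefile.csv'
  COLUMN SEPARATOR = '0x01';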
View full article
This article covers analytic (window) functions, with links to additional analytics resources.
View full article
This article describes what you can do if you cannot contact the LDAP server during user creation.
View full article
How to analyze disk I/O and concurrency using data from the *_LAST_DAY statistics tables.
View full article
This article includes a script to help you generate the DDL for a single table.
View full article
This article shows what happens behind the scenes when JOINing on a DECIMAL datatype.
View full article
Background

Cause and effect: Since Exasol's transaction isolation level is SERIALIZABLE and newly created transactions are automatically scheduled after finished transactions, it is possible that WAIT FOR COMMITs occur even for pure read transactions (consisting of SELECT statements only).

Explanation

How to reproduce: Three different connections (with AUTOCOMMIT off) are needed to reproduce this situation.

Example 1: If a long-running transaction (Tr1) reads object A and writes object B (e.g. a long-running IMPORT statement) and a second transaction (Tr2) writes object A and commits in parallel, Tr2 is scheduled after Tr1. After Tr2 is committed, all new transactions are scheduled after it. If such a transaction wants to read object B, it has to wait for the commit of Tr1.

- Transaction 1: select * from tab1;
- Transaction 1: insert into tab2 values 1;  (transaction remains open)
- Transaction 2: insert into WFC.tab1 values 1;  (Transaction 1 < Transaction 2)
- Transaction 2: commit;
- Transaction 3: commit;  (starts a new transaction; Transaction 2 < Transaction 3)
- Transaction 3: select * from tab2;  (this statement ends up in WAIT FOR COMMIT, waiting for Transaction 1)

Example 2: The same situation may occur if you query system tables while the SqlLogServer is performing one of its tasks (e.g. the "DB size task", which determines the database size). The following sequence describes this situation:

- Transaction 1: select * from EXA_DB_SIZE_LAST_DAY;
- Transaction 1: insert into tab1 values 1;  (transaction remains open)
- LogServer: DB size task (writes EXA_DB_SIZE_LAST_DAY)  (Transaction 1 < LogServer transaction; the task is executed every 30 minutes: 0:00, 0:30, 1:00, 1:30, ...)
- Transaction 3: commit;  (starts a new transaction; LogServer transaction < Transaction 3)
- Transaction 3: select * from EXA_DB_SIZE_LAST_DAY;  (this statement ends up in WAIT FOR COMMIT)

Solution

Currently, the only solution is to break up Transaction 1 into multiple transactions by performing a COMMIT or ROLLBACK after the initial read access. However, things get more complicated when the read and write operations are combined within a single statement (i.e. MERGE or INSERT from SELECT). In the latter case, it has proven helpful to "outsource" the reading part by using IMPORT as a subselect to fetch the required data through a separate transaction (see the sketch after the references below).

Additional References
- https://community.exasol.com/t5/database-features/transaction-system/ta-p/1522
- https://community.exasol.com/t5/database-features/filter-on-system-tables-and-transaction-conflicts/ta-p/1232
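The following is a minimal sketch of that IMPORT-as-subselect pattern. It assumes a pre-defined connection (here called my_loopback) that points back to the same Exasol database; the target table, schema, columns, and inner statement are placeholders:

INSERT INTO target_table
SELECT *
FROM (
  -- the read of tab1 happens in the separate transaction opened by the IMPORT,
  -- so the outer (writing) transaction no longer reads tab1 directly
  IMPORT INTO (id DECIMAL(18,0), payload VARCHAR(100))
  FROM EXA AT my_loopback
  STATEMENT 'SELECT id, payload FROM my_schema.tab1'
);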
View full article
This article addresses handling special characters with ODBC.
View full article