When to use which programming language in Exasol? This article helps you find the right language for your task.
Problem

JSON data that is stored in EXASOL tables can be accessed through UDFs. JSON documents are accessed through path expressions. To understand what path expressions are, let us have a look at the following JSON document:

{
  "name": "Bob",
  "age": 37,
  "address": {"street": "Example Street 5", "city": "Berlin"},
  "phone": [{"type": "home", "number": "030555555"},
            {"type": "mobile", "number": "017777777"}],
  "email": ["bob@example.com", "bobberlin@example.com"]
}

The JSON document contains five fields: "name" (a string), "age" (an integer), "address" (a document), "phone" (an array containing documents) and "email" (an array containing strings).

Path expressions start with a dollar sign ($) representing the root document. The dot operator (.) selects a field of a document; the star in square brackets ([*]) selects and unnests all elements of an array. The following path expression finds all phone numbers:

$.phone[*].number

This expression is evaluated as follows:

path step           result
$                   the whole document shown above
$.phone             [{"type": "home", "number": "030555555"}, {"type": "mobile", "number": "017777777"}]
$.phone[*]          {"type": "home", "number": "030555555"}
                    {"type": "mobile", "number": "017777777"}
$.phone[*].number   "030555555"
                    "017777777"

Solution

This solution presents a generic Python UDF json_table to access field values in JSON documents through path expressions. The json_table function has the following form:

select json_table(
  <json string or column>,
  <path expression>,
  <path expression>,
  ...
) emits (<column_name> <data_type>, <column_name> <data_type>, ...)

The JSON_TABLE UDF attached to this solution takes a VARCHAR containing JSON data as its first parameter, followed by one or more path expressions:

create or replace python scalar script json_table(...) emits(...) as
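The body of the attached script is not reproduced in this article. Purely as an illustration of the mechanics described here, a minimal sketch of what could follow the AS keyword is shown below. It assumes Python's standard json module and at most one [*] per path expression (matching the limitation listed further down), and it omits all error handling; the helper eval_path is a name invented for this sketch, not part of the attached script.

import json
from itertools import product

def eval_path(doc, path):
    # tokenize the path: '$.phone[*].number' -> ['phone', '[*]', 'number']
    steps = [s for s in path.lstrip('$').replace('[*]', '.[*]').split('.') if s]
    matches = [doc]
    for step in steps:
        if step == '[*]':
            # unnest: flatten one array level
            matches = [element for array in matches for element in array]
        else:
            # select a field of a document
            matches = [d[step] for d in matches if step in d]
    # non-atomic values (documents, arrays) are returned as JSON strings
    return [json.dumps(m) if isinstance(m, (dict, list)) else m for m in matches]

def run(ctx):
    doc = json.loads(ctx[0])              # first parameter: the JSON data
    n = exa.meta.input_column_count
    columns = [eval_path(doc, ctx[i]) for i in range(1, n)]
    # one output row per combination: the cross product of all unnested arrays
    for row in product(*columns):
        ctx.emit(*row)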
The function can be called in a SELECT query. The EMITS clause has to be used to define the output column names and their data types:

SELECT json_table('{"name": "Bob", "age": 37, "address": {"street": "Example Street 5", "city": "Berlin"}, "phone": [{"type": "home", "number": "030555555"}, {"type": "mobile", "number": "017777777"}], "email": ["bob@example.com", "bobberlin@example.com"]}', '$.phone[*].number')
EMITS (phone VARCHAR(50));

When the JSON data is stored in a table, the first parameter of JSON_TABLE contains the column name:

CREATE TABLE example_table (column_a INT, json_data VARCHAR(2000000));
-- INSERT INTO example_table VALUES (1, '{"name": "Bob", ...}');  (the document shown above)

SELECT json_table(json_data, '$.phone[*].number')
EMITS (phone VARCHAR(50))
FROM example_table;

It is possible to use both the json_table UDF and normal columns of the table within the SELECT clause:

SELECT column_a, json_table(json_data, '$.phone[*].number')
EMITS (phone VARCHAR(50))
FROM example_table;

When a row in the input table contains n phone numbers within its JSON column, there will be n output rows for that row. The value of column_a is constant for all of them:

COLUMN_A  PHONE
1         030555555
1         017777777

The following table shows some more valid path expressions:

path             result
$.name           "Bob"
$.address        {"street": "Example Street 5", "city": "Berlin"}
$.address.city   "Berlin"
$.email          ["bob@example.com", "bobberlin@example.com"]
$.email[*]       "bob@example.com"
                 "bobberlin@example.com"

This query converts the JSON data into column values:

SELECT json_table(json_data, '$.name', '$.age', '$.address.city')
EMITS (name VARCHAR(500), age INT, city VARCHAR(500))
FROM example_table;

NAME  AGE  CITY
Bob   37   Berlin

When unnesting an array, the values from different levels stay the same for every array element:

SELECT json_table(json_data, '$.name', '$.age', '$.address.city', '$.email[*]')
EMITS (name VARCHAR(500), age INT, city VARCHAR(500), email VARCHAR(500))
FROM example_table;

NAME  AGE  CITY    EMAIL
Bob   37   Berlin  bob@example.com
Bob   37   Berlin  bobberlin@example.com

The result of unnesting more than one array is the cross product of those arrays:

SELECT json_table(json_data, '$.name', '$.age', '$.address.city', '$.email[*]', '$.phone[*].type', '$.phone[*].number')
EMITS (name VARCHAR(500), age INT, city VARCHAR(500), email VARCHAR(500), phone_type VARCHAR(50), phone_number VARCHAR(50))
FROM example_table;

NAME  AGE  CITY    EMAIL                  PHONE_TYPE  PHONE_NUMBER
Bob   37   Berlin  bob@example.com        home        030555555
Bob   37   Berlin  bob@example.com        mobile      017777777
Bob   37   Berlin  bobberlin@example.com  home        030555555
Bob   37   Berlin  bobberlin@example.com  mobile      017777777

Details and limitations:

- When the JSON input is not a single document but an array, the elements of the array can be accessed via $[*].
- It is recommended to use the correct data types in the EMITS clause. Otherwise, values are cast, which can lead to type-conversion errors.
- Non-atomic values, i.e. documents and arrays that are accessed without being unnested, are returned as a VARCHAR containing the result in JSON format.
- Accessing multi-dimensional arrays is not supported, i.e. at most one [*] can be used per path expression.

Additional References:
https://docs.exasol.com/advanced_analytics/accessing_json_data_udfs.htm
https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/json_value.htm
https://community.exasol.com/t5/data-science/parsing-json-data-with-python/ta-p/1247
https://community.exasol.com/t5/discussion-forum/sneakpreview-on-version-7-json-functions/m-p/1710
This article shows how you can create functions to work with the Bing Maps Tile System.
Background

Below is a UDF script in R that executes a scoring model inside EXASOL. While the example uses the flights dataset (nycflights13) and applies a randomForest model, it may serve as a template for other R models. It has already proven itself in a real-world case where a neural-network-based scoring process on a multi-billion-row dataset was transferred from an R server into EXASOL, reducing the time needed from approx. one hour to a few minutes.

How to deploy an R scoring model in EXASOL via UDF

Typical points for adaptation are:

Step 1: The library needed for scoring must be present (here: library(randomForest)).

Step 2: The column names in the data.frame used inside the UDF must match those of the data frame used for building the model. Typically they can be extracted from the model; for other kinds of models, the location needs to be adapted:

if (length(m) == 0) {
  ...
  # get the predictors' names
  colnames <<- names(m$forest$xlevels)
}

Here the model is loaded from an EXABucket (Redis in EXASOL 5) if it is not already present in the R environment. Depending on the model size, loading the model is the main factor influencing the time needed for the scoring process. The character vector 'colnames' is later used to construct the data.frame; it must contain the column names.

Step 3: Later, when the in-database scoring is started by the SELECT, the columns need to be referenced in the same order as in the model-building data frame (due to Step 2). The GROUP BY parameter ROWNUM / 100000 can be a lever for performance fine-tuning: the more rows per UDF instance, the faster the scoring; but if an instance gets too many rows, it runs out of memory.

R (client side) - download code for:
EXASOL 6: r_client_v6.R (as shown here)
EXASOL 5: r_client_v5.R

r_client_v6.R

# build a model for testing
# see www.github.com/exasol/r-exasol
library(exasol)
library(nycflights13)
con <- dbConnect("exa", dsn = "exa")
dbWriteTable(con, "R.FLIGHTS", flights)
df <- dbGetQuery(con, "select * from R.FLIGHTS limit 10000")
dbDisconnect(con)

library(randomForest)
set.seed(852)
df <- df[complete.cases(df), ]
model1 <- randomForest(as.factor(carrier) ~ dep_time + dep_delay,
                       data = head(df, n = 1000))

# like this, you could do the prediction on the client side:
# predict(model1, df[, c("dep_time", "dep_delay")])

# but instead, we want to predict in-database, thus we
# upload the model to an EXABucket via httr
library(httr)
PUT(
  # EXABucket URL
  url = "http://192.168.42.129:2101/bucket1/model1",
  body = serialize(model1, ascii = FALSE, connection = NULL),
  # EXABucket: authenticate with write user name / write user password
  config = authenticate("w", "writepw")
)
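Before moving on to the server side, it can be worth verifying that the model actually arrived in the bucket. One way to do this is to request a file listing over the BucketFS HTTP interface. The sketch below is an assumption-laden illustration, not part of the original article: it uses the third-party Python requests package, the same host and port as the PUT above, the standard BucketFS read user "r", and the read password 'readpw' that also appears in the connection created below.

import requests

# list the files in the bucket; 'model1' should show up in the response
resp = requests.get("http://192.168.42.129:2101/bucket1",
                    auth=("r", "readpw"))
resp.raise_for_status()
print(resp.text)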
EXASOL (server side) - download code for:
EXASOL 6: flights_v6.sql (as shown here)
EXASOL 5: flights_v5.sql

flights_v6.sql

-- deploying the model
open schema R;

create or replace r set script pred_randomForest(...) emits (predict varchar(200000)) as
library(randomForest)

m <- list()
colnames <- vector(mode = "character")

run <- function(ctx) {
  if (length(m) == 0) {
    # only load the model if not already present;
    # must be done in run() as ctx is needed
    f <- file(ctx[[1]](), open = "rb", raw = TRUE)
    m <<- unserialize(f)
    # get the predictors' names
    colnames <<- names(m$forest$xlevels)
  }
  # load the entire chunk
  ctx$next_row(NA)
  n <- exa$meta$input_column_count
  s <- paste0("model_key", "=ctx[[1]]()")
  for (i in 2:n) {
    # construct a data.frame from the chunk & colnames
    name <- colnames[i - 1]
    s <- paste0(s, ",", name, "=ctx[[", i, "]]()")
  }
  eval(parse(text = paste0("inp_df <- data.frame(", s, ")")))
  # predict & emit
  ctx$emit(predict(m, inp_df[, 2:n]))
}
/

create connection sys_bucketfs to 'bucketfs:bfsdefault/bucket1' identified by 'readpw';

select pred_randomForest(
  '/buckets/bfsdefault/bucket1/model1',   -- EXABucketFS path
  "dep_time", "dep_delay"                 -- predictors from FLIGHTS
)
from FLIGHTS
group by
  iproc(),                        -- the node number, for data locality
  cast(rownum / 100000 as int);   -- restrict the number of rows per group (the limit is R's maximum vector size)

Additional references:
https://community.exasol.com/t5/database-features/how-to-use-exasolution-r-sdk/ta-p/1224
https://community.exasol.com/t5/environment-management/how-to-create-an-exabucketfs-service-and-bucket/ta-p/1515
This solution describes a method for retrieving type information without needing a syntactically correct and complete CREATE statement.