exa-IDEAS
Community Manager
Community Manager

Executive summary

User-Defined-Functions (UDF) are the core vehicle for the 'analytics' in EXASOLs positioning as 'analytics database'.
A couple of improvements to improve UDFs for large-scale professional use around the following topics is suggested:

  • UDF Resourcemanagement
  • UDF Grants
  • UDF Lineage and Impact
  • UDF Documentation

Problems and Suggestions

UDF Resourcemanagement

*Problem*: For EXASOL UDFs are blackboxes. This implies that EXASOL does not know how much RAM or how many parallel CPU cores a specific UDF needs and utilizes when EXASOL makes a decision how many virtual machines to start with how much resources.
*Suggestion*: allow optional syntax in the UDF header where the UDF author defines RAM size and number of CPU cores.

UDF Grants

*Problem: When granting a UDF to a Role, EXASOL does not provide the necessary grants to that role. Instead EXASOL asks the DBA to provide *all necessary grants to that role. There are two problems for productive use

  • It is unmanageable. How should the DBA know and keep track of the required grants?
  • It is a security risk. Think of UDFs wrapping encryption and decryption: it only works if the secret keys are made available to the user!

I understand that EXASOL does not automatically know which grants are needed in a black-box UDF.

*Suggestion: allow optional syntax in the UDF header where the UDF author defines all required grants. EXASOL could then give these grants to the session *only during executing the UDF and only if the owner of the UDF has those grants, the latter could be checked already at UDF-compile-time. This model is safe, because the executing session has no access to those grants other than executing the UDF and cannot manipulate the UDF itself.

UDF Lineage and Impact

*Problem*: UDFs provide an obstacle to impact and lineage analyses. Which UDFs break if we change a table upstream that the UDF requires?

*Suggestion*: allow optional syntax in the UDF to name all dependencies of the UDF and put those into the dictionary. This would automatically resolve 99% of the grant problem as well, because dependency on a Table or View usually implies the need for a SELECT grant, and dependency on a UDF implies EXECUTE grant.

UDF Documentation

It is desirable to maintain documentation of the UDF together with the UDF code following the ideas of Doxygen or asciidoc, however, EXASOL only allows one comment on the UDF as a whole, it does not allow commenting on INPUT parameters, RETURN and EMIT values. I developed a simple UDF that could parse INPUT, RETURN and EMIT comments from the UDF header, however, EXASOL unfortunately removes any comments from the UDF header. My current workaround is to duplicate the head-definition into the body of the UDF, where the comments survive. Example follows.

*Suggestion*: keep header formatting and particularly comments, or go further, and provide explicit syntax for commenting INPUT parameters, RETURN and EMIT values.

Code example for UDF Documentation

Detail Parser

unfortunately with redundant header comments

 CREATE OR REPLACE LUA SCALAR SCRIPT 
 details
 (
 X VARCHAR(40000) – comma-separated parameter-list with SQL-name, SQL-type and SQL-comment
 ) 
 EMITS (
 detail_class VARCHAR(9) – class of parameter

{PARAMETER|RETURNS|EMITS}

, detail_position INTEGERposition of detail
 , detail_name VARCHAR(32) – normalized unquoted SQL-name 
 , detail_type VARCHAR(400) – SQL-type (without length and precision) 
 , detail_comment VARCHAR(4000) – identified comment excluding comment-char and excluding leading/trailing whitespace
 )
 AS 
 /* begin docu –
 (
 X VARCHAR(40000) – comma-seperated parameter-list with SQL-name, SQL-type and SQL-comment
 ) 
 EMITS (
 detail_class VARCHAR(9) – class of parameter \{PARAMETER|RETURNS|EMITS}

, detail_position INTEGERposition of detail
 , detail_name VARCHAR(32) – normalized unquoted SQL-name 
 , detail_type VARCHAR(400) – SQL-type (without length and precision) 
 , detail_comment VARCHAR(4000) – identified comment excluding comment-char and excluding leading/trailing whitespace
 )
 – end docu */
 function run(ctx)
 start_pattern = '/[*]+ begin docu [-]+' – was (
 stop_pattern = '[-]+ end docu [*]+/' – was IS or AS
 X = ctx.X /** access input variable */
 T = sqlparsing.tokenize(X)
 istart = #T+1
 for i=1,#T do
 if string.match(T[i], start_pattern) ~= nil then
 X = T[i]
 X = string.gsub(X, '.*'..start_pattern, '')
 X = string.gsub(X, stop_pattern..'.*', '')
 T = sqlparsing.tokenize(X)
 istart = 1

class = 'ARGUMENT'
 pos = 1
 name = ''
 type = ''
 comm = ''
 inbrackets = false
 for i=1,#T do
 t = T[i]
 if t == ',' then
 if inbrackets then
 type = type .. t
 else
 ctx.emit(class, decimal(pos), name, type, comm)
 pos = pos + 1 
 name = ''
 type = ''
 comm = ''
 end
 elseif string.upper(t) == 'EMITS' then
 if (name ~= '') then
 ctx.emit(class, decimal(pos), name, type, comm)
 end
 class = 'EMITS'
 pos = 1
 name = ''
 type = ''
 comm = ''
 elseif string.upper(t) == 'RETURNS' or string.upper(t) == 'RETURN' then
 if name ~= '' then
 ctx.emit(class, decimal(pos), name, type, comm)
 end
 class = 'RETURNS'
 pos = 1
 name = ''
 type = ''
 comm = ''
 else
 if sqlparsing.iscomment(t) then
 --[[ comm = string.gsub(string.gsub(t, '/[*]+%s*', ''), '%s*[*]+/', '') ]]
 comm = comm .. string.gsub(t, '-+%s*', '')
 elseif sqlparsing.isidentifier(t) then
 name = string.gsub(sqlparsing.normalize(t), '"', '')
 elseif t == '(' then 
 if type ~= '' then
 inbrackets = true
 type = type .. t
 end
 elseif t == ')' then 
 if inbrackets then
 inbrackets = false
 type = type .. t
 end
 elseif sqlparsing.iskeyword(t) then
 type = type .. ' ' .. sqlparsing.normalize(t)
 elseif sqlparsing.isnumericliteral(t) then
 type = type .. t
 end
 end
 end
 if istart <= #T then
 ctx.emit(class, decimal(pos), name, type, comm)
 end
 break;
 end
 end
 end
 /
 COMMENT ON SCRIPT details IS 'Parses INPUT parameter and RETURN and EMIT values of a SCRIPT oder FUNCTION definition';

Views

here a view to show all details of all objects

 CREATE OR REPLACE VIEW docu_details
 (
 OWNER COMMENT IS 'Owner of Object'
 , SCHEMA_NAME COMMENT IS 'Schema of Object'
 , OBJECT_TYPE COMMENT IS 'Type of Object

{FUNCTION|SCRIPTING|TABLE|VIEW|UDF}

'
 , OBJECT_LANGUAGE COMMENT IS 'Language of Object

{JAVA|LUA|PYTHON|R|SQL}

'
 , OBJECT_INPUT_TYPE COMMENT IS 'Inputtype of Object

{ |SCALAR|SET}

'
 , OBJECT_RESULT_TYPE COMMENT IS 'Resulttyp of Object

{EMITS|RETURNS|ROWCOUNT|TABLE}

'
 , OBJECT_NAME COMMENT IS 'Name of Object'
 , detail_CLASS COMMENT IS 'Class of Detail

{ARGUMENT|COLUMN|RETURNS|EMITS}

'
 , detail_POSITION COMMENT IS 'Position of Detail'
 , detail_NAME COMMENT IS 'Name of Detail'
 , detail_TYPE COMMENT IS 'Datatyp of Detail'
 , detail_COMMENT COMMENT IS 'Comment on Detail'
 )
 AS
 select 
 "OWNER"
 , SCHEMA_NAME
 , OBJECT_TYPE
 , OBJECT_LANGUAGE
 , OBJECT_INPUT_TYPE
 , OBJECT_RESULT_TYPE
 , OBJECT_NAME
 , details(OBJECT_TEXT)
 FROM (
 SELECT 
 SCRIPT_OWNER AS "OWNER"
 , SCRIPT_SCHEMA AS SCHEMA_NAME
 , SCRIPT_TYPE AS OBJECT_TYPE
 , SCRIPT_LANGUAGE AS OBJECT_LANGUAGE
 , SCRIPT_INPUT_TYPE AS OBJECT_INPUT_TYPE
 , SCRIPT_RESULT_TYPE AS OBJECT_RESULT_TYPE 
 , SCRIPT_NAME AS OBJECT_NAME
 , SCRIPT_TEXT AS OBJECT_TEXT
 FROM SYS.EXA_DBA_SCRIPTS
 UNION ALL
 SELECT 
 FUNCTION_OWNER AS OBJECT_OWNER
 , FUNCTION_SCHEMA AS OBJECT_SCHEMA
 , 'FUNCTION' AS OBJECT_TYPE
 , 'SQL' AS OBJECT_LANGUAGE
 , 'SCALAR' AS OBJECT_INPUT_TYPE
 , 'RETURNS' AS OBJECT_RESULT_TYPE
 , FUNCTION_NAME AS OBJECT_NAME
 , FUNCTION_TEXT AS OBJECT_TEXT
 FROM SYS.EXA_DBA_FUNCTIONS
 )
 UNION ALL
 SELECT
 COLUMN_OWNER as "OWNER"
 , COLUMN_SCHEMA as SCHEMA_NAME
 , COLUMN_OBJECT_TYPE as OBJECT_TYPE
 , 'SQL' AS OBJECT_LANGUAGE
 , '' AS OBJECT_INPUT_TYPE
 , '' AS OBJECT_RESULT_TYPE
 , COLUMN_TABLE as object_NAME
 , 'COLUMN' as detail_CLASS
 , COLUMN_ORDINAL_POSITION as detail_POSITION
 , COLUMN_NAME as detail_NAME
 , COLUMN_TYPE as detail_TYPE
 , COLUMN_COMMENT as detail_COMMENT
 FROM
 SYS.EXA_DBA_COLUMNS
 ORDER BY 1,2,3,4,5,6,7
 COMMENT IS 'List of all Object-Details including Comment (see also DOCU_OBJECTS)'
 ;

and here a view to show all objects

 CREATE OR REPLACE VIEW docu_objects (
 OWNER COMMENT IS 'Name of owner'
 , schema_name COMMENT IS 'Name of Schema (or ROLES oder CONNECTIONS)'
 , OBJECT_TYPE COMMENT IS 'Typ of Object'
 , OBJECT_NAME COMMENT IS 'Name of Object'
 , OBJECT_comment COMMENT IS 'Comment on Object'
 )
 AS 
 SELECT OWNER
 , schema_name
 , OBJECT_TYPE
 , OBJECT_NAME
 , OBJECT_comment
 FROM (
 SELECT OWNER
 , root_name AS schema_name
 , OBJECT_NAME, OBJECT_TYPE
 , OBJECT_comment
 FROM EXA_DBA_OBJECTS
 WHERE ROOT_TYPE = 'SCHEMA'
 UNION ALL
 SELECT 'DBA' AS OWNER
 , 'ROLES' as schema_name
 , role_name as OBJECT_NAME
 , 'ROLE' OBJECT_TYPE
 , role_comment as OBJECT_comment
 FROM EXA_DBA_ROLES
 UNION ALL
 SELECT 'DBA' AS OWNER
 , index_schema as schema_name
 , index_table as object_name
 , 'INDEX' as OBJECT_TYPE
 , remarks as OBJECT_comment
 FROM EXA_DBA_INDICES
 UNION ALL
 SELECT 'DBA' AS OWNER
 , 'CONNECTIONS' as schema_name
 , connection_name as object_name
 , 'CONNECTION' as OBJECT_TYPE
 , connection_comment as OBJECT_comment
 FROM EXA_DBA_CONNECTIONS
 )
 COMMENT IS 'List of all Objects including Comment (see also DOCU_DETAILS)'
 ;

Example usage

 SELECT * FROM docu_details where object_name in ('DETAILS','DOCU_DETAILS','DOCU_OBJECTS');
 SELECT * FROM docu_objects where object_name in ('DETAILS','DOCU_DETAILS','DOCU_OBJECTS');

 

Migrated from IDEA-509 by @joehl 

https://www.exasol.com/support/browse/IDEA-509

 

4 Comments
Status changed to: Migrated
exa-IDEAS
Community Manager
Community Manager
 
exa-IDEAS
Community Manager
Community Manager

Comment by @exa-StefanR 

My two cents:

Lineage/Impact

Proper lineage for UDFs would require a lineage analyzer for every programming Language we support, including all the custom packages that could be uploaded by our users. Unlikely to happen.

As UDFs (opposed to procedure scripts) are not able to access any local database objects in the first place, impact analysis is easy: none.

Grants

A possibly simpler but more powerful option would be to allow scripts to be executed in the owner's security context, much like views.
Of course, to not break existing implementations this would have to be a new option for either the GRANT or the CREATE SCRIPT statement.

Documentation

I would already be happy if DESCRIBE would present me with a list of input/output columns and their names

exa-IDEAS
Community Manager
Community Manager

Comment by: @joehl 

Thanks Stefan for commenting, let me clarify:

Lineage/Impact

When dependencies are coded by the programmer in the UDF-header lineage would not require a lineage analyzer (for no programming Language).

On top of the formal input parameters, SQL functions and UDFs can have further dependencies on database objects (I am not talking about further internal dependencies such as containerized libraries). On top of the formal input parameters, SQL functions officially can read variables from configuration tables and hence create dependencies on those (e.g. secret keys). On top of the formal input parameters from one table, UDFs can and must read with JDBC-connections from further tables in the same DB in order to perform certain tasks which require multi-table-input (only writing via JDBC into the same DB would be insane). Typical example for the latter is using UDFs for algorithms that purposefully avoid processing all combinations of the input rows from multiple tables, such as cross-nearest-neighbor algorithms.

Grants

I sympathize with simple solutions, if this solves the problem: go ahead.

Documentation

I would already be happy if EXASOL would not remove my formatting and comments from the UDF-header, such that I can implement my own documentation framework.

Best

Status changed to: Comments Requested
exa-Tobias
Team Exasol
Team Exasol

@joehl Would you mind if we split this idea into 4 ideas following your executive summary?