Support IMPORT-FROM-EXA connection string arguments

Use Case:
We have 2 Exasol instances. There are cases where the local instance pulls data from the remote instance via an IMPORT-FROM-EXA statement.
Sometimes the statement executed on the remote instance queries a virtual schema, which generates its pushdown SQL by calling a semantic layer API.

Problem:
IMPORT-FROM-EXA always runs the statement on the remote instance as a prepared statement, i.e. it makes two remote calls: prepare + execute.
This results in two separate calls to the virtual schema pushdown SQL generator and therefore two semantic layer API calls, which slows things down (there is nothing in the virtual schema parameters that indicates whether the current call is in the prepare phase or the execution phase).

A related problem is that IMPORT-FROM-EXA issues a ROLLBACK on the remote instance after execution, which means that any auto-generated indexes are not saved.

Suggested Solution:
It would be great if IMPORT-FROM-EXA could accept connection string parameters that control its behaviour on the remote instance, in the same way that IMPORT-FROM-JDBC supports parameters.

I'd suggest two parameters: execute_only and auto_commit.
Setting them would cause the remote connection to skip the prepare phase and to issue a COMMIT afterwards.
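
To make this concrete, the usage could look something like the sketch below. This is purely hypothetical syntax: neither parameter exists today, placing the options in the connection string is just one possibility, and the connection address, credentials and object names are placeholders.

-- Hypothetical: execute_only / auto_commit passed as connection string options
CREATE OR REPLACE CONNECTION remote_exa
  TO '192.168.1.11..14:8563;execute_only=1;auto_commit=1'
  USER 'importer' IDENTIFIED BY 'secret';

IMPORT INTO local_schema.target_table
  FROM EXA AT remote_exa
  STATEMENT 'SELECT * FROM semantic_vs.some_view';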

Thank you for the consideration.
Peter

The IMPORT-FROM-EXA statement currently requires a preparation step to retrieve column metadata prior to execution. To support a single-step (execute-only) import, the Exasol loader would need to be modified to execute the statement immediately: retrieving only the result set metadata (without transferring any data), constructing the transformation pipelines, and then starting the data transfer process, all in one step. Currently, this process is logically divided into two distinct phases:

  • Prepare – retrieve metadata and build transformation pipelines

  • Execute – fetch the data

Separately, introducing a parameter like (remote_)auto_commit would make sense. This would prevent the default rollback behavior and allow auto-created indexes (e.g., resulting from a join in the remote query) to be preserved after import.

Thanks @Ruslan_Rusinov - that makes sense.

In that case, instead of the single-step import, perhaps you could add a flag to the virtual schema pushdown request parameters (or global exa.metadata parameters) indicating whether the current request is in the prepare or the execute phase. In our use case there are some expensive steps we can skip if we know it's just the prepare phase.

Let me see if I understood your setup and challenge correctly:

Database A → ExaLoader (IMPORT) → VS on database B → Target schema on database B

And you would like the VS on B to react differently between the two calls?

If my understanding is correct, we would need an SQL extension for the SQL the ExaLoader generates, indicating in the query itself that this query is special. I am guessing that the “prepare” phase queries the metadata tables of B via SQL only; that is an assumption the loader team would need to confirm.

@peterkioko could you maybe paste an anonymized and shortened VS log of the metadata request?

And the ExaLoader would need to know that it is talking to a VS, not a regular schema.

Another question: is the VS you are talking about, the one acting as the semantic layer, your own custom development?

Hi @sebastian_baer,

Yes, your understanding of the setup is correct.

The VS is a custom development. During “pushdown” requests it calls an external semantic layer service API in order to generate the response SQL. We have found that in the case of IMPORT FROM EXA AT ... STATEMENT 'select * from vs' the API gets called 3 times. It gets called 2 times if you specify the column list instead of the star.

You can reproduce that behaviour with the following test:

--/
CREATE OR REPLACE LUA ADAPTER SCRIPT test.vs_adapter AS
local cjson = require("cjson")
local socket = require("socket")

function adapter_call( json_req )
  local req = cjson.decode( json_req )
  -- Minimal metadata: a single table T with one BOOLEAN column C
  req.capabilities = {''}
  req.schemaMetadata = { tables = {{ name = 'T', columns = {{ name = 'C', dataType = { type = 'BOOLEAN' }}}}}}
  if req.type == 'pushdown' then
     -- This is where the real adapter makes the API call to the external
     -- semantic layer service; here we just log the request and return a dummy.
     req.sql = 'SELECT TRUE'
     local log = socket.connect('localhost', 33333)
     log:send("REQUEST: "..cjson.encode(req).."\n\n")
     log:close()
  end
  return cjson.encode( req )
end
/

CREATE VIRTUAL SCHEMA test_vs USING test.vs_adapter;
-- run nc -klp 33333 on localhost first to capture the logged requests
IMPORT FROM EXA AT exasol_local STATEMENT 'SELECT * FROM test_vs.t';
DROP VIRTUAL SCHEMA IF EXISTS test_vs CASCADE;

So I was thinking: since you already make metadata like exa.meta.statement_id available, perhaps you could include an exa.meta.execution_mode which would indicate “PREPARE” or “EXECUTE”. That way we could skip the API call during “PREPARE” mode and simply generate a dummy SQL response that has the correct data types.

Or better yet: since the query engine already knows the expected data types (they are in req.pushdownRequest.selectListDataTypes), it doesn't seem necessary to call the VS framework with a “pushdown” request at all during the “prepare” execution mode.

Edit: I should also include the log output from the above test:

REQUEST: {"pushdownRequest":{"type":"select","from":{"name":"T","type":"table"},"selectListDataTypes":[{"type":"BOOLEAN"}]},"capabilities":[""],"sql":"SELECT TRUE","involvedTables":[{"name":"T","columns":[{"name":"C","dataType":{"type":"BOOLEAN"}}]}],"schemaMetadataInfo":{"name":"TEST_VS"},"type":"pushdown","schemaMetadata":{"tables":[{"name":"T","columns":[{"name":"C","dataType":{"type":"BOOLEAN"}}]}]}}

REQUEST: {"pushdownRequest":{"type":"select","from":{"name":"T","type":"table"},"selectListDataTypes":[{"type":"BOOLEAN"}]},"capabilities":[""],"sql":"SELECT TRUE","involvedTables":[{"name":"T","columns":[{"name":"C","dataType":{"type":"BOOLEAN"}}]}],"schemaMetadataInfo":{"name":"TEST_VS"},"type":"pushdown","schemaMetadata":{"tables":[{"name":"T","columns":[{"name":"C","dataType":{"type":"BOOLEAN"}}]}]}}

REQUEST: {"schemaMetadata":{"tables":[{"name":"T","columns":[{"name":"C","dataType":{"type":"BOOLEAN"}}]}]},"sql":"SELECT TRUE","schemaMetadataInfo":{"name":"TEST_VS"},"pushdownRequest":{"from":{"name":"T","type":"table"},"type":"select","selectListDataTypes":[{"type":"BOOLEAN"}]},"capabilities":[""],"type":"pushdown","involvedTables":[{"name":"T","columns":[{"dataType":{"type":"BOOLEAN"},"name":"C"}]}]}

As you can see, the request params are identical, so there is currently no way of knowing whether the execution mode is “prepare” or “execute”.

Thank you very much for the clarification. In that case the extension you are proposing affects the core database product rather than our published Virtual Schemas. That means a colleague from the core database team will have to chime in on the feasibility and effort of such a change.

Hi @peterkioko ,
I like the idea, and it is not very complicated.
I will discuss it with product management.
We would probably add something like this:
{"queryContext":{"compileMode":"PREPARE"}, "schemaMetadata":{"tables":[{ …

Would this solve your problem? :slight_smile:

Hi @georg_dotzler - Yes, that would solve our problem. Thank you!
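
For reference, here is a sketch of how the pushdown branch of the test adapter above could consume such a field. The queryContext / compileMode names simply follow your sketch above and are of course not final, and the dummy statements stand in for the real API-generated SQL.

--/
CREATE OR REPLACE LUA ADAPTER SCRIPT test.vs_adapter_ctx AS
local cjson = require("cjson")

function adapter_call( json_req )
  local req = cjson.decode( json_req )
  req.capabilities = {''}
  req.schemaMetadata = { tables = {{ name = 'T', columns = {{ name = 'C', dataType = { type = 'BOOLEAN' }}}}}}
  if req.type == 'pushdown' then
    local ctx = req.queryContext
    if ctx ~= nil and ctx.compileMode == 'PREPARE' then
      -- Prepare phase: skip the semantic layer API call and return a
      -- dummy statement that only has the expected data types.
      req.sql = 'SELECT FALSE'
    else
      -- Execute phase (or field absent): this is where the external
      -- semantic layer service would be called to generate the real SQL.
      req.sql = 'SELECT TRUE'
    end
  end
  return cjson.encode( req )
end
/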