Parallel connections to SQL Server

danila
Contributor

I want to create a script which will be able to read data from multiple SQL Server instances in parallel.

I already saw this article: https://community.exasol.com/t5/database-features/parallel-connections-with-jdbc/ta-p/1779

But the problem is that I'm not familiar with Java and looks like it uses parallel connections to Exasol, not to other provider.

Does anyone have an example of such a script?

Currently I have only this implementation of non-parallel script and it is very slow.

 

CREATE LUA SCRIPT SANDBOX.TEST () RETURNS TABLE AS
local sSQLServerConnectionName
local success
local res
local result_table = {}
local tConnections = query([[select connection_name from exa_dba_connections]]);

for i = 1, #tConnections, 1 do
	sSQLServerConnectionName = tConnections[i].CONNECTION_NAME

	success, res = pquery([[
	import from jdbc at ::SQLServerConnectionName
	statement 'select @@servername as ServerName'
	]],{SQLServerConnectionName = sSQLServerConnectionName});

	if success 	then
		for n = 1, #res, 1 do
			table.insert (result_table, {res[n][1]})
		end
	end
end

exit (result_table, [[Server_Name varchar(128)]])

 

1 ACCEPTED SOLUTION

exa-StefanR
Team Exasol
Team Exasol

Do the SQL server instances see each other? Maybe you can outsource the connection-multiplexing to one of those servers like

IMPORT FROM JDBC AT <master-sql>
   STATEMENT 'select ... from table@server1'
   STATEMENT 'select ... from table@server2'
   ...

This also works with Exasol as intermediary multiplex server, but we probably have a little more overhead per connection....

View solution in original post

14 REPLIES 14

exa-StefanR
Team Exasol
Team Exasol

For starters, importing mass data into the Lua Engine for further processing is an anti-pattern with Exasol. Usually you should use IMPORT INTO <table_name> and process data using SQL from there on.

However, with our current releases, the only way to achieve a single import from multiple databases would be to create an ETL UDF opening the necessary SQL Server connections; however, that approach will be a direct invitation to the JDBC/Java programming club...

 

Lookout

Exasol already supports using multiple web services for file-based import and (as @mwellbro mentioned) multiple STATEMENT clauses for JDBC-based imports; so maybe creating an Idea to support multiple connection/statement pairs for JDBC could lead to a future implementation...

mwellbro
Xpert

On the note of "however, that approach will be a direct invitation to the JDBC/Java programming club...", @exa-StefanR :

 

CREATE OR REPLACE JAVA SCALAR SCRIPT MY_SCHEMA.CALL_THREADS() EMITS (c1 VARCHAR(2000)) AS 
%jar /buckets/bfsdefault/jdbc/postgresql-42.2.5.jar;

import java.sql.*;

%import MY_SCHEMA.RUNNABLE_THREAD;

class CALL_THREADS {

  static void run(ExaMetadata exa, ExaIterator ctx) throws Exception {
        final String[] my_var = new String[3];
        RUNNABLE_THREAD a = new RUNNABLE_THREAD( "Thread-1", my_var , 0);
        a.start();
        //try {
        //    a.t.join();
        //} catch (Exception e) {}

        RUNNABLE_THREAD b = new RUNNABLE_THREAD( "Thread-2", my_var , 1);
        b.start();

        //try {
        //    b.t.join();
        //} catch (Exception e) {  }


        RUNNABLE_THREAD c = new RUNNABLE_THREAD( "Thread-3", my_var , 2);
        c.start();

        try {
            c.t.join();
            b.t.join();
            a.t.join();
        } catch (Exception e) {
        }
 
  ctx.emit(my_var[0]);
  ctx.emit(my_var[1]);
  ctx.emit(my_var[2]);
 return;
 }
}
/

 

 

mwellbro_0-1617924088818.png

 

Didn´t want to fire up 3 different DB´s so just to show how something like this might ( emphasize might ) be done 🙂

Fun well known fact: once your source is clogged you don´t gain any benefits from parallization 😉

mwellbro_1-1617924256595.png

 

Cheers,
Malte

danila
Contributor

I like the idea of creating ETL_UDF script and using it in query like this:

SELECT get_table_from_sql_server (connection_name) 
FROM exa_dba_connections;

Is it possible to use non-scalar function here? Or how can I import data from external source using function like in the example?

mwellbro
Xpert

Hi @danila ,

could you give a bit more details on what you are aiming to do ? Is this supposed to get data from your SQL-Server instances into your Exasol system ?

Do you require these connections to perform different queries in parallel ?

One of the easiest additions to your example would be to use multiple STATEMENT clauses in your IMPORT command and push the data into an Exasol table - I´m not quite sure from
your example what you want to achieve with the pquery that imports into a lua table ( if I read this correctly ).

Or do you think more along the lines of "execution of multiple statements in parallel within a script" ?

Cheers,
Malte

danila
Contributor

The idea is to get information about jobs running on different servers and to combine it in one resultset.

So, the query which is pushed to SQL Server is always the same across all the connections.

I don't want to store the resultsets in Exasol. Just want to get one time resultset "on the fly". That's why I'm working with lua tables, in order not to save data to Exasol tables.

mwellbro
Xpert

If you don´t need to permanently store it you could "inline-IMPORT" it:

 

select * from ( IMPORT FROM JDBC AT sql_server_1_here STATEMENT 'select my_status from my_table' ) union all
select * from ( IMPORT FROM JDBC AT sql_server_2_here STATEMENT 'select my_status from my_table' ) union all
<add-infinitum>  - or at least a rather large number of permissible union all branches 😉

 

danila
Contributor

I want to just see the result. Not planning to join it later.

We are talking about 25 servers.

danila
Contributor

I can do like that. But it won't force query to run in parallel.

mwellbro
Xpert

Urgh, you´re absolutely right, sorry about that - guess that would lead into UDF territory - when I need to go "accross servers" I mostly use parallel executed powershell scripts to connect to different systems ( exaplus in my case ).
How many connections / servers are we talking about - might actually be viable to put those results into a lua table ( even though @exa-StefanR is absolutely right with the anti-pattern remark ).
Do you just want to see the result in a client or do you plan to "work with it" ( i.e. join your result to something else ) ?

exa-StefanR
Team Exasol
Team Exasol

Do the SQL server instances see each other? Maybe you can outsource the connection-multiplexing to one of those servers like

IMPORT FROM JDBC AT <master-sql>
   STATEMENT 'select ... from table@server1'
   STATEMENT 'select ... from table@server2'
   ...

This also works with Exasol as intermediary multiplex server, but we probably have a little more overhead per connection....

View solution in original post

danila
Contributor

SQL Servers may see each other via Linked Server, but currently Linker Servers are not created. And I'd like to leave it as is.

When I try to use Exasol as central server, like in your query:

IMPORT INTO (SN varchar(100))
FROM EXA AT 'jdbc:exa:192.168.0.1..7:8563' USER 'TEST' IDENTIFIED BY 'test123'
STATEMENT 'select SN from (import from jdbc at SQLSRV1_CONN statement ''select @@servername as SN'') t'
STATEMENT 'select SN from (import from jdbc at SQLSRV2_CONN statement ''select @@servername as SN'') t';

I get the error:

Must specify exactly one remote table [line 3, column 11]

With one STATEMENT clause it works without error.

exa-StefanR
Team Exasol
Team Exasol

Import from JDBC, not from EXA...

mwellbro
Xpert

I think you´ll need to use FROM JDBC , not FROM EXA in this case

danila
Contributor

Thank you, guys. Now the query works 3 times faster.