(Exasol beginner warning).
Hi, I'm trying to evaluate Exasol to see whether it would solve some of our issues better than our existing database engine. The problem is that some SQL queries take forever to execute on Exasol, while the other engine handles the same query on the same data in just a few seconds. To understand what is happening I would like some insight into the query plan, so that I can see what can be done to make the query complete in a reasonable time.
I tried "alter session set profiling = 'on'", but I get no information there for that query. I of course have to kill the query because it does not complete at all (by the way: none of JDBC's standard methods for killing a query work; cancel() appears to be implemented incorrectly, and setQueryTimeout() does not seem to do anything at all).
I dumped the session's logs from the server and looked at the SqlSession log for the session, but it does not show any relevant information either: it just shows the incoming packet with the SQL statement and then the kill:
28.09 14:19:33.742 WARNING Received execution abort request from client.
28.09 14:19:33.743 *** ABORT EXECUTION *** - reason: USER_ABORT, stopProcess: FALSE, deleteSession: FALSE, sesID: 1679086334003052544, stmtID: 83, message:
28.09 14:19:33.743 abort_and_wait: reason: USER_ABORT, state: RUNNING, abort_ses: 1679086334003052544, abort_stmt: 83, top_stmt: 83, cur_stmt: 83, message: , timeout: 5000
28.09 14:19:38.744 abort_and_wait: state: KILLED, abort_ses: 1679086334003052544, abort_stmt: 83, top_stmt: 83, cur_stmt: 83, message: , returned: QAR_TIMEOUT
28.09 14:19:38.744 [Statistics - Client] Emergency flush of current statement statistics for reason:  Successfully reconnected after user abort, transaction was rolled back.
28.09 14:19:38.746 Determined query resources for session_id=1679086334003052544, stmt_id=83, command_id=3, nesting=0, duration=82.200 sec, rows=, res=0%, error_code=40010, cpu=6.3%, mem=23 MB, temp=0 MB, pers=0 MB, hdd_in=0.0 MB/s, hdd_out=0.0 MB/s, cache_in=0.0 MB/s, cache_in_time=0 sec, cache_out=0.0 MB/s, cache_out_time=0 sec, remote_in=0.0 MB/s, remote_in_time=0 sec, remote_out=0.0 MB/s, remote_out_time=0 sec, net=0.0 MB/s) page(2534+0) io(24) switch(1637+16)
*** Statement could not be stopped nicely, killing now! ***
after which there's a lot of what seem to be stack traces.
Is there any way that I can get any indication what the server is trying to do?
Same problem, btw, when the server reports "out of memory" on a query.
Thanks for your time!
How do you run the queries against the system? Since you mentioned a JDBC implementation, it does not sound like a "simple client", or is it?
Regarding the "alter session set profiling = 'on'": a subsequent select * from EXA_DBA_PROFILE_LAST_DAY; turns up empty? That would actually be strange if you run both the alter session and the query in the same session. Is that the case, or do those two statements perhaps get executed in separate sessions?
If it's an eval system you might want to activate profiling system-wide ( alter system set profile = 'on'; ) and see what turns up (I'm not sure "profiling" as you wrote it is correct syntax; at least under 6.2.x it's not, but I don't have a 7.x at hand right now 😉).
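For reference, a sketch of both variants, assuming the 6.2.x/7.x syntax (note the parameter name is PROFILE, not PROFILING):

```sql
-- Session-level profiling (affects only the current session):
ALTER SESSION SET PROFILE = 'ON';

-- System-wide profiling (affects all sessions; handy on an eval box):
ALTER SYSTEM SET PROFILE = 'ON';

-- Profile data lands in the statistics views after a flush:
FLUSH STATISTICS;
SELECT * FROM EXA_DBA_PROFILE_LAST_DAY;
```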
Also, do you have auditing enabled? Does your query turn up there?
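If auditing is enabled, the statement text and timings should show up in the audit view; a sketch, assuming the standard statistics views (the session ID below is just an example taken from your log):

```sql
-- Auditing must be enabled for this view to be populated.
-- Recent statements for a specific session, newest first:
SELECT SQL_TEXT, START_TIME, STOP_TIME, DURATION, ERROR_CODE
FROM EXA_DBA_AUDIT_SQL
WHERE SESSION_ID = 1679086334003052544
ORDER BY START_TIME DESC;
```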
I have a Java program that sends the queries to Exasol, using JDBC. In JDBC terms it is a simple client.
I tested a query that hangs using a SQL client. I executed both the alter session AND the query in the same session, and there are results in the table; there is just nothing related to that query's plan. I do have to kill the query after it has run for a while though, as it does not complete (at least not within the 5 hours I let it run).
I do not have auditing enabled. I do see the query when I look in the server's log file, but that also does not seem to contain any information that looks like a plan.
Curious... I think what Nico and Uwe suggested is your best course of action for now. Maybe also check EXA_DBA_TRANSACTION_CONFLICTS for anything suspicious; 5 hours sure seems like a lot of time...
Next time you try it, I would suggest doing the following. The last query will show you your execution plan for the entire session.
SELECT CURRENT_SESSION;
ALTER SESSION SET PROFILE = 'ON';
<run the long query>
FLUSH STATISTICS;
SELECT * FROM EXA_DBA_PROFILE_LAST_DAY WHERE SESSION_ID = CURRENT_SESSION;
If you want to look while the query is still running, then copy the session ID that you got above, and look in EXA_DBA_PROFILE_RUNNING:
FLUSH STATISTICS;
SELECT * FROM EXA_DBA_PROFILE_RUNNING;
If you want, you can export this table as a CSV and attach it here so that we can help figure out why it's taking so long 😊
FLUSH STATISTICS;
EXPORT (SELECT * FROM EXA_DBA_PROFILE_RUNNING) INTO LOCAL CSV FILE '<path/to/file/profile_running.csv' WITH COLUMN NAMES;
Hi, @fjalvingh thanks for your question.
Could you let us know a bit more about your environment? The setup of the system, the data volume, and maybe the task at hand, e.g. analysing data in Tableau or data science.
This will make it easier for the participants to answer your question.
Looking forward to helping you!
Sure. I am using the "free"/"demo" Exasol VM image running locally on my developer machine in VirtualBox. My machine runs Ubuntu 20.04 with a fast SSD, a Threadripper 3960X, and 64GB of memory. The VM has been assigned 16GB of memory and 16 processors. I have not made any changes to Exasol's configuration, and I'm just using the default account (sys/exasol).
I have created a schema with some 2,000 tables in it. The tables follow a Data Vault layout, consisting of hubs, satellites, and links. The database contains about 8GB of data (as far as input data is concerned), and the biggest tables have around 2 million records.
The SQL that is executed on that data is quite complex and is generated by the application. All of it is "report-like" SQL, meaning that it calculates results from all records in the tables it joins. There are often very many joins (10 to 30) in that SQL: it joins satellite and link records, and filters on aspects of the results.
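To illustrate the shape of such a query (all table and column names here are hypothetical, purely to show the pattern, not the actual generated SQL): a typical statement joins a hub to its satellites, hops across a link to another hub, and aggregates over all rows:

```sql
-- Hypothetical Data Vault query shape: hub -> satellite -> link -> hub -> satellite.
SELECT h.customer_key,
       COUNT(*)            AS order_count,
       SUM(so.order_total) AS total_revenue
FROM   hub_customer        h
JOIN   sat_customer_detail sc ON sc.customer_hk = h.customer_hk
JOIN   link_customer_order lo ON lo.customer_hk = h.customer_hk
JOIN   hub_order           ho ON ho.order_hk    = lo.order_hk
JOIN   sat_order           so ON so.order_hk    = ho.order_hk
WHERE  sc.is_current = TRUE
GROUP  BY h.customer_key;
```

The real queries chain many more of these hub/satellite/link hops, which is where the 10-to-30-join counts come from.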
The product that uses all of this has been in production for a few years and currently uses Postgres as its database. That database executes the test workload on my machine in 3:40 (three minutes 40 seconds). The goal is to get that faster, or at least as fast. If I get the impression that this can be done, we can do more to use other Exasol functionality.
Hope this gives a better picture?