Exasol 6.2.4 Performance

SQL-Fighter

Hi guys,

We are having performance issues with our Exasol database - query times are extraordinarily long and volatile.
The database has 16 GB RAM; the underlying server has 32 GB RAM and 16 CPUs.

Here you can see basic information provided by the UI indicating that the RAM usage is pretty high:

Moritz_0-1598596375002.jpeg

However, I wonder why this is the case.
Do you have any ideas on how to improve performance?
Any help would be much appreciated!

Best

Moritz

1 ACCEPTED SOLUTION

Xpert

Hi Simon,

just so I think in the right direction: "were running smoothly a few weeks ago" => also under v6.2.4, right?

"It seems unrealistic that all SQLs and joins that were working before are now affected by these issues." => I'm not sure what to make of this, so I'll try to be more specific: the things I enumerated (which were meant to be neither exhaustive nor directly applicable to your setup) were just examples.

From what I can guess from the original info, there are two phenomena here:

1) Queries that are considered too slow, with execution times < 10 sec. (the example profile was given), where I'd think it could be worthwhile checking COMPILE times.
2) Queries that suddenly run "open end", or for at least 10 min., which can happen for a lot of reasons and without the DDL or SQL code changing (the properties of the data could have changed in terms of per-value distribution, what was previously an index could now be an index wrapper, etc.).

When checking your SQLs, avoid two little pitfalls I have run into on occasion:

1) If you measure runtime across a session with multiple statements contributing to your result (well hello there, MicroStrategy multi-pass execution), it pays off to concatenate all SQL_TEXTs, regex away all the "unique to this execution" timey-wimey bits and compute a HASH_TIGER over the resulting VARCHAR; that way you'll see rather quickly whether your SQL_TEXT changed along the way (it's a bit like a sql_id in Oracle).
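A sketch of that fingerprinting idea (it assumes auditing is enabled so that EXA_DBA_AUDIT_SQL is populated, and the regex is only a placeholder for whatever per-execution literals your tool generates):

```sql
-- Sketch: fingerprint the SQL of each session so that changed statement
-- texts stand out. Assumption: auditing is on, so EXA_DBA_AUDIT_SQL has
-- rows; adjust the regex to your tool's "unique per execution" bits.
SELECT SESSION_ID,
       HASH_TIGER(
           REGEXP_REPLACE(
               GROUP_CONCAT(SQL_TEXT ORDER BY STMT_ID SEPARATOR ';'),
               '\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}',  -- strip literal timestamps
               ''
           )
       ) AS SQL_FINGERPRINT
FROM EXA_DBA_AUDIT_SQL
GROUP BY SESSION_ID;
```

If two sessions that should run "the same report" produce different fingerprints, the generated SQL_TEXT changed along the way.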

2) Make use of the recursive dependency view in Exasol to resolve, or rather detect, changes in the sub-elements of your views. SELECT * FROM ACCESSLAYER.FACT_PROJECTS_BUDGET_MONEY_MONTH might have been the exact query text for the last couple of weeks, but if the DDL of that view has changed, you could hardly call it the same SQL when analyzing for issues.
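A sketch against the recursive dependency view (EXA_DBA_DEPENDENCIES_RECURSIVE; the column names are taken from the Exasol system tables and are worth verifying against your version's documentation):

```sql
-- Sketch: list everything the view (transitively) depends on, together
-- with when each referenced object was last changed - a recent LAST_COMMIT
-- hints at changed DDL or data underneath an unchanged query text.
SELECT d.REFERENCED_OBJECT_SCHEMA,
       d.REFERENCED_OBJECT_NAME,
       d.REFERENCED_OBJECT_TYPE,
       o.LAST_COMMIT
FROM   EXA_DBA_DEPENDENCIES_RECURSIVE d
JOIN   EXA_DBA_OBJECTS o
       ON  o.OBJECT_NAME = d.REFERENCED_OBJECT_NAME
       AND o.ROOT_NAME   = d.REFERENCED_OBJECT_SCHEMA
WHERE  d.OBJECT_SCHEMA = 'ACCESSLAYER'
AND    d.OBJECT_NAME   = 'FACT_PROJECTS_BUDGET_MONEY_MONTH';
```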

Hope this helps, looking forward to what you dig up 🙂

Cheers,
Malte

 


18 REPLIES

Xpert

Hi again,

you could also check "$EXA_PROFILE_LAST_DAY" (the double quotes are important). There you'll find all profiles from the last 24 hours without their CPU/memory/etc. metrics, but including the runtimes for compile and execute in separate rows - this way you can verify whether compile accounts for the larger portion of the runtime.
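The same check also works against the documented EXA_DBA_PROFILE_LAST_DAY in the statistics schema; a sketch, assuming profiling was enabled for the statements in question and that the compile phase shows up under the part name 'COMPILE / EXECUTE':

```sql
-- Sketch: per statement, compare the compile part against the total
-- runtime (EXA_DBA_PROFILE_LAST_DAY only has rows for statements that
-- ran with profiling enabled).
SELECT SESSION_ID,
       STMT_ID,
       SUM(CASE WHEN PART_NAME = 'COMPILE / EXECUTE'
                THEN DURATION ELSE 0 END) AS COMPILE_EXECUTE_SEC,
       SUM(DURATION)                      AS TOTAL_SEC
FROM   EXA_DBA_PROFILE_LAST_DAY
GROUP  BY SESSION_ID, STMT_ID
ORDER  BY TOTAL_SEC DESC;
```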

Cheers,
Malte

Xpert

Hi Moritz,

from what I can see in the query.csv export, most of the time is spent in COMPILE/EXECUTE - since you said "server" I assume we are not dealing with a cluster, so unbalanced execution does not factor into this.

My guess from the schema name "DATAVAULT" is that the view you are selecting from contains a fair number of joins? Figuring out the right join order takes some time (up to a certain limit), which is what I believe you are hitting here, despite the rather small volume of data across your objects - and since you are "clean" on swap and HDD read/write, I don't believe it's an issue with the general system setup.

The cross join (NLJOIN) that kristof pointed out also eats about 1/5 of your total runtime, but you are losing more in COMPILE/EXECUTE, so that's where I'd put my money.

To verify this, one would probably have to export this session from exa_operation and check the details of the execution (on the same day as the execution - after 24 hrs the single-session info is no longer exportable because it gets compacted, and you'd have to export the whole day, which, depending on your system activity, isn't much fun to work through).

As for the query which runs > 10 min. where it actually shouldn't: when we see something like this, it's usually the result of a wrong join order being generated. Fixing those depends heavily on how your data model works (sometimes you need to "force" a certain join path by putting outer joins in place, sometimes you have to push filters from WHERE clauses into ON clauses or switch filtered columns in order to make them work as local predicates - but, it depends 🙂).
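A minimal, hypothetical illustration of the WHERE-to-ON variant (table and column names are invented; note that for outer joins the two forms are not semantically identical for unmatched rows, which is exactly why it can steer the plan):

```sql
-- Hypothetical example: moving a filter from WHERE into ON so it acts
-- as a local predicate during the join instead of after it.

-- Before: the filter is applied after the outer join
-- (and effectively turns the LEFT JOIN into an inner join).
SELECT f.project_id, SUM(f.budget) AS budget
FROM   fact_budget f
LEFT JOIN dim_month m ON m.month_id = f.month_id
WHERE  m.year = 2020
GROUP BY f.project_id;

-- After: the filter participates in the join itself; facts without a
-- matching 2020 month are kept (with NULLs) rather than filtered out.
SELECT f.project_id, SUM(f.budget) AS budget
FROM   fact_budget f
LEFT JOIN dim_month m ON m.month_id = f.month_id
                     AND m.year     = 2020
GROUP BY f.project_id;
```

Which form is correct depends on your data model; always re-check the results, not just the runtime.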

Hope this helps - did you get in touch with the Exasol support guys?

Cheers,
Malte

Padawan

Hey Malte,

thank you very much for your advice! I'm a colleague of Moritz and we are working on this problem together.

I guess the number of joins and SELECT statements could be a reason for the performance issues, and we will check that! However, all the scripts were running smoothly a few weeks ago. I'm pretty sure that the SQL statements have not changed during this time.

Furthermore, this table is not the only one affected by these performance issues. Other views of our fact and dimension tables are affected by longer load times as well (about 2-6 seconds). It seems unrealistic that all SQLs and joins that were working before are now affected by these issues.

 

We are going to check the recent SQLs and try to find out whether this is the reason for our issues.

 

Thank you all so much for your help!

Any further ideas are much appreciated! 🙂

 

Best,

Simon

 


SQL-Fighter

Hi mwellbro,

Basically, the SQL was too time-consuming, and we solved the problem by persisting some of the code in our database.

We are also improving our code to further accelerate things.

Thank you very much and best regards

Moritz

Padawan

Hey Malte,

so we have restarted and refreshed all our instances. All in all, there are some tables and SQLs that affect the performance of the database. We suspect that these queries are loaded into the RAM of the database, which in the end affects other queries and our performance dramatically.

(After restarting and refreshing our instances, we selected most tables with "good" results. In the next step, we started some queries on the problematic tables: these queries took about 500 seconds. After that, we faced nearly the same issues as before when selecting other tables and views.)

We are now checking all of the SQLs behind the table and trying to fix them as best we can. (Nevertheless, I can't explain how these problems could occur so suddenly; the business rules and DDLs have not been changed.)

 

Again, thank you so much for your tips; the number of joins and selects is probably the main reason for our issues.

We´ll definitely keep you updated! 🙂

 

Simon

Xpert

Basically, what I checked was the recommended memory size and the auxiliary (index) size, but the database is tiny.

Furthermore, no heavy HDD reads/writes can be seen.

Given the low workload in the entries you posted, I would guess the database is mostly idle, and the database size is so small that it would fit completely into memory.

I have to say that I overlooked the fact that only 16 GB of memory is assigned to the database while the server has 32 GB.

We once ran into an issue where BucketFS consumed all the available memory, but that should be fixed in your version (EXASOL-2523).

Can you execute a slow query using the explain feature of DBeaver and post the result?


SQL-Fighter

This query took approximately 6 seconds (way too long).

Moritz_0-1598605999328.png

Do you need further information?
Best

Moritz

Xpert

In Exasol you want to use all the memory your infrastructure can provide.

That's the main use case for an in-memory database.

Can you post output of 

select * from EXA_STATISTICS.EXA_DB_SIZE_LAST_DAY order by MEASURE_TIME desc;
select * from EXA_STATISTICS.EXA_MONITOR_LAST_DAY order by MEASURE_TIME desc;

Exasol will not perform well if you don't have enough memory to keep at least the indexes in memory.

So when you see volatile response times, it may be because objects and indexes have to be evicted from memory and re-read often.
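For reference, EXA_DB_SIZE_LAST_DAY also carries a RECOMMENDED_DB_RAM_SIZE column; a sketch for comparing it against the assigned 16 GB (column names per the Exasol statistics tables, worth double-checking for your version):

```sql
-- Sketch: is the assigned DB RAM big enough to hold data plus indexes?
SELECT MEASURE_TIME,
       MEM_OBJECT_SIZE,          -- compressed data volume held in memory
       AUXILIARY_SIZE,           -- indexes and other auxiliary structures
       RECOMMENDED_DB_RAM_SIZE   -- compare this to the 16 GB assigned
FROM   EXA_STATISTICS.EXA_DB_SIZE_LAST_DAY
ORDER  BY MEASURE_TIME DESC
LIMIT  10;
```

If RECOMMENDED_DB_RAM_SIZE stays well below 16 GB, memory pressure is unlikely to explain the volatility.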

 

SQL-Fighter

Hi Charlie,

thank you very much for your quick reply. This is much appreciated!
Please find the output of the queries below.

select * from EXA_STATISTICS.EXA_DB_SIZE_LAST_DAY order by MEASURE_TIME desc;

Moritz_13-1598599953197.png

select * from EXA_STATISTICS.EXA_MONITOR_LAST_DAY order by MEASURE_TIME desc;

Moritz_14-1598599953208.png

Does this information help you?

Best

Moritz

 

Team Exasol

Hi Moritz,

I assume you had no other query running between 9:24 and 9:29. Memory is definitely not an issue.

As you are using DBeaver, you could take one of your problematic queries and execute it with Ctrl+Shift+E. This will return the execution plan of the query and might give you some more hints on what's going on. Please also see https://docs.exasol.com/database_concepts/profiling.htm for some further explanations on this profiling. Feel free to post the result here.
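For anyone without DBeaver, roughly the same profile can be produced by hand, following the profiling workflow from the linked documentation (the SELECT shown is just a placeholder for the slow query):

```sql
-- Documented Exasol profiling workflow: enable profiling for the session,
-- run the query, flush the statistics, then read the per-part breakdown.
ALTER SESSION SET PROFILE = 'ON';
SELECT * FROM ACCESSLAYER.FACT_PROJECTS_BUDGET_MONEY_MONTH;  -- the slow query
ALTER SESSION SET PROFILE = 'OFF';
FLUSH STATISTICS;
SELECT STMT_ID, PART_ID, PART_NAME, OBJECT_NAME, OBJECT_ROWS, OUT_ROWS, DURATION
FROM   EXA_USER_PROFILE_LAST_DAY
WHERE  SESSION_ID = CURRENT_SESSION
ORDER  BY STMT_ID, PART_ID;
```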

SQL-Fighter

Hi Carsten,

Thank you very much for your reply.

Interestingly, the most problematic query does not even return from the Ctrl+Shift+E command; it has been running for 10 minutes now.

Moritz_1-1598606859081.png

 

Please find the result of another query below. It takes approximately 6 seconds, which is way too long given that it is a small table.

Do you need further information?

Best

Moritz


SQL-Fighter

*

Moritz_2-1598606945161.png

 

Xpert

I can't see the duration of the individual parts in the screenshot, and not all parts are visible.

 

You could export the info by doing 

 

export (select * from EXA_STATISTICS.EXA_DBA_PROFILE_LAST_DAY where session_id = current_session and stmt_id = 28) into local csv file '<path>\<file>.csv.gz';

and attach the file.

 

Otherwise you would have to attach quite a lot of screenshots to show all of the information.

SQL-Fighter

Please find the file attached.

Xpert

Hi,

I think you did not execute the export in the same session as the SQL, so you would need to change the query in the export to

select * from EXA_STATISTICS.EXA_DBA_PROFILE_LAST_DAY where stmt_id = 28

or you can follow the manual posted by @exa-Carsten to generate the access plan csv.

SQL-Fighter

Hi Charlie,

sorry for the mistake, I hope the attached file works!

SQL-Fighter

There's a cross join happening (line 43 of your profile) which eats up a lot of time.

A cross join can be intended, but usually it's not. To get to the bottom of this, I think we need the definition of ACCESSLAYER.FACT_PROJECTS_BUDGET_MONEY_MONTH.
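A sketch for pulling that definition out of the system tables (assuming you have the corresponding privileges on the view):

```sql
-- Sketch: fetch the view's DDL so it can be posted here.
SELECT VIEW_TEXT
FROM   EXA_ALL_VIEWS
WHERE  VIEW_SCHEMA = 'ACCESSLAYER'
AND    VIEW_NAME   = 'FACT_PROJECTS_BUDGET_MONEY_MONTH';
```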

Or maybe some profile-magicians here can extract more out of this 🙂

Btw: if your DBeaver throws a timeout error after 10 minutes, you can change this setting in your connection properties:

kristof_0-1598628442521.png

So maybe you can also provide the profile of your > 10 minutes query.

Btw2: you can see the current state/profile of still-running queries in EXA_DBA_PROFILE_RUNNING.
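A sketch of what such a check could look like (the session id is a placeholder; pick the real one from EXA_DBA_SESSIONS):

```sql
-- Sketch: watch the profile of the still-running > 10 min. statement
-- while it executes; the parts with the largest DURATION show where it
-- is currently stuck.
SELECT STMT_ID, PART_ID, PART_NAME, OBJECT_NAME, OBJECT_ROWS, OUT_ROWS, DURATION
FROM   EXA_DBA_PROFILE_RUNNING
WHERE  SESSION_ID = 1234567  -- placeholder: session id of the slow query
ORDER  BY PART_ID;
```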

Have a nice weekend!