architecture / querying across virtual schemas

drumcircle
Rising Star

I would like to present thousands of virtualized tables as a collection of multiple schemas.

The underlying data exists in one schema.  The Virtual Schema adapter can be bound to multiple virtual schemas all pointing to the same LOCAL exasol.

Questions:

1. In this scenario, how does a query executed across multiple Virtual Schemas actually perform?  Is SQL pushed down?  

SELECT VS1.TABLE1.FOO, VS2.TABLE1.BAR

FROM VS1.TABLE1, VS1.TABLE2

WHERE VS1.TABLE1.KEY=VS.TABLE2.KEY

2.  Can a single Virtual Schema present multiple schemas or databases or catalogs to the client driver?  I'm assuming that aliasing the adapter multiple times with a schema parameter is preferred...

 

Thanks!

4 REPLIES 4

drumcircle
Rising Star

Peter,

Many thanks for your engagement, and experience.

Sub-selects (on the same table) are quite common, I believe Tableau generates them automatically for LOD calculations.  Can you tell me more about what kinds of queries that cannot be pushed down?  How much of this is configurable or avoidable?

> if the user runs a complex query that the virtual schema framework doesn't support (e.g. sub-selects) it can easily end up running full table scans of the underlying tables which has terrible performance.

 

Anthony

PeterK
Xpert

Hi Anthony,

The best way to find out how the virtual schema handles different queries is to run 

EXPLAIN VIRTUAL SELECT...query-referencing-virtual-schema.

The output will show what will get pushed down. It's a bit hard to generalize what can be pushed down or not so I encourage you to play around with EXPLAIN VIRTUAL for the particular queries you need to run.

The most common issue we ran into was the fact that if you join a virtual schema with a non-virtual-schema with the criteria specified against the non-virtual-schema it will often attempt to materialize the entire virtual schema portion and then create an index for it on the fly before joining against the non-virtual-schema portion.

There's not much you can do to work around this unless you can somehow have the criteria be applied to the virtual schema portion. Alternatively if you can "expand" your virtual schema to encompass both schemas it will resolve this particular issue however this is requires extending the default virtual schema adapter (or writing your own.)

Hope that helps,

Peter

PeterK
Xpert

Hi @drumcircle ,

1) We use virtual schemas a little bit and found that joining a virtual schema table to any other schema (virtual or otherwise) can very often lead to poor performance. This is because, depending on the query criteria, it will often result in full table scans of many or all of the virtual tables.  If the optimizer plan decides to apply the criteria against the other schema first then when it comes to the point of joining against the virtual schema table it will try to materialize the entire table so that it can create a join index.

2) Yes a single virtual schema can present multiple schemas but you'll have to write your own adapter (or extend the default one from Exasol's github site). The adapter will also have to handle table-name collisions in case the schemas have tables with the same name.

This is probably not your use-case but, as an example, we have found it quite useful to have a virtual schema that presents a UNIONed schema of multiple multi-tenant schemas. That way privileged users can see all tenant's data in a single schema (with tenant name column automatically added by the adapter). This works fairly well but even in this case we have to be careful because if the user runs a complex query that the virtual schema framework doesn't support (e.g. sub-selects) it can easily end up running full table scans of the underlying tables which has terrible performance.

Best,

Peter

exa-Nico
Community Manager
Community Manager

@drumcircle what happens when you run an EXPLAIN VIRTUAL command? This command will show you exactly what is pushed down and which queries are generated. Just try a 

EXPLAIN VIRTUAL SELECT VS1.TABLE1.FOO, VS2.TABLE1.BAR
FROM VS1.TABLE1, VS1.TABLE2
WHERE VS1.TABLE1.KEY=VS.TABLE2.KEY

 (using your dummy tables).

A virtual schema can only be mapped to one schema/catalog/database. If you want to reference many schemas, you would need multiple virtual schemas. Or you could try to create a virtual schema adapter on your own which handles that (but I'm not sure how easy or difficult that would be tbh)

Sports Enthusiast. Database Guy. Member of Team Exasol.
Having trouble? Just let me know!

Please Give Kudos if you find this post useful and also mark this as Accepted Solution if this is the required solution