virtual schemas / jar size

drumcircle
SQL-Fighter

I noticed that query time with a 200 MB jar was around 7s, and dropped to 2s with a 20 MB jar (reduced using the Maven Shade plugin).

Is there a plan to keep containers alive between invocations? In a parallel system one has to assume that static variables can get out of sync or be shared between sessions... but so what? Every app server works that way.

1 ACCEPTED SOLUTION

PeterK
Xpert

Hi @drumcircle 

I don't know if this is a possibility for you, but you may want to consider implementing the virtual schema in Python instead of Java. We benchmarked it a while back and found that a Python implementation had a much lower per-query overhead.

Our test results were roughly:

1) Simple query via the default virtualschema-jdbc-adapter-dist-3.0.0.jar: 1.3s

2) Same query via a complete rewrite of that adapter in Python: 0.2s

3) Same query run directly against the underlying schema: 0.1s

This is all with IS_LOCAL set to TRUE.
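
To make the comparison explicit, here is a small sketch that subtracts the direct-query time from each adapter timing to estimate per-query adapter overhead. The numbers are the rough figures listed above; the function and variable names are just illustrative.

```python
# Timings in seconds, copied from the rough results above.
timings = {
    "java_jdbc_adapter": 1.3,
    "python_adapter": 0.2,
    "direct_query": 0.1,
}


def adapter_overhead(total_s, baseline_s):
    """Time attributable to the adapter: total minus the direct-query baseline."""
    return round(total_s - baseline_s, 3)


baseline = timings["direct_query"]
java_overhead = adapter_overhead(timings["java_jdbc_adapter"], baseline)
python_overhead = adapter_overhead(timings["python_adapter"], baseline)

print(f"Java adapter overhead:   {java_overhead:.1f}s")
print(f"Python adapter overhead: {python_overhead:.1f}s")
print(f"Ratio: about {java_overhead / python_overhead:.0f}x")
```

On these figures the Java adapter adds roughly 1.2s per query and the Python rewrite roughly 0.1s, so the gap is about an order of magnitude.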

Best,

Peter
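
For readers wondering what a Python adapter looks like, below is a minimal sketch, not the rewrite benchmarked above. It assumes the adapter script exposes an `adapter_call(request)` function that takes and returns a JSON string, and that request field names (`schemaMetadataInfo`, `pushdownRequest`) follow the Virtual Schema API as I understand it. The `EXAMPLE_TABLE` metadata and the use of a `SCHEMA_NAME` property are assumptions for illustration.

```python
import json

# Hypothetical table exposed by this sketch; a real adapter would read the
# source schema's metadata instead of hard-coding it.
EXAMPLE_TABLE = {
    "name": "EXAMPLE_TABLE",
    "columns": [
        {"name": "ID", "dataType": {"type": "DECIMAL", "precision": 18, "scale": 0}}
    ],
}


def adapter_call(request):
    """Handle one adapter request (a JSON string) and return a JSON string."""
    req = json.loads(request)
    req_type = req["type"]

    if req_type in ("createVirtualSchema", "refresh"):
        response = {"type": req_type, "schemaMetadata": {"tables": [EXAMPLE_TABLE]}}
    elif req_type == "getCapabilities":
        # Advertise no capabilities, so nothing beyond a plain table scan is pushed down.
        response = {"type": "getCapabilities", "capabilities": []}
    elif req_type == "pushdown":
        # Assumes a local source: answer with plain SQL against the underlying schema.
        props = req["schemaMetadataInfo"]["properties"]
        table = req["pushdownRequest"]["from"]["name"]
        sql = 'SELECT * FROM "{}"."{}"'.format(props["SCHEMA_NAME"], table)
        response = {"type": "pushdown", "sql": sql}
    elif req_type in ("dropVirtualSchema", "setProperties"):
        response = {"type": req_type}
    else:
        raise ValueError("Unsupported request type: " + req_type)

    return json.dumps(response)
```

Each call is plain JSON in and JSON out, so a sketch like this can be exercised outside the database with hand-written requests before it is deployed.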

drumcircle
SQL-Fighter

Peter, that's important (and unfortunate) information, thank you!

Were you able to include whatever Python libraries you needed to do exotic things (like external logging, web service access, S3 read/write, etc.)?

PeterK
Xpert

We only included the Python libraries required for accessing a remote Exasol instance (no external logging, S3 access, etc.). However, it is relatively straightforward to import arbitrary Python libraries into BucketFS for use in Virtual Schema adapters (and UDFs) if you need to add those capabilities.
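
As a rough sketch of the BucketFS approach: once a package has been uploaded to a bucket, a script can add the bucket's directory to the module search path before importing. The path below is a placeholder; the BucketFS service name, bucket name, and directory are assumptions and will differ per installation, and `add_library_path` is a hypothetical helper.

```python
import sys

# Placeholder BucketFS location; service, bucket, and directory names are assumptions.
LIB_PATH = "/buckets/bfsdefault/default/python_libs"


def add_library_path(path, search_path=None):
    """Put `path` at the front of the module search path, once."""
    if search_path is None:
        search_path = sys.path
    if path not in search_path:
        search_path.insert(0, path)
    return search_path


add_library_path(LIB_PATH)

# After this, packages unpacked under LIB_PATH can be imported as usual, e.g.:
# import some_uploaded_package  # hypothetical package name
```

Putting the path at the front means the uploaded copy takes precedence over any version already present in the script language container.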