UDF - Async call with PyExasol

rodrigogargiulo
Contributor

Hello

I am trying to load files from an S3 bucket. The script works, but I want to trigger it with an AWS Lambda function. However, if the load takes more than 15 minutes, Lambda terminates the function.
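For illustration, a minimal PyExasol load script along these lines might look like the following (the bucket, credentials, and target table below are placeholders, not my actual setup):

```python
import pyexasol

# Placeholder connection details and object names - adjust to your environment
conn = pyexasol.connect(
    dsn='exasol-cluster:8563',
    user='load_user',
    password='***',
    compression=True,
)

# Exasol pulls the files directly from S3; for large files this single
# statement can easily run longer than the 15-minute Lambda limit
conn.execute("""
    IMPORT INTO staging.my_table
    FROM CSV AT 'https://my-bucket.s3.amazonaws.com'
    USER 'my_access_key_id' IDENTIFIED BY 'my_secret_access_key'
    FILE 'exports/part_01.csv'
""")

conn.close()
```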

So, is it possible to trigger the UDF script asynchronously?

Thank you. 


6 REPLIES

exa-Matze
Team Exasol

Paging Dr. @littlekoi 😀
According to the reference, there is no asynchronous execution option (https://github.com/exasol/pyexasol/blob/master/docs/REFERENCE.md).
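For reference, the PyExasol API is synchronous: execute() only returns once the statement has finished on the server, so a call pattern like this sketch (connection details are placeholders) blocks the caller for the full load time.

```python
import pyexasol

# Placeholder connection details
conn = pyexasol.connect(dsn='exasol-cluster:8563', user='sys', password='***')

# execute() blocks until the statement has completed on the server;
# there is no fire-and-forget / async variant in the reference
stmt = conn.execute("SELECT COUNT(*) FROM staging.my_table")
print(stmt.fetchval())

conn.close()
```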

@exa-Manuel, maybe you can share some thoughts on a workaround for Lambda function timeouts.

littlekoi
Xpert

The right solution for this task is to run a script on a small EC2 instance with no time limits. It will be able to run long queries and process errors if something goes wrong.

 

As far as I know, "async" calls are not supported by any of the Exasol drivers, including ODBC and JDBC. And that's generally a good thing. Otherwise, if something goes wrong during an "async" execution, you may get an SQL error with no client left to receive it.
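A minimal sketch of such a script (connection details, the connection object, and table names are placeholders): the point is simply that nothing interrupts the call on EC2 and a client is still there to catch and log any SQL error.

```python
import logging
import pyexasol

logging.basicConfig(level=logging.INFO)
log = logging.getLogger('s3_load')

# Placeholder connection details; my_s3_connection is an assumed Exasol CONNECTION object
conn = pyexasol.connect(dsn='exasol-cluster:8563', user='load_user', password='***')

try:
    # No runtime limit on EC2: the call simply blocks until the load is done
    stmt = conn.execute(
        "IMPORT INTO staging.my_table FROM CSV AT my_s3_connection FILE 'exports/part_01.csv'"
    )
    log.info('Load finished, %s rows imported', stmt.rowcount())
except pyexasol.ExaError as exc:
    # The client is still around to receive the SQL error and react to it
    log.error('Load failed: %s', exc)
    raise
finally:
    conn.close()
```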


exa-Chris
Community Manager

Thanks @littlekoi for the answer. @rodrigogargiulo, would this solve your question? If so, please "Accept as Solution". Thanks, Christian


ADoerr
SQL-Fighter

Hi all 🙂

 

Indeed, Lambda functions are not really the right tool for such a use case because of their runtime limitations.

 

As an alternative to an EC2 task, I am using AWS Glue Jobs + Workflows.

This could be an option if you are not interested in setting up and maintaining a separate EC2 instance.

On my GitHub you can find some example Python scripts, which I use to load my small Exasol DWH:

https://github.com/BeatTheBookie/BeatTheBookie/tree/master/007_aws/001_glue_jobs/100_dwh

This is pretty simple, as the individual jobs just execute a bunch of SQL scripts from specific buckets.
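As a rough sketch of that pattern (not copied from the repo above; bucket, prefix, and connection details are placeholders), such a Glue Python job could read the SQL files from S3 and execute them one after another:

```python
import boto3
import pyexasol

# Placeholder bucket, prefix, and connection details
SQL_BUCKET = 'my-dwh-sql'
SQL_PREFIX = '100_dwh/'

s3 = boto3.client('s3')
conn = pyexasol.connect(dsn='exasol-cluster:8563', user='load_user', password='***')

try:
    # Fetch the job's SQL files and run them in alphabetical order
    listing = s3.list_objects_v2(Bucket=SQL_BUCKET, Prefix=SQL_PREFIX)
    for obj in sorted(listing.get('Contents', []), key=lambda o: o['Key']):
        if not obj['Key'].endswith('.sql'):
            continue
        sql_text = s3.get_object(Bucket=SQL_BUCKET, Key=obj['Key'])['Body'].read().decode('utf-8')
        print(f"Executing {obj['Key']}")
        conn.execute(sql_text)
finally:
    conn.close()
```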

 

But it really depends on your requirements. Glue Jobs also caused me some trouble when it came to loading additional Python packages in combination with a VPC configuration; this could be easier with an EC2 instance. On the other hand, an EC2 instance does not give you the orchestration functionality, which is why I am using Glue Workflows.

 

If you need to use a combination of Lambda functions, EC2 tasks, and Glue Jobs, it could also be worth taking a look at AWS Step Functions, which lets you orchestrate these different task types.
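As a rough illustration (all ARNs, names, and the role below are made up), a small state machine that chains a Lambda step and a Glue job could be created via boto3 like this:

```python
import json
import boto3

# Hypothetical state machine: a Lambda step followed by a Glue job
definition = {
    "StartAt": "PrepareLoad",
    "States": {
        "PrepareLoad": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:prepare-load",
            "Next": "RunDwhLoad",
        },
        "RunDwhLoad": {
            "Type": "Task",
            # the '.sync' integration makes Step Functions wait for the Glue job to finish
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "dwh_load_job"},
            "End": True,
        },
    },
}

sfn = boto3.client('stepfunctions')
sfn.create_state_machine(
    name='dwh-load-workflow',
    definition=json.dumps(definition),
    roleArn='arn:aws:iam::123456789012:role/StepFunctionsExecutionRole',
)
```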

 

 

 


exa-Chris
Community Manager
Hi Andre, thanks for sharing. Would it be possible to write a small blog/TKB article about your solution? Thanks, Christian

ADoerr
SQL-Fighter

 

There's already a blog post about how I managed the AWS Glue automation:

https://beatthebookie.blog/2021/02/05/automate-your-betting-models-with-aws/

 

Currently I am working on a more betting-oriented post.

But the next one will be a technical one again, in which I plan to describe my integration of AWS and Exasol.