Issue with execution speed of Python UDF script

Contributor

Hello guys,

I have a strange situation with a Python UDF script and its execution time. The job of the script is to parse a JSON string containing a weather forecast.
The JSON string for each record is not very big; it has about 250 items in one array.

For every item, the script parses the forecast keys and emits them as one row. It is a SET EMIT script. The forecast is first prefetched into a stage table.

The stage table has 653 rows, and when I start the script with INSERT INTO... or CREATE TABLE AS..., the execution never finishes. I tested the script on 1 row, 5 rows, 10 rows, 50 rows, 60 rows, 70 rows and 75 rows. It works and finishes with an input of 70 rows, but with 75 rows it does not finish even after two hours.

I also checked the JSON strings for validity, and they are all valid, so is the problem the amount of data? It is such a small amount, only 75 rows.

I can't figure it out; there is no recursion or anything like that in the script.

The script is attached to this post.

Please give me any help or advice, I would appreciate it.

BR,

Tom

1 ACCEPTED SOLUTION

SQL-Fighter

Hi 🙂 

Could you try running it as a SCALAR script instead of a SET script?

You defined it as a SET script, so when you don't call it with a GROUP BY clause in the SQL statement, all rows are used as input for a single UDF execution.

Maybe this causes the performance issue.

When you change it to a SCALAR script, the UDF is executed once for every single row.

This would be similar to executing it with a limit of one row.
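
To make the difference concrete, here is a rough sketch in plain Python that can be run outside the database. I have not seen the attached script, so everything here is an assumption: `FakeContext` is only a hypothetical stand-in for the UDF context object, and the column name `forecast_json` and the JSON keys `list`, `dt` and `temp` are made up for illustration. It mimics the usual pattern where a SET script receives the whole group in one call and walks through it with `ctx.next()`, while a SCALAR script is called once per row.

```python
import json


class FakeContext:
    """Hypothetical stand-in for the UDF context; not the real Exasol class."""

    def __init__(self, rows):
        self._rows = rows
        self._pos = 0
        self.emitted = []

    @property
    def forecast_json(self):
        # Value of the (assumed) input column for the current row.
        return self._rows[self._pos]

    def next(self):
        # Move to the next input row; return False when no rows are left.
        if self._pos + 1 < len(self._rows):
            self._pos += 1
            return True
        return False

    def emit(self, *values):
        # Collect one output row.
        self.emitted.append(values)


def emit_forecast(ctx):
    # Parse the current row's JSON and emit one output row per array item.
    # The keys "list", "dt" and "temp" are assumptions for this sketch.
    for item in json.loads(ctx.forecast_json)["list"]:
        ctx.emit(item["dt"], item["temp"])


def run_set(ctx):
    # SET style: a single call sees every row of the group and loops itself.
    while True:
        emit_forecast(ctx)
        if not ctx.next():
            break


def run_scalar(ctx):
    # SCALAR style: called once per input row, so no loop over rows is needed.
    emit_forecast(ctx)


# Two small sample rows, each holding a JSON array with three items.
rows = [
    json.dumps({"list": [{"dt": i * 10 + j, "temp": 20.0 + j} for j in range(3)]})
    for i in range(2)
]

# One SET call over all rows.
set_ctx = FakeContext(rows)
run_set(set_ctx)

# One SCALAR call per row, as the database would do it.
scalar_out = []
for row in rows:
    ctx = FakeContext([row])
    run_scalar(ctx)
    scalar_out.extend(ctx.emitted)
```

Both variants produce the same output rows here. The difference is only in how the work is split: without a GROUP BY, the SET variant does everything in one call, while the SCALAR variant gets independent per-row calls. If I remember correctly, a SCALAR script can keep its EMITS clause, so the output columns would not need to change, but please check that against the documentation.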

 

Hope that helps.

4 REPLIES

Contributor

Hi ADoerr,

yes, I changed it to a SCALAR script and now it runs smoothly. Performance is much better now.

Thank you all for the suggestions.

 

BR,

Tom

Team Exasol

Hi Tom,

this might be because of the SET EMIT function you are using. Could you please share the UDF code? That would make it much easier to find the issue 🙂

Best
Lennart

Contributor

Hi Lennart,

I attached the script to my previous post.

Thanks.

BR,

Tom