UDF performance

drumcircle
Rising Star

My dev server is taking an extra 3 seconds to add a scripted aggregate to query results on very few rows.

The Python SET UDF is below, BITOR_AGG.  I'm wondering how to go about improving performance?  Different language, 7.1, refactor code?

Thanks so much...

 

CREATE OR REPLACE PYTHON3 SET SCRIPT "BITOR_AGG" ("i" DECIMAL(18,0)) RETURNS DECIMAL(18,0) AS
def run(ctx):
  if not ctx.size():
    return 0
    
  agg = 0
  
  while True:
    agg |= ctx.i

    if not ctx.next():
        break

  return

 

1 ACCEPTED SOLUTION

mwellbro
Xpert
Hi @drumcircle,
3 seconds sounds like the time it takes to fire up the vm for the python execution if I remember correctly.
If thats the reason then going for a lua implementation should fix your 3 second problem since lua is more directly integrated in exasol and hence does not need to create vm(s) before executing your code.

Cheers
Malte

View solution in original post

3 REPLIES 3

drumcircle
Rising Star

Appreciate your deep dive, Malte.  Thank you, sir.

mwellbro
Xpert

Hi again,

I just thought I´d run an example to at least validate my point regarding lua probably being faster:

mwellbro_0-1629062527890.png

 

Since with Exa-7.0.9. ( vis-a-vis Lua 5.1 ) I don´t have the luxuary of native bitwise operators I scooped up a code snipped from "the internet" ( warning: NOT a full fledged implementation but probably good enough for this simple test case -> https://stackoverflow.com/questions/32387117/bitwise-and-in-lua  ) and the screenshot shows the factor by which the two executions differ ( around a second for python3 compared to a fraction therof in case of lua ).

If you are going for Exa-7.1 you should have access to native lua bitwise operators ( courtesy of the lua upgrade to 5.4 as per https://www.exasol.com/portal/display/DOWNLOAD/7.1 ) - maybe that´s going to help in your case ?

I´ll drop the code used as a snipped here since copy&paste through screenshots is a pain 😉

CREATE TABLE DUMMY_VALUES AS SELECT LEVEL AS id FROM dual CONNECT BY LEVEL<=1000;

CREATE OR REPLACE PYTHON3 SET SCRIPT "BITOR_AGG" ("i" DECIMAL(18,0)) RETURNS DECIMAL(18,0) AS
def run(ctx):
  if not ctx.size():
    return 0
    
  agg = 0
  
  while True:
    agg |= ctx.i

    if not ctx.next():
        break

  return agg
/

CREATE OR REPLACE LUA SET SCRIPT "BITOR_AGG_LUA" ("i" DECIMAL(18,0)) RETURNS NUMBER AS

function bitoper(a, b, oper)
   local r, m, s = 0, 2^31
   repeat
      s,a,b = a+b+m, a%m, b%m
      r,m = r + m*oper%(s-a-b), m/2
   until m < 1
   return r
end

function run(ctx)
  if ctx.size() == 0
    then return 0
  else  
   local agg = 0
   repeat
    agg = bitoper(agg,ctx.i,1)
   until not ctx.next()
   return agg
  end
end
/

SELECT BITOR_AGG(id) FROM DUMMY_VALUES;
SELECT BITOR_AGG_LUA(id) FROM DUMMY_VALUES;


Cheers,
Malte

mwellbro
Xpert
Hi @drumcircle,
3 seconds sounds like the time it takes to fire up the vm for the python execution if I remember correctly.
If thats the reason then going for a lua implementation should fix your 3 second problem since lua is more directly integrated in exasol and hence does not need to create vm(s) before executing your code.

Cheers
Malte

View solution in original post