Share "UDF-work" across nodes

mwellbro
Xpert

Hi all,

just a little something I came across that might be relevant for some of you:

Suppose you are using a UDF to do some work in your query, say to check data on different servers or to import data from them.

If you run a cluster with multiple nodes, you'd expect your calls to be distributed across the nodes, so that if all UDF executions take the same time, your runtime scales accordingly.

Let's illustrate this with a little example:

 

CREATE OR REPLACE LUA SCALAR SCRIPT "SLEEP" ("num_sec" DECIMAL(18,0)) RETURNS DOUBLE AS
function run(ctx)
    -- busy-wait until num_sec seconds have elapsed on this node
    local clock = os.clock
    local t0 = clock()
    -- "+0" coerces the decimal parameter to a plain Lua number
    while (clock() - t0) <= (ctx.num_sec + 0) do
    end
    return 1.00
end
/
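A quick single-node sanity check (my own example, not from the original post) would be:

```sql
-- should block for roughly 2 seconds and then return 1.0
select "SLEEP"(2);
```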

Here is a little brutish implementation of a SLEEP function to keep our nodes busy (in a real-world scenario this is where you'd call out to remote servers or do whatever else you need).

Now suppose you have a cluster with 8 nodes and run the following query:

with base(idx,url) as (
VALUES
  (1, 'http://www.FooAndBar.de/something')
, (2, 'http://www.FooAndBar.de/something')
, (3, 'http://www.FooAndBar.de/something')
, (4, 'http://www.FooAndBar.de/something')
, (5, 'http://www.FooAndBar.de/something')
, (6, 'http://www.FooAndBar.de/something')
, (7, 'http://www.FooAndBar.de/something')
, (8, 'http://www.FooAndBar.de/something')
)    
select sleep(30) from base
group by iproc()
;
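As a side note, iproc() returns the number of the node a row is processed on, so a quick way to check how the rows of the CTE land on the nodes is a count per node. This diagnostic is my own sketch, not part of the original post:

```sql
-- how many rows of "base" does each node process?
with base(idx,url) as (
    VALUES (1, 'http://www.FooAndBar.de/something')
         , (2, 'http://www.FooAndBar.de/something')
         -- ... rows 3 through 8 as above ...
)
select iproc()  as node_id
     , count(*) as rows_on_node
from base
group by iproc()
order by 1;
```

With a perfectly even distribution on an 8-node cluster you would hope to see one row per node here.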

The expected runtime for this query would be 30 seconds, since we have 8 nodes executing our UDF in parallel. Let's see how that turned out:

[screenshot: mwellbro_0-1620896293150.png]

It would seem that our work was not evenly distributed. The explanation lies in how the VALUES clause generates its rows, which can end up a bit skewed (thanks to @exa-StefanR).

To solve this we can enhance our CTE with a little "select ... order by 1". This forces a materialization (I think we'll survive the impact of 8 rows here) and produces a more "round robin style" distribution. And the result?

[screenshot: mwellbro_2-1620896804868.png]

If you look closely you'll also spot the difference in output rows. Since I grouped by iproc(), this tells us that only 6 of the 8 nodes worked on input rows last time (and 2 of them had to make two calls each, which explains the previous ~60 sec. runtime).
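For reference, the enhanced query described above might look like this. The exact placement of the "order by 1" wrapper is my assumption based on the description; the idea is just to force materialization of the CTE before the UDF call:

```sql
with base(idx,url) as (
    VALUES (1, 'http://www.FooAndBar.de/something')
         , (2, 'http://www.FooAndBar.de/something')
         -- ... rows 3 through 8 as above ...
),
ordered as (
    -- the extra "select ... order by 1" forces materialization
    -- and yields a more round-robin row distribution
    select * from base order by 1
)
select sleep(30) from ordered
group by iproc();
```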

Hope this little bit of info will help someone around here, and of course: "Audit, or it didn't happen":

[screenshot: mwellbro_3-1620896971707.png]
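For anyone who wants to reproduce the audit check: a hedged sketch of how one might pull the runtimes from Exasol's audit view (this assumes auditing is enabled on the database; the LIKE filter is just my guess at isolating the test queries):

```sql
-- compare wall-clock durations of the test runs
select session_id
     , start_time
     , stop_time
     , duration
from EXA_DBA_AUDIT_SQL
where sql_text like '%sleep(30)%'
order by start_time desc;
```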

Cheers,
Malte

1 REPLY

exa-MathiasHo
Community Manager

This knowledge is super useful. Thank you for sharing this!