scd type 2 in exasol

dnienaber
Contributor

Hello,

we want to create an SCD Type 2 history in Exasol.

In the past we used an MS database, and we were also able to use psql to solve this. You can find an example below:

Type 2 SCD with SQL Merge - WaterOx Consulting, Inc

This works pretty well. Now we want to use the same procedure on an Exasol database. Unfortunately we are not able to use psql anymore, so it doesn't work.
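
To make it concrete, roughly the kind of load we are trying to build - a close-out MERGE plus a follow-up INSERT (the table and column names below are only placeholders, not our real schema):

-- purely illustrative: DIM_CUSTOMER / STG_CUSTOMER and their columns are placeholders

-- step 1: close out the current version of every key whose attributes changed
MERGE INTO dim_customer tgt
USING (
        SELECT s.customer_id
          FROM stg_customer s
          JOIN dim_customer d
            ON d.customer_id = s.customer_id
           AND d.is_current  = TRUE
         WHERE d.customer_name <> s.customer_name
            OR d.customer_city <> s.customer_city
      ) chg
   ON tgt.customer_id = chg.customer_id
  AND tgt.is_current  = TRUE
WHEN MATCHED THEN UPDATE SET
       valid_to   = ADD_DAYS(CURRENT_DATE, -1),
       is_current = FALSE;

-- step 2: insert a new current version for new keys and for the keys just closed out
INSERT INTO dim_customer
       (customer_id, customer_name, customer_city, valid_from, valid_to, is_current)
SELECT s.customer_id, s.customer_name, s.customer_city,
       CURRENT_DATE, DATE '9999-12-31', TRUE
  FROM stg_customer s
 WHERE NOT EXISTS (SELECT 1
                     FROM dim_customer d
                    WHERE d.customer_id = s.customer_id
                      AND d.is_current  = TRUE);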

Has somebody already managed this task?

Thanks a lot, 

Daniel

16 REPLIES

mwellbro
Xpert

Hi Gabi,
to keep this post focused on the OP ("scd type 2 in exasol") I've "forked" the discussion around that question into a different thread: https://community.exasol.com/t5/discussion-forum/node-sync-on-a-single-node-instance/td-p/7014 .

Cheers,
Malte

mwellbro
Xpert

Good point from Peggy, and perhaps also something to check regarding performance: if the columns changed by the SCD load are part of a distribution key, this could potentially trigger a lot of IndexRebuild and RowMovement. Do you already have distribution keys set on your table(s), and if so, have you checked for this particular effect?
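
A quick way to check which columns are part of a table's distribution key is the system catalog (schema and table names below are just placeholders):

-- which columns of the SCD target table belong to its distribution key?
SELECT column_table, column_name
  FROM exa_all_columns
 WHERE column_schema = 'MY_SCHEMA'
   AND column_table  = 'MY_SCD_TABLE'
   AND column_is_distribution_key = TRUE;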

Cheers,
Malte

mwellbro
Xpert

To additionally check my assumption regarding the "CPU-boundness" of this approach, I ran the example while concurrently creating a bit of CPU pressure, to see how big an impact that would have:

[Screenshot: profiling output with the three highlighted MERGE executions]

The three highlighted MERGE statements represent (from bottom to top):

3) The execution running "stand-alone" (as can be seen from the 100% value in the resources column).

2) The execution running in parallel to another session which executed a sleep (more on that shortly).
   Notice that here we do see 50% resources but a nearly identical runtime. Is our MERGE really not hindered by other sessions? Well, not so fast: check out the CPU column. The "sleep query" (line 8) only takes 0.8% CPU, so while I wanted Exasol to reduce resources for the MERGE, I wasn't successful here.

1) The execution running in parallel to another session which executed a "busy_sleep".
   With a different implementation of the parallel query that eats up more CPU (line 4, 49.5% CPU), we can now observe how much "damage" we can do to the performance of our MERGE when CPU gets scarce.

So this might be another avenue of research you can dig into when checking why performance doesn't turn out to be what you'd expect, @dnienaber .

And for those of you who'd like to know what the difference between "sleep" and "busy_sleep" is, here's the code:

 

-- while elegant and resource efficient, for our current use case this didn't do the
-- trick since we wanted to "burn some CPU cycles"
CREATE OR REPLACE LUA SCALAR SCRIPT UTIL.SLEEP(seconds_to_sleep number) RETURNS integer
AS 

socket = require("socket")

function run(ctx)
  return socket.sleep(ctx.seconds_to_sleep)
end

/

-- a more wasteful implementation than before, but
-- since we wanted the CPU to "get busy" this was spot on
CREATE OR REPLACE LUA SCALAR SCRIPT UTIL.BUSY_SLEEP(seconds_to_sleep DECIMAL(10,0)) RETURNS DECIMAL
AS 
local clock = os.clock

function run(ctx)
  local t0 = clock()
  while clock() - t0 <= ctx.seconds_to_sleep + 0.0 do end
  return
end

/
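
A rough usage sketch of how the two scripts create the pressure (the 60-second duration is arbitrary): one session keeps a core busy while a second session runs the MERGE under test, and the runtimes are then compared against a stand-alone execution.

-- session A: burn CPU for roughly 60 seconds
SELECT util.busy_sleep(60) FROM dual;

-- session A, alternative: the socket-based sleep uses almost no CPU,
-- which is why it barely slowed down the MERGE in the test above
SELECT util.sleep(60) FROM dual;

-- session B, started in parallel: run the MERGE under test and compare its
-- runtime and CPU share with a stand-alone run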

 

Shoutout to @exa-StefanR who "way back when" pointed me to the first implementation - this was my first time using it - I just picked the wrong time 😉

Cheers,
Malte

mwellbro
Xpert

Hi @dnienaber , not to be pesky about details, but if you can, could you share what "weak performance" meant (i.e. a rows or data volume (GiB) to time ratio, or something along those lines)? And maybe in the meantime you have found a solution/workaround that would be useful to know about?

Or is this more along the lines of "abandon all hope (for details) ye who enter (this thread)" - I think that's Dante, or at least close enough 😉.

Cheers,
Malte

dnienaber
Contributor

Hi @mwellbro ,

we got another proposal from @exa-Peggy (see attached) which works pretty well and is also very fast.
I guess we just took a wrong turn the first time.
Currently we are struggling with some data sets that are not covered, and with doing soft deletes as well. We are analyzing the issue right now.
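
To make the soft-delete part concrete: what we are after is end-dating the current history row for keys that no longer appear in the source. A purely illustrative sketch with made-up names (explicitly not the code from the attached proposal):

-- close out current rows whose business key is missing from the latest source load
UPDATE dim_customer tgt
   SET valid_to   = ADD_DAYS(CURRENT_DATE, -1),
       is_current = FALSE
 WHERE tgt.is_current = TRUE
   AND NOT EXISTS (SELECT 1
                     FROM stg_customer src
                    WHERE src.customer_id = tgt.customer_id);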

Thank you

Daniel

exa-MathiasHo
Community Manager

Hi @dnienaber,

I think there is a misunderstanding here. In the ticket, Peggy merely asked if the solution provided by Malte was successful because you had not responded. She only wanted the profiling information.

mwellbro
Xpert

Hi Daniel,

happy to hear it, even though I share the confusion mentioned by exa-MathiasHo - I looked at the attached code and it seemed like the original example with additions for creating profile information.

If this is the code that's running fine now, then I'd assume that the weak performance observed previously was indeed a transient phenomenon (concurrency-, resource-, index-, or distribution-related, etc.).
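
For reference, and purely as a sketch of the general mechanism (not the content of the attachment), collecting profile information for a single statement in Exasol usually looks roughly like this:

ALTER SESSION SET profile = 'ON';

-- ... run the MERGE / load under test here ...

ALTER SESSION SET profile = 'OFF';
FLUSH STATISTICS;

SELECT *
  FROM exa_statistics.exa_user_profile_last_day
 WHERE session_id = CURRENT_SESSION
 ORDER BY stmt_id, part_id;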

That being said, I hope your system keeps running fine and that the topics you are currently investigating will also be resolved - keeping my eyes peeled for any updates or future threads from your end 🙂.

Cheers,
Malte