Overriding existing Python modules

andreeroos
SQL-Fighter

I am trying to use another version of Pandas, due to limitations in the version that is installed with Exasol.
I am using this code snippet, but it seems that the default version is still the one being used.

sys.path.append(glob.glob('/buckets/bfsdefault/python/pandas-1.2.4/*'))

 

I've tried sys.path.insert(0, ......) to make Python use this path first, but without success. Is there a way to force my UDF to use my downloaded Pandas instead?
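One detail worth checking in the snippet: glob.glob() returns a list, and list.append() adds that whole list as a single entry. Python's import system ignores sys.path entries that are not strings, so such an entry has no effect. A minimal sketch of the difference, using plain lists as stand-ins for sys.path; the directory name some_dir is hypothetical, since the real glob result is not shown here:

```python
# Hypothetical stand-ins: the default search path and a glob.glob() result.
default_path = ["/usr/lib/python3/dist-packages"]
found = ["/buckets/bfsdefault/python/pandas-1.2.4/some_dir"]

appended = list(default_path)
appended.append(found)    # ONE new entry, and it is a list, not a string

extended = list(default_path)
extended.extend(found)    # each directory added as a string, at the END

prepended = list(default_path)
prepended[0:0] = found    # each directory added as a string, at the FRONT

for name, path in [("append", appended), ("extend", extended), ("prepend", prepended)]:
    print(name, path)
```

Only the last variant puts the new directory ahead of the default one, which is what matters if both contain a package with the same name.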

I would appreciate any help.

4 REPLIES

exa-Aleksandr
Team Exasol

Just to add a few cents to the brilliant comments above:

  • The "python-3.6-data-science-EXASOL-6.2.0" language container flavor seems to have a non-pinned (i.e. the latest) pandas: link
  • There is a nice tutorial that will be released soon: link

mwellbro
Xpert

Not that helpful at the moment, just what I could test in between:

 

mwellbro_0-1620396289719.png

I noticed that the ".whl" files for pandas-1.2.4 seem to require a higher version of Python than is provided in the PYTHON or PYTHON3 scripts. Not that it matters: my pandas version still reports 0.22.0, even after a successful pip3 install (albeit without dependencies).
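When the reported version does not change after an install, it can help to check which file Python would actually load. A minimal sketch using only the standard library; the helper name locate is my own, and the output depends entirely on the environment it runs in:

```python
import importlib.util
import sys

def locate(module_name):
    """Return the file Python would load for module_name, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin

# The interpreter version matters because wheels can require a minimum Python.
print("Python", ".".join(str(part) for part in sys.version_info[:3]))
print("json   ->", locate("json"))
print("pandas ->", locate("pandas"))
```

If the pandas line still points into the default dist-packages directory, the newly installed copy is either not on sys.path or comes later in the search order.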

I will let you know if I find anything useful, but I think that going with a language container might be your best bet.

Cheers,
Malte

exa-Nico
Community Manager

I think Jens is spot on. That solution with BucketFS might work, but given the time you would need to invest in collecting the dependencies, uploading them and maintaining them, I think a script language container is probably safer for production purposes. It is then also available in all scripts, without additional complications in each particular script. With a script language container you could also define an entirely new language called PYTHON3_NEWPANDAS or something similar, and use the default and the new one interchangeably.

Is there a specific reason not to build a new container in your case?

Sports Enthusiast. Database Guy. Member of Team Exasol.
Having trouble? Just let me know!


jens_areto
SQL-Fighter

Hey Andre,

I have tried a few things and I think I found a way to do this without creating a new language container, but to me it does not seem to be the way Exasol wants users to do things like this.

 

CREATE OR REPLACE PYTHON3 SCALAR SCRIPT UDF_TEST.JENS_TEST () EMITS ("C" VARCHAR(2000000) UTF8) AS
import glob
import os
import sys
# Uncomment to drop the default package directory from the search path:
#sys.path.remove('/usr/lib/python3/dist-packages')
# Add the unpacked pandas from BucketFS (this goes to the END of sys.path)
sys.path.extend(glob.glob('/buckets/bfsdefault/jsc/pandas-1.2.4/*'))
#sys.path.append('/usr/lib/python3/dist-packages')
import pandas as pd
def run(c):
    # Emit every entry of the default package directory ...
    for d in os.listdir('/usr/lib/python3/dist-packages'):
        c.emit(d)
    # ... followed by the pandas version that was actually imported
    c.emit(str(pd.__version__))
/
;

SELECT UDF_TEST.JENS_TEST();

I tried things like this. When I just upload the pandas Python files to the bucket and add them to the system path, it still uses pandas 0.22.0. When I remove the path where all libraries are installed by default and then add the pandas inside the bucket to sys.path, you get the following message:

jens_areto_0-1620366875864.png

Now it takes the pandas from the bucket, but none of the required dependencies can be found, because they are all inside the folder I removed from sys.path. You can do this, but then you have to add all dependencies to your bucket manually. This is not really how it should be done, because you lose all other libraries that are not in the bucket.
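This matches how Python resolves imports in general: the first sys.path entry that contains the module wins, so a directory added with extend() lands at the end and the preinstalled copy is still found first. A minimal, self-contained sketch of that behaviour. It uses a throwaway module with the hypothetical name demo_pkg_order in temporary directories instead of pandas, so it says nothing about Exasol specifics:

```python
import importlib
import os
import sys
import tempfile

def make_fake_module(root, version):
    """Create root/demo_pkg_order.py exposing a __version__ string."""
    os.makedirs(root, exist_ok=True)
    with open(os.path.join(root, "demo_pkg_order.py"), "w") as fh:
        fh.write("__version__ = %r\n" % version)

def import_version(search_dirs):
    """Import demo_pkg_order with search_dirs placed at the front of sys.path."""
    saved = list(sys.path)
    sys.modules.pop("demo_pkg_order", None)  # forget any earlier import
    importlib.invalidate_caches()
    try:
        sys.path[:0] = search_dirs
        return importlib.import_module("demo_pkg_order").__version__
    finally:
        sys.path[:] = saved                  # restore the original search path
        sys.modules.pop("demo_pkg_order", None)

with tempfile.TemporaryDirectory() as tmp:
    old_dir = os.path.join(tmp, "default")   # stands in for dist-packages
    new_dir = os.path.join(tmp, "bucket")    # stands in for the BucketFS copy
    make_fake_module(old_dir, "0.22.0")
    make_fake_module(new_dir, "1.2.4")
    old_first = import_version([old_dir, new_dir])
    new_first = import_version([new_dir, old_dir])

print("default dir first:", old_first)
print("bucket dir first: ", new_first)
```

Putting the bucket directory first selects the newer copy without removing the default directory, so other libraries stay importable. Whether the newer pandas then works still depends on its own dependencies and on the Python version of the container.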

If there is a solution to this problem that does not require building a new language container, I think it will be a little tricky. But it should be found somewhere in playing with those paths.

If I find a way I will keep you updated.

Kind regards

Jens