Team Exasol
Team Exasol

Background

Exasol R SDK offers functionality to interact with the Exasol database out of R programs. It extends the ORDBC and offers fast data transfer between Exasol and R (exa.readData() and exa.writeData()) and it makes it convenient to run parts of your R code in parallel in Exasol using R UDF functions behind the scenes (exa.createScript())

Example (Digit recognition)

We will use a well-known machine learning algorithm for classifying handwritten digits. It is using a randomforest library, which is used to create decision trees for any kind of image. For a given training set and the actual result for each element, the library returns a decision tree that can be applied to other similar images, the test data. The data is stored pixel-wise (one value for each pixel).
This How-To will focus on how to use the EXASolution R SDK and not focus on details of the algorithm or R. Further information on the algorithm and the data can be found here.
The example was chosen to show how easy it is to read or write data from or to EXASolution and how easy it is to deploy a given algorithm to an Exasol UDF.

Prerequisites

Exasol R SDK and randomForest needs to be installed on the used system.
The attached files can be used to setup your Exasol database for the example:

  1. Run 01_DDL.sql
     
      (Creates a schema, a table for train data set and a table for test data set)
  2. Place csv files (train.csv
     
     and test.csv
     
    ) at an appropriate place and change the path to those files in 02_IMPORT.sql
     
  3. Run 02_IMPORT.sql
     
     (Imports the train and test data set)

How to use ExaSolution R SDK

R implementation (reading the data sets from an EXASolution database)

The attached file 03_RandomForest.r shows how to use exa.readData() and exa.writeData() to interact with the database in R. As of Exasol 6.2.x, we needed to modify 1 line in 03_RandomForest.r in order to account for R version differences. Specifically, we are adding a parameter to the save command - version = 2. You are welcome to skip this modification and come back to revisit if you run into issues.

Before:

 save(rf, file = "< path to rf.dat>")

After:

save(rf, version = 2, file = "< path to rf.dat>")

 Additional installation notes for 03_RandomForest.r - installing needed packages. Continuing on, the general procedure is:

Step 1. Connect to EXASolution and read the train and test data

This requires an ODBC DSN (named "solo" in this example)

Step 2. Read train and test data from the database and prepare them for further processing

  1. Test data was enriched by an image id
  2. Train data was enriched by a label showing the actual handwritten digit

Step 3. Use randomForest() to create a decision tree and print prediction

  1. In this example, the decision forest is kept and tested on the test data
  2. Extract the predictions for test data and print them

Step 4. Write decision tree to Exasol

  1. We will use a bucket called binary to store the decision forest
  2. Detailed information on EXABucket FS and buckets can be found in https://community.exasol.com/t5/environment-management/how-to-create-an-exabucketfs-service-and-buck...**
  3. This item does not apply to Exasol 6+ and is only left in for archival purposes. Important Deprecation Notice:  In Version 5 it is not possible to store binary data in EXASOL database. One possibility is to serialize the decision forest and store it in a table. SInce VARCHAR type in EXASOL is limited to 2 million characters, the resulting string should not exceed 2 million characters. 
    curl -v -X PUT -T rf.dat http://w:<write_pw>@<ip of one cluster node>:<port>/binary/rf.dat

Step 5. Deploying a UDF

The other file (05_PREDICTINDB.sql) deploys the functionality to a UDF in EXASOL. We create a variadic UDF receiving an arbitrary amount of input parameters (all pixels)
The output is the predicted digit.

  1. Load forest from the bucket
  2. Create a dataframe for all input columns (pixels)
  3. Use predict() and emit the predictions

Step 6. Using UDF within EXASolution

Now it is possible to use the deployed function within EXASolution like any other built-in function:

select rf.predict(Model_str, PIXEL0,PIXEL1,PIXEL2,PIXEL3,PIXEL4,PIXEL5,PIXEL6,PIXEL7,PIXEL8,PIXEL9,PIXEL10,PIXEL11,PIXEL12,PIXEL13,PIXEL14,PIXEL15,PIXEL16,PIXEL17,PIXEL18,PIXEL19,PIXEL20,PIXEL21,PIXEL22,PIXEL23,PIXEL24,PIXEL25,PIXEL26,PIXEL27,PIXEL28,PIXEL29,PIXEL30,PIXEL31,PIXEL32,PIXEL33,PIXEL34,PIXEL35,PIXEL36,PIXEL37,PIXEL38,PIXEL39,PIXEL40,PIXEL41,PIXEL42,PIXEL43,PIXEL44,PIXEL45,PIXEL46,PIXEL47,PIXEL48,PIXEL49,PIXEL50,PIXEL51,PIXEL52,PIXEL53,PIXEL54,PIXEL55,PIXEL56,PIXEL57,PIXEL58,PIXEL59,PIXEL60,PIXEL61,PIXEL62,PIXEL63,PIXEL64,PIXEL65,PIXEL66,PIXEL67,PIXEL68,PIXEL69,PIXEL70,PIXEL71,PIXEL72,PIXEL73,PIXEL74,PIXEL75,PIXEL76,PIXEL77,PIXEL78,PIXEL79,PIXEL80,PIXEL81,PIXEL82,PIXEL83,PIXEL84,PIXEL85,PIXEL86,PIXEL87,PIXEL88,PIXEL89,PIXEL90,PIXEL91,PIXEL92,PIXEL93,PIXEL94,PIXEL95,PIXEL96,PIXEL97,PIXEL98,PIXEL99,PIXEL100,PIXEL101,PIXEL102,PIXEL103,PIXEL104,PIXEL105,PIXEL106,PIXEL107,PIXEL108,PIXEL109,PIXEL110,PIXEL111,PIXEL112,PIXEL113,PIXEL114,PIXEL115,PIXEL116,PIXEL117,PIXEL118,PIXEL119,PIXEL120,PIXEL121,PIXEL122,PIXEL123,PIXEL124,PIXEL125,PIXEL126,PIXEL127,PIXEL128,PIXEL129,PIXEL130,PIXEL131,PIXEL132,PIXEL133,PIXEL134,PIXEL135,PIXEL136,PIXEL137,PIXEL138,PIXEL139,PIXEL140,PIXEL141,PIXEL142,PIXEL143,PIXEL144,PIXEL145,PIXEL146,PIXEL147,PIXEL148,PIXEL149,PIXEL150,PIXEL151,PIXEL152,PIXEL153,PIXEL154,PIXEL155,PIXEL156,PIXEL157,PIXEL158,PIXEL159,PIXEL160,PIXEL161,PIXEL162,PIXEL163,PIXEL164,PIXEL165,PIXEL166,PIXEL167,PIXEL168,PIXEL169,PIXEL170,PIXEL171,PIXEL172,PIXEL173,PIXEL174,PIXEL175,PIXEL176,PIXEL177,PIXEL178,PIXEL179,PIXEL180,PIXEL181,PIXEL182,PIXEL183,PIXEL184,PIXEL185,PIXEL186,PIXEL187,PIXEL188,PIXEL189,PIXEL190,PIXEL191,PIXEL192,PIXEL193,PIXEL194,PIXEL195,PIXEL196,PIXEL197,PIXEL198,PIXEL199,PIXEL200,PIXEL201,PIXEL202,PIXEL203,PIXEL204,PIXEL205,PIXEL206,PIXEL207,PIXEL208,PIXEL209,PIXEL210,PIXEL211,PIXEL212,PIXEL213,PIXEL214,PIXEL215,PIXEL216,PIXEL217,PIXEL218,PIXEL219,PIXEL220,PIXEL221,PIXEL222,PIXEL223,PIXEL224,PIXEL225,PIXEL226,PIXEL227,PIXEL228,PIXEL229,PIXEL230,PIXEL231,PIXEL232,PIXEL233,PIXEL234,PIXEL235,PIXEL236,PIXEL237,PIXEL238,PIXEL239,PIXEL240,PIXEL241,PIXEL242,PIXEL243,PIXEL244,PIXEL245,PIXEL246,PIXEL247,PIXEL248,PIXEL249,PIXEL250,PIXEL251,PIXEL252,PIXEL253,PIXEL254,PIXEL255,PIXEL256,PIXEL257,PIXEL258,PIXEL259,PIXEL260,PIXEL261,PIXEL262,PIXEL263,PIXEL264,PIXEL265,PIXEL266,PIXEL267,PIXEL268,PIXEL269,PIXEL270,PIXEL271,PIXEL272,PIXEL273,PIXEL274,PIXEL275,PIXEL276,PIXEL277,PIXEL278,PIXEL279,PIXEL280,PIXEL281,PIXEL282,PIXEL283,PIXEL284,PIXEL285,PIXEL286,PIXEL287,PIXEL288,PIXEL289,PIXEL290,PIXEL291,PIXEL292,PIXEL293,PIXEL294,PIXEL295,PIXEL296,PIXEL297,PIXEL298,PIXEL299,PIXEL300,PIXEL301,PIXEL302,PIXEL303,PIXEL304,PIXEL305,PIXEL306,PIXEL307,PIXEL308,PIXEL309,PIXEL310,PIXEL311,PIXEL312,PIXEL313,PIXEL314,PIXEL315,PIXEL316,PIXEL317,PIXEL318,PIXEL319,PIXEL320,PIXEL321,PIXEL322,PIXEL323,PIXEL324,PIXEL325,PIXEL326,PIXEL327,PIXEL328,PIXEL329,PIXEL330,PIXEL331,PIXEL332,PIXEL333,PIXEL334,PIXEL335,PIXEL336,PIXEL337,PIXEL338,PIXEL339,PIXEL340,PIXEL341,PIXEL342,PIXEL343,PIXEL344,PIXEL345,PIXEL346,PIXEL347,PIXEL348,PIXEL349,PIXEL350,PIXEL351,PIXEL352,PIXEL353,PIXEL354,PIXEL355,PIXEL356,PIXEL357,PIXEL358,PIXEL359,PIXEL360,PIXEL361,PIXEL362,PIXEL363,PIXEL364,PIXEL365,PIXEL366,PIXEL367,PIXEL368,PIXEL369,PIXEL370,PIXEL371,PIXEL372,PIXEL373,PIXEL374,PIXEL375,PIXEL376,PIXEL377,PIXEL378,PIXEL379,PIXEL380,PIXEL381,PIXEL382,PIXEL383,PIXEL384,PIXEL385,PIXEL386,PIXEL387,PIXEL388,PIXEL389,PIXEL390,PIXEL391,PIXEL392,PIXEL393,PIXEL394,PIXEL395,PIXEL396,PIXEL397,PIXEL398,PIXEL399,PIXEL400,PIXEL401,PIXEL402,PIXEL403,PIXEL404,PIXEL405,PIXEL406,PIXEL407,PIXEL408,PIXEL409,PIXEL410,PIXEL411,PIXEL412,PIXEL413,PIXEL414,PIXEL415,PIXEL416,PIXEL417,PIXEL418,PIXEL419,PIXEL420,PIXEL421,PIXEL422,PIXEL423,PIXEL424,PIXEL425,PIXEL426,PIXEL427,PIXEL428,PIXEL429,PIXEL430,PIXEL431,PIXEL432,PIXEL433,PIXEL434,PIXEL435,PIXEL436,PIXEL437,PIXEL438,PIXEL439,PIXEL440,PIXEL441,PIXEL442,PIXEL443,PIXEL444,PIXEL445,PIXEL446,PIXEL447,PIXEL448,PIXEL449,PIXEL450,PIXEL451,PIXEL452,PIXEL453,PIXEL454,PIXEL455,PIXEL456,PIXEL457,PIXEL458,PIXEL459,PIXEL460,PIXEL461,PIXEL462,PIXEL463,PIXEL464,PIXEL465,PIXEL466,PIXEL467,PIXEL468,PIXEL469,PIXEL470,PIXEL471,PIXEL472,PIXEL473,PIXEL474,PIXEL475,PIXEL476,PIXEL477,PIXEL478,PIXEL479,PIXEL480,PIXEL481,PIXEL482,PIXEL483,PIXEL484,PIXEL485,PIXEL486,PIXEL487,PIXEL488,PIXEL489,PIXEL490,PIXEL491,PIXEL492,PIXEL493,PIXEL494,PIXEL495,PIXEL496,PIXEL497,PIXEL498,PIXEL499,PIXEL500,PIXEL501,PIXEL502,PIXEL503,PIXEL504,PIXEL505,PIXEL506,PIXEL507,PIXEL508,PIXEL509,PIXEL510,PIXEL511,PIXEL512,PIXEL513,PIXEL514,PIXEL515,PIXEL516,PIXEL517,PIXEL518,PIXEL519,PIXEL520,PIXEL521,PIXEL522,PIXEL523,PIXEL524,PIXEL525,PIXEL526,PIXEL527,PIXEL528,PIXEL529,PIXEL530,PIXEL531,PIXEL532,PIXEL533,PIXEL534,PIXEL535,PIXEL536,PIXEL537,PIXEL538,PIXEL539,PIXEL540,PIXEL541,PIXEL542,PIXEL543,PIXEL544,PIXEL545,PIXEL546,PIXEL547,PIXEL548,PIXEL549,PIXEL550,PIXEL551,PIXEL552,PIXEL553,PIXEL554,PIXEL555,PIXEL556,PIXEL557,PIXEL558,PIXEL559,PIXEL560,PIXEL561,PIXEL562,PIXEL563,PIXEL564,PIXEL565,PIXEL566,PIXEL567,PIXEL568,PIXEL569,PIXEL570,PIXEL571,PIXEL572,PIXEL573,PIXEL574,PIXEL575,PIXEL576,PIXEL577,PIXEL578,PIXEL579,PIXEL580,PIXEL581,PIXEL582,PIXEL583,PIXEL584,PIXEL585,PIXEL586,PIXEL587,PIXEL588,PIXEL589,PIXEL590,PIXEL591,PIXEL592,PIXEL593,PIXEL594,PIXEL595,PIXEL596,PIXEL597,PIXEL598,PIXEL599,PIXEL600,PIXEL601,PIXEL602,PIXEL603,PIXEL604,PIXEL605,PIXEL606,PIXEL607,PIXEL608,PIXEL609,PIXEL610,PIXEL611,PIXEL612,PIXEL613,PIXEL614,PIXEL615,PIXEL616,PIXEL617,PIXEL618,PIXEL619,PIXEL620,PIXEL621,PIXEL622,PIXEL623,PIXEL624,PIXEL625,PIXEL626,PIXEL627,PIXEL628,PIXEL629,PIXEL630,PIXEL631,PIXEL632,PIXEL633,PIXEL634,PIXEL635,PIXEL636,PIXEL637,PIXEL638,PIXEL639,PIXEL640,PIXEL641,PIXEL642,PIXEL643,PIXEL644,PIXEL645,PIXEL646,PIXEL647,PIXEL648,PIXEL649,PIXEL650,PIXEL651,PIXEL652,PIXEL653,PIXEL654,PIXEL655,PIXEL656,PIXEL657,PIXEL658,PIXEL659,PIXEL660,PIXEL661,PIXEL662,PIXEL663,PIXEL664,PIXEL665,PIXEL666,PIXEL667,PIXEL668,PIXEL669,PIXEL670,PIXEL671,PIXEL672,PIXEL673,PIXEL674,PIXEL675,PIXEL676,PIXEL677,PIXEL678,PIXEL679,PIXEL680,PIXEL681,PIXEL682,PIXEL683,PIXEL684,PIXEL685,PIXEL686,PIXEL687,PIXEL688,PIXEL689,PIXEL690,PIXEL691,PIXEL692,PIXEL693,PIXEL694,PIXEL695,PIXEL696,PIXEL697,PIXEL698,PIXEL699,PIXEL700,PIXEL701,PIXEL702,PIXEL703,PIXEL704,PIXEL705,PIXEL706,PIXEL707,PIXEL708,PIXEL709,PIXEL710,PIXEL711,PIXEL712,PIXEL713,PIXEL714,PIXEL715,PIXEL716,PIXEL717,PIXEL718,PIXEL719,PIXEL720,PIXEL721,PIXEL722,PIXEL723,PIXEL724,PIXEL725,PIXEL726,PIXEL727,PIXEL728,PIXEL729,PIXEL730,PIXEL731,PIXEL732,PIXEL733,PIXEL734,PIXEL735,PIXEL736,PIXEL737,PIXEL738,PIXEL739,PIXEL740,PIXEL741,PIXEL742,PIXEL743,PIXEL744,PIXEL745,PIXEL746,PIXEL747,PIXEL748,PIXEL749,PIXEL750,PIXEL751,PIXEL752,PIXEL753,PIXEL754,PIXEL755,PIXEL756,PIXEL757,PIXEL758,PIXEL759,PIXEL760,PIXEL761,PIXEL762,PIXEL763,PIXEL764,PIXEL765,PIXEL766,PIXEL767,PIXEL768,PIXEL769,PIXEL770,PIXEL771,PIXEL772,PIXEL773,PIXEL774,PIXEL775,PIXEL776,PIXEL777,PIXEL778,PIXEL779,PIXEL780,PIXEL781,PIXEL782,PIXEL783)
from rf.test cross join rfmodels.models;    
PRED
0
0
3
0
3
3
0
0
1
7
Comments
Team Exasol
Team Exasol

What a great article!