
PyExasol export_to_pandas results in data inconsistencies

kochjoe
SQL-Fighter

Hi there,

We are using export_to_pandas from the PyExasol package to analyse data in Google Colab.

sql_0 = "SELECT * FROM TABLE WHERE ..."

df_0 = C.export_to_pandas(sql_0)

df_0.head()

The text fields involved are up to 2'000'000 characters long. In the resulting dataframe we notice data inconsistencies: the data do not seem to be consistent row-wise. In a single row we see data from different columns. When we try to find such a record in the database by filtering for that combination of values, we do not find a single record.

Any help is appreciated.

2 REPLIES

exa-Aleksandr
Team Exasol

Hi @kochjoe ,

A minimal (non-)working example will greatly help with investigation.

For example, the steps to run a query with 2 columns that returns rows containing data from different rows (did we understand correctly that rows were meant, and not different columns?), together with all the follow-up checking queries.

It would also be good to check the contents of EXA_DBA_AUDIT_SQL for that particular pyexasol session. You can get the session ID right after establishing a connection with a query like 'select current_session'.
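A minimal sketch of how the audit lookup could be put together in Python. The helper name build_audit_query is hypothetical, and the sketch assumes that auditing is enabled on the database and that your user is allowed to read EXA_DBA_AUDIT_SQL; the SESSION_ID and STMT_ID column names are taken from that system table.

```python
def build_audit_query(session_id):
    """Build a query that lists the audited statements of one session.

    build_audit_query is a hypothetical helper, not part of pyexasol.
    """
    sid = str(session_id).strip()
    # Session IDs are plain integers; reject anything else before
    # interpolating the value into SQL text.
    if not sid.isdigit():
        raise ValueError(f"not a numeric session id: {session_id!r}")
    return (
        "SELECT * FROM EXA_DBA_AUDIT_SQL "
        f"WHERE SESSION_ID = {sid} "
        "ORDER BY STMT_ID"
    )


# Example with a made-up session id:
print(build_audit_query("1234567890"))
```

With an open connection C, the session ID should be available from the result of 'select current_session' (pyexasol also appears to expose it as C.session_id()), and the resulting query can be run like any other statement.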

If you find it hard to prepare a sufficiently anonymized example to post here, you could create a ticket with Exasol Support.

littlekoi
Xpert

Internally, "export_to_pandas()" produces the same data stream as a normal EXPORT.

If you have a data set that causes problems, you can run "export_to_file()", get the raw CSV file, and look through it to find possible issues.
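Looking through a file with fields of up to 2'000'000 characters by eye is hard, so here is a rough sketch of one way to scan it. It uses only the Python standard library and reports records whose number of fields differs from the first record, which is one possible symptom of shifted columns. The function name field_count_report and the file name raw.csv are made up for illustration, and the file is assumed to be UTF-8.

```python
import csv
from collections import Counter

# Assumption: the raw file was written beforehand with something like
#   C.export_to_file("raw.csv", sql_0)


def field_count_report(path, encoding="utf-8"):
    """Count fields per CSV record and list records that deviate.

    Returns (counts, odd): counts maps a field count to the number of
    records having it; odd lists (record_number, field_count) for every
    record whose field count differs from the first record.
    """
    # The default per-field limit of the csv module is far below
    # 2'000'000 characters, so raise it before reading.
    csv.field_size_limit(10_000_000)
    counts = Counter()
    odd = []
    expected = None
    with open(path, newline="", encoding=encoding) as fh:
        for record_no, record in enumerate(csv.reader(fh), start=1):
            n = len(record)
            if expected is None:
                expected = n  # first record defines the expected width
            counts[n] += 1
            if n != expected:
                odd.append((record_no, n))
    return counts, odd
```

If odd comes back empty, the raw stream is at least structurally consistent for this parser, and the next thing to compare would be how pandas reads the same file. Record numbers count parsed records, not physical lines, because quoted fields may contain line breaks.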

Also, the CSV is parsed using the standard C parser from pandas. Pyexasol does not really introduce any processing of its own: the stream goes straight from the Exasol server to the C parser. So if there is a problem, it is unlikely to be related specifically to pyexasol. But anything is possible, of course, so let's check this out.