reported # of rows affected limited to 1,952,796,633

wunderdata
Contributor

I have a table with 6247763929 rows in it. 

select count(*) from ROUTE_DETAILS_RAW;

= 6247763929 

Whenever I run an update statement without any WHERE clause on that table, I get: 1,952,796,633 rows affected

(for example, after adding a new column and filling it):

alter table ROUTE_DETAILS_RAW add column charge_corridor_350 BOOLEAN;
update ROUTE_DETAILS_RAW set charge_corridor_350 = mod(route_distance_from_start,450) > 350;

  update ROUTE_DETAILS_RAW set charge_corridor_350 = mod(route_distance_from_start,450) > 350
[2021-11-22 15:25:09] 1,952,796,633 rows affected in 5 m 19 s 639 ms
 

Select count(*) from ROUTE_DETAILS_RAW where charge_corridor_350 is not null;

= 6247763929

Looks like the reported number for rows affected is wrong.


mwellbro
Xpert

Hi @wunderdata ,

just a hunch, but maybe this has something to do with different transactions, i.e. the transaction where you count is different from the one where you do the updates?
I've seen such "aberrations" when someone works with different tabs in DBeaver, for instance, and each tab holds its own (non-autocommit) connection.
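A rough sketch of what I mean in JDBC terms (only an illustration of the idea, not a claim about what happened in your case; connection URL and credentials are placeholders): two connections with autocommit off can easily report different pictures of the same table.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TwoTabsSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:exa:my-exasol-host:8563";   // placeholder host/port
        try (Connection tabA = DriverManager.getConnection(url, "sys", "****");
             Connection tabB = DriverManager.getConnection(url, "sys", "****")) {
            // each "tab" holds its own non-autocommit connection
            tabA.setAutoCommit(false);
            tabB.setAutoCommit(false);

            try (Statement sa = tabA.createStatement()) {
                // update in tab A, but do not commit yet
                sa.executeUpdate("update ROUTE_DETAILS_RAW set charge_corridor_350 = "
                        + "mod(route_distance_from_start,450) > 350");
            }

            // depending on isolation and locking, this count in tab B runs against
            // a state that does not yet include tab A's uncommitted update
            try (Statement sb = tabB.createStatement();
                 ResultSet rs = sb.executeQuery(
                         "select count(*) from ROUTE_DETAILS_RAW where charge_corridor_350 is not null")) {
                rs.next();
                System.out.println("count seen from tab B: " + rs.getLong(1));
            }

            tabA.commit();
        }
    }
}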

Which client are you using?

Update: the most important question I forgot to ask: do you have Auditing enabled, and if so, what's exa_dba_audit_sql's take on these statements?

Cheers,
Malte

wunderdata
Contributor

I'm using Datagrip, but I've seen the same using the exaplus command-line client:

With Auditing enabled, the audit table shows the correct number of rows:

EXA_DB.BGAL> update ROUTE_DETAILS_RAW set charge_corridor_450 = mod(route_distance_from_start,550) > 450
[2021-11-22 17:08:17] 1,952,796,633 rows affected in 3 m 5 s 110 ms

Select stmt_id,row_count,SQL_text from exa_dba_audit_sql where stmt_id = 3;

+-------+----------+-------------------------------------------------------------------------------------------+
|STMT_ID|ROW_COUNT |SQL_TEXT |
+-------+----------+-------------------------------------------------------------------------------------------+
|3 |6247763929|update ROUTE_DETAILS_RAW set charge_corridor_450 = mod(route_distance_from_start,550) > 450|
+-------+----------+-------------------------------------------------------------------------------------------+

mwellbro
Xpert

Hmm, it's an odd result value... since it's Datagrip it should be JDBC-based, so the ODBC 32-bit limit (i.e.:
"
If the rowcount of a query exceeds 2147483647 (2^31 - 1), the ODBC driver will return the value 2147483647 for the function SQLRowCount(). This is because of the 32-bit limitation for this return value defined by the ODBC standard.
", https://docs.exasol.com/connect_exasol/drivers/odbc/using_odbc.htm)

isn't at work here - the number doesn't fit that category anyway...

Since it's smaller than the actual table row count, I don't think that the deleted_row_percentage (erroneously) plays a part here either...

Could you post the result of:

-- not sure what I expect to find here, just looking for breadcrumbs...
select iproc(),count(*) from ROUTE_DETAILS_RAW group by 1 order by 1;


Probably makes sense to drop a few lines to EXA support regarding this topic, I'm almost out of ideas (currently... @exa-Chris: this isn't a cue 😉).
 

wunderdata
Contributor

+-----+----------+
|IPROC|COUNT(*)  |
+-----+----------+
|    0|6247763929|
+-----+----------+

Indeed, it's odd that it's not some int32 boundary but that seemingly random number.
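Although, looking at it once more, maybe it is a 32-bit boundary after all, just an unsigned one: the reported value is exactly the table's row count minus 2^32.

6,247,763,929 − 4,294,967,296 (= 2^32) = 1,952,796,633

Whether that means some component really truncates the count to 32 bits I can't say, but the numbers line up exactly.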

mwellbro
Xpert

Thanks for the reply...so we have:

- a single node system
- version 7.1
- audit gets it right
- exaplus & datagrip show it wrong
- jdbc usage
- no where clause in our update
- target column is a boolean and was added after table creation
- value for assignment is an expression
- after re-reading your post: any update without WHERE clause yields the wrong affected count...

Doesn't seem overly exotic given the things involved, which makes the behavior all the more erratic...
Will see if I get any good ideas over the weekend...

exa-Kristof
Team Exasol

Maybe I'm missing something, but I fail to reproduce this:

DB: 7.1.2

Drivers: 7.1.2

Test scenario:

  • create a new table
  • insert 9 billion rows
  • add additional column
  • update the new column with and without expression and no filter

--> DBvisualizer (also a JDBC client) and also ExaPlus CLI both return 9.000.000.000 affected rows.

 

-- create a simple, empty table
create table test_rows_affected (id int, wert int);
-- insert 9 billion rows
insert into test_rows_affected (id) values between 1 and 9000000000;
-- check row count
Select count(*) from test_rows_affected; --> 9000000000
-- add an additional boolean column
alter table test_rows_affected add column bool_col bool;
-- update new column with expression
update test_rows_affected set bool_col = mod(wert, 3) > 1;
-- 2021-11-26 SUCCESS UPDATE 314.871 (null) 9000000000

 

[screenshot of the client output]

Which database and driver versions are you using exactly? 

wunderdata
Contributor

So first, thanks everyone for looking into this!
The JDBC driver Datagrip uses is 7.0.7.
The amount of data in the table has since changed to 3.1 billion rows. I just tested with exaplus again and it now shows the correct count:

[screenshot of the exaplus output]

I also updated Datagrip to the newest version. Datagrip now doesn't report any affected rows at all anymore:
EXA_DB.BGAL> alter table ROUTE_DETAILS_RAW add column charge_corridor_test BOOLEAN
[2021-11-30 12:51:13] completed in 44 s 161 ms
EXA_DB.BGAL> update ROUTE_DETAILS_RAW set charge_corridor_test = mod(route_distance_from_start,350) > 250
[2021-11-30 12:51:28] completed in 14 s 996 ms

Select count(*) from ROUTE_DETAILS_RAW where charge_corridor_test = true;
[2021-11-30 12:52:42] 1 row retrieved starting from 1 in 430 ms (execution: 342 ms, fetching: 88 ms)

 

The result: 738719731

I created another test table with just 1 million rows. For this one, the affected rows are reported just fine. Yet on the big table, again no affected rows are reported:

EXA_DB.BGAL> create table ROUTE_DETAILS_RAW_test as Select * from ROUTE_DETAILS_RAW limit 1000000
[2021-11-30 13:04:14] 1,000,000 rows affected in 1 s 620 ms
EXA_DB.BGAL> update ROUTE_DETAILS_RAW_test set charge_corridor_test = mod(route_distance_from_start,350) > 250
[2021-11-30 13:05:01] 1,000,000 rows affected in 106 ms
EXA_DB.BGAL> update ROUTE_DETAILS_RAW_test set charge_corridor_test = mod(route_distance_from_start,350) > 250
[2021-11-30 13:05:03] 1,000,000 rows affected in 99 ms
EXA_DB.BGAL> update ROUTE_DETAILS_RAW set charge_corridor_test = mod(route_distance_from_start,350) > 250
[2021-11-30 13:06:40] completed in 8 s 245 ms

I could not reproduce any of this on my other instances, also running 7.1 as well as 6.2.x, so it seems to be limited to this single setup here.
Maybe it's not worth continuing with this if no one else can reproduce or report it?

 

 

mwellbro
Xpert

Interesting - I somehow thought it might be related to 

New Feature - Implementation of JDBC 4.2 getLargeMaxRows(): getLargeMaxRows() will return the correct value if maxRows were set in setLargeMaxRows() or setMaxRows()

src: https://www.exasol.com/portal/display/DOWNLOAD/Version+7.1.0

and ran a couple of JDBC traces, but all came up with the correct number against a 2 billion row table.

I might play around with the setups a bit longer, but good to know that it seems to be an isolated behavior.
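In case anyone wants to repeat that kind of check, here is a rough sketch of how the 64-bit update count can be read through JDBC 4.2 (not the exact trace I ran; URL and credentials are placeholders, and the driver has to implement the "large" variants for this to work):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LargeUpdateCountCheck {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:exa:my-exasol-host:8563";   // placeholder host/port
        try (Connection con = DriverManager.getConnection(url, "sys", "****");
             Statement stmt = con.createStatement()) {
            // JDBC 4.2: executeLargeUpdate() returns long, so row counts above
            // Integer.MAX_VALUE (2^31 - 1) are not squeezed into an int
            long largeCount = stmt.executeLargeUpdate(
                    "update ROUTE_DETAILS_RAW set charge_corridor_350 = "
                            + "mod(route_distance_from_start,450) > 350");
            System.out.println("executeLargeUpdate: " + largeCount + " rows affected");

            // the classic executeUpdate()/getUpdateCount() path only returns int,
            // which is where a count beyond 2^31 - 1 can get mangled
        }
    }
}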