exa-Peggy
Moderator
Moderator

 

Problem

Sometimes, the disk space used by the database may continue to increase, but there was not any data added to the database. This can cause the database to use 100% of the hard drive space and not be able to expand any further.  After a restart, the disk usage is back to normal.

When running out of space, you may notice the following error message in Exaoperation:

2021-04-30 10:35:38.005042 Error cluster1 DB: demo, persistent - usage: 98.57%, free: 72.75 GiB, max: 5090.38 GiB

 

The difference between the size of the database in Exaoperation and the sum of all objects may be considerable. In this example, the database size in Exaoperation, but the sum of all objects is actually only around 1,3 TiB when we ran the below query. 

SELECT SUM( MEM_OBJECT_SIZE )/(1024*1024*1024) AS GB
FROM EXA_STATISTICS_OBJECT_SIZES;

The "missing" size is attributed to "phantom data" and is 0,7TiB in our example.

Diagnosis

To check if the problem is caused by phantom data, you have to execute the following query:

select measure_time,COMMIT_SIZE, MEM_SIZE, MULTICOPY_SIZE, COMMIT_SIZE-MEM_SIZE-MULTICOPY_SIZE PHANTOM_SIZE, 
cast((local.PHANTOM_SIZE)*100/COMMIT_SIZE as dec(7,3)) "PHANTOM_%"
from (
select cast(COMMITTED_SIZE/1024/1024 as dec(12,1)) COMMIT_SIZE, cast(MULTICOPY_DATA/1024/1024 as dec(12,1)) MULTICOPY_SIZE,
cast((MEM_OBJECT_SIZE+INDICES_MEM_SIZE+STATISTICS_SIZE)/1024/1024 as dec(12,1)) MEM_SIZE,
T.*
from "$EXA_STATS_DB_SIZE" T)
--where measure_time <insert filter on measure_time here
order by MEASURE_TIME desc;

Huge phantom percentage (>5%) for a longer periods of time is not normal.

Example:

MEASURE_TIME COMMIT_SIZE MEM_SIZE MULTICOPY_SIZE PHANTOM_SIZE PHANTOM_%
2020-10-21 09:48:52 1974.7 979.4 0.0 995.3 50.403
2020-10-21 09:30:03 1974.8 979.5 0.0 995.3 50.400
2020-10-21 09:29:00 1974.8 979.5 0.0 995.3 50.400
2020-10-21 09:27:33 998.1 979.4 0.0 18.7 1.874

 

Explanation

Exasol ensures multi-user capability through the implementation of a transaction management system (TMS). This means that requests from different users can be processed in parallel. Each transaction returns a correct result and leaves the database in a consistent state. To ensure this, the transaction must comply with ACID principles:

  • Atomicity: The transaction is either fully executed or not at all.
  • Consistency: The transaction is given the internal consistency of the database.
  • Isolation: The transaction is executed as if it is the only transaction in the system.
  • Durability: All changes to a completed transaction confirmed with COMMIT remain intact.

Phantom data can be caused by transactions left open (sessions whose last statement was not a COMMIT or ROLLBACK) for a long period of time. In the following, we'll denote the "phantom data" data that is on disk yet it is not counted in any system table.

How Phantom Data appears:

The reason why this is happening is very similar to what happens in the simple example below:

Step id Transaction 1 (TR1) Transaction 2 (TR2) Comments
1 select from T   Read-locks table T
2   Create or replace table T Create or replace first drops T and then creates new table T; Now we have two distinct tables T, on visible in TR1 and one in TR2
3   commit Version V2 of T is commited. Thus, we see this as increasing commit_data
we have V1 of T1 and V2 of T1 commited;
Result:
Now we would see this as "an increase of "PHANTOM_%"  in our PHANTOM_%-query
4 commit   TR1 commited. This TR1 does not need the old copy V1 of T.
This we can observe this in increasing the COMMIT_SIZE.
COMMIT_SIZE contains now only the size of V2 of T
Result:
Now we would see this as "an decrease of "PHANTOM_%"  in our PHANTOM_%-query

 

After step 3 (commit TR2) has been executed we have two tables T: one used by TR1 and visible to TR1 and another one which was created and committed by TR2 visible to all transactions that start after TR2 has committed; this can be easily checked by looking up the TABLE_OBJECT_ID column in EXA_ALL_TABLES in TR1 and TR2 - the OBJECT_IDs in both tables are different because TR2 dropped T and created a freshly new table named 'T'. At this point the data blocks of T used by TR1 are "phantom data" that exists on disk but is not visible in EXA_DBA_OBJECT_SIZES - doing select sum(mem_object_size) from EXA_DBA_OBJECT_SIZES considers only the blocks as seen by TR2, because that's any new transaction sees the table created by TR2 and not the original one as used in TR1.

While the example is simple, it does show the effects that transactions forgotten open may incur (if either TR1 or TR2 never commits then the "phantom data" will never be reclaimed because it is rightly seen as in use).

Where that query shows data as "phantom" is when all the data blocks are recreated in parallel transactions, because in such case we don't deal with data blocks in two versions, but instead we have one transaction which drops and creates data blocks. Sample queries to create such "phantoms" (in the above scenario replace the update statement in TR2):

  • create or replace table - we have different objects in different sessions hence different blocks
  • alter table ... partition by,
  • alter table ... distribution by,
  • recompress table ...,
  • reorganize table ... - these drop old blocks and create completely new ones due to efficiency reasons

If the last statement was not a COMMIT or ROLLBACK, any locks on objects are active. Leaving transactions opened (especially with the above statements) will likely lead to such scenarios.

Resolution Steps

Restart the database

The simplest solution is to restart the database. In the long term, this may not solve the problem if users keep forgetting to end transactions with a commit or rollback.

Check for long-running transactions with read operations

Thus, we recommend running the above on a regular basis.

If it reaches say ~ 5-10% then, you may, look for READ-transactions that are open for a long time since this is in most cases the cause of the problem (please refer to how-to-determine-idle-sessions-with-open-transactions).
A small phantom percentage (<5%) is normal, a large one is also normal but for short periods of time (if create or replace, reorganize, partition by, distribute by, recompress statements are run on large tables).

Example:

HAS_LOCKS EVALUATION SESSION_ID USER_NAME EFFECTIVE_USER STATUS COMMAND_NAME
(null) (null) 4 SYS SYS IDLE NOT SPECIFIED
NONE (null) 1681170030816657408 SYS SYS IDLE NOT SPECIFIED
READ LOCKS VERY CRITICAL 1681170037604745216 SYS SYS IDLE NOT SPECIFIED
NONE (null) 1681170049143799808 SYS SYS IDLE NOT SPECIFIED
NONE (null) 1681170070620602368 SYS SYS IDLE NOT SPECIFIED
READ LOCKS (null) 1681170092074139648 SYS SYS EXECUTE SQL SELECT
NONE (null) 1681171633878925312 SYS SYS IDLE NOT SPECIFIED
NONE (null) 1681172167982776320 SYS SYS IDLE NOT SPECIFIED

 

Look maybe at EXA_DBA_AUDIT_SQL

SELECT * FROM EXA_DBA_AUDIT_SQL
WHERE SESSION_ID = 1681170037604745216|;

Result:

SESSION_ID START_TIME STOP_TIME SUCCESS SQL_TEXT
1681170037604745216 2020-10-21 13:58:08 2020-10-21 13:58:08 true select * from test.t limit 1000

=> No commit or rollback;

SELECT * from EXA_DBA_OBJECTS
WHERE OBJECT_NAME='T'
AND  ROOT_NAME='TEST';

Result:

OBJECT_TYPE OBJECT_NAME CREATED LAST_COMMIT OWNER OBJECT_ID ROOT_NAME
TABLE T 2020-10-21 14:31:58 2020-10-21 14:33:29 SYS 428670976 TEST
 
This example depicts that an old version of Table Test.T at 2020-10-21 13:58:08 was read. This session does not have a commit or rollback and has to hold the old version of T.even though Test.T was newly created at 2020-10-21 14:31:58.

The reason for this is the ACID principle: The transaction in session 1681170037604745216 is executed as if it is the only transaction in the system. 

Comments
sabitor
Padawan

Good and interesting explanation enriched by examples - very helpful.