Team Exasol
Team Exasol

Background

When you add new nodes to your database cluster (enlargement), those new nodes will not yet have any data. Furthermore, all distribution keys and indices have been invalidated on the first startup of the enlarged database. Therefore table data needs redistribution, as well as all indices, must be rebuilt.

Prerequisites

The general recommendation to accomplish this is REORGANIZE DATABASE. It redistributes tables and rebuilds all indices. Each processed table is committed immediately so transaction dependencies are minimized. 

How to estimate the duration of REORGANIZE DATABASE when enlarging a cluster

Step 1

To estimate the duration of the REORGANIZE DATABASE you can use the following SQL statement:

define OLD_NODECOUNT=4;
define NEW_NODECOUNT=8;
select
    *
from
    (
        select
            cast(sum(OVERALL_SECONDS) as dec(9)) OVERALL_SECONDS,
            cast(sum(DELETE_REORG_SECONDS) as dec(9)) DELETE_SECONDS,
            cast(sum(DISTRIBUTE_SECONDS) as dec(9)) DISTRIBUTE_SECONDS,
            cast(sum(INDEX_REBUILD_SECONDS) as dec(9)) INDEX_REBUILD_SECONDS,
            cast(sum(TABLE_SECONDS) as dec(9)) TABLE_SECONDS
        from
            (
                select
                    SCHEMA_NAME,
                    TABLE_NAME,
                    DELETE_REORG_SECONDS + DISTRIBUTE_SECONDS + INDEX_REBUILD_SECONDS + TABLE_SECONDS as OVERALL_SECONDS,
                    DELETE_REORG_SECONDS,
                    DISTRIBUTE_SECONDS,
                    INDEX_REBUILD_SECONDS,
                    TABLE_SECONDS
                from
                    (
                        select
                            coalesce(S1_SN, S2_SN, S3_SN, S4_SN) SCHEMA_NAME,
                            coalesce(S1_TN, S2_TN, S3_TN, S4_TN) TABLE_NAME,
                            zeroifnull(DELETE_REORG_SECONDS) DELETE_REORG_SECONDS,
                            zeroifnull(DISTRIBUTE_SECONDS) DISTRIBUTE_SECONDS,
                            zeroifnull(INDEX_REBUILD_SECONDS) INDEX_REBUILD_SECONDS,
                            zeroifnull(TABLE_SECONDS) TABLE_SECONDS
                        from
                                (
                                    select
                                        ROOT_NAME as S1_SN,
                                        OBJECT_NAME as S1_TN,
                                        cast(
                                            zeroifnull(
                                                sum(RAW_OBJECT_SIZE) / 1024 / 1024 / 50 /&OLD_NODECOUNT.
                                            ) as dec(18, 1)
                                        ) DELETE_REORG_SECONDS
                                    from
                                        EXA_DBA_OBJECT_SIZES
                                    where
                                        (ROOT_NAME, OBJECT_NAME) in (
                                            select
                                                TABLE_SCHEMA,
                                                TABLE_NAME
                                            from
                                                EXA_COMBINED_TABLES
                                            group by
                                                1,
                                                2
                                            having
                                                sum(DELETED_ROWS) / nullifzero(sum(GLOBAL_NUMBER_OF_ROWS)) > 0.01
                                        )
                                    group by
                                        1,
                                        2
                                ) S1
                            full outer join
                                (
                                    select
                                        ROOT_NAME as S2_SN,
                                        OBJECT_NAME as S2_TN,
                                        cast(
                                            zeroifnull(
                                                sum(RAW_OBJECT_SIZE) / 1024 / 1024 / 35 /&NEW_NODECOUNT.
                                            ) as dec(18, 1)
                                        ) DISTRIBUTE_SECONDS
                                    from
                                        EXA_DBA_OBJECT_SIZES
                                    where
                                        OBJECT_TYPE = 'TABLE'
                                    group by
                                        1,
                                        2
                                ) S2
                            on
                                S1_SN = S2_SN and
                                S1_TN = S2_TN
                            full outer join
                                (
                                    select
                                        INDEX_SCHEMA as S3_SN,
                                        INDEX_TABLE as S3_TN,
                                        cast(
                                            zeroifnull(
                                                sum(RAW_OBJECT_SIZE) / 1024 / 1024 / 60 /&NEW_NODECOUNT.
                                            ) as dec(18, 1)
                                        ) INDEX_REBUILD_SECONDS
                                    from
                                        "$EXA_INDICES"
                                    group by
                                        1,
                                        2
                                ) S3
                            on
                                coalesce(S1_SN, S2_SN)= S3_SN and
                                coalesce(S1_TN, S2_TN)= S3_TN
                            full outer join
                                (
                                    select
                                        TABLE_SCHEMA as S4_SN,
                                        TABLE_NAME as S4_TN,
                                        1.0 TABLE_SECONDS
                                    from
                                        EXA_DBA_TABLES
                                ) S4
                            on
                                coalesce(S1_SN, S2_SN, S3_SN)= S4_SN and
                                coalesce(S1_TN, S2_TN, S3_TN)= S4_TN
                    )
            )
    );

Step 2

The first column is the overall duration of the REORGANIZE DATABASE in seconds. Then follow delete reorganization time, table redistribution time and index rebuild time, all in seconds. The last column represents some general overhead as each table in the database has to be checked and committed.

OVERALL_SECONDS DELETE_SECONDS DISTRIBUTE_SECONDS INDEX_REBUILD_SECONDS TABLE_SECONDS
9515 11 3034 1000 5470

Please note, that the actual duration may differ from the computed one, because of concurrency behavior or hardware specifics.

If you have high-performance hardware (several hundred MB/s of HDD_READ_MAX and NET_MAX in EXA_MONITOR_DAILY), you may reduce the reorganize time by running three independent reorganize streams in parallel. This improves resource utilization of your database (mostly HDD_READ and NET) and speeds up the overall process.

Step 3

You can use the following SQL to generate those three REORGANIZE streams:

define OLD_NODECOUNT=4;
define NEW_NODECOUNT=8;
select
    REORGANIZE_SQL  || '     -- stream "' || STREAM || '": table estimate ' || trunc(OVERALL_SECONDS*1.5) || ' sec, stream time estimate ' || trunc(STREAM_TIME*1.5) || ' sec'
from
    (
        select
            T0.*,
            cast(
                sum(OVERALL_SECONDS) over(
                    partition by
                        STREAM
                    order by
                        INDEX_REBUILD_SECONDS+OVERALL_SECONDS asc
                ) as dec(9)
            ) as STREAM_TIME
        from
            (
                select
                    T1.*,
                    case
                        when
                            PART_OVERALL / NULLIFZERO(SUM_OVERALL) < 0.37
                        then
                            'small tables'
                        when
                            PART_OVERALL / NULLIFZERO(SUM_OVERALL) > 0.66
                        then
                            'indices'
                        else
                            'big tables'
                    end as STREAM
                from
                    (
                        select
                            T2.*,
                            cast(
                                sum(OVERALL_SECONDS) over(
                                    order by
                                        INDEX_REBUILD_SECONDS+OVERALL_SECONDS asc
                                ) as dec(10, 1)
                            ) as PART_OVERALL,
                            cast(sum(OVERALL_SECONDS) over() as dec(10, 1)) as SUM_OVERALL
                        from
                            (
                                select
                                    schema_name,
                                    table_name,
                                    'REORGANIZE TABLE "' || schema_name || '"."' || table_name || '";' as REORGANIZE_SQL,
                                    cast(sum(OVERALL_SECONDS) as dec(10, 1))       OVERALL_SECONDS,
                                    cast(sum(DELETE_REORG_SECONDS) as dec(10, 1))  DELETE_SECONDS,
                                    cast(sum(DISTRIBUTE_SECONDS) as dec(10, 1))    DISTRIBUTE_SECONDS,
                                    cast(sum(INDEX_REBUILD_SECONDS) as dec(10, 1)) INDEX_REBUILD_SECONDS,
                                    cast(sum(TABLE_SECONDS) as dec(10, 1))         TABLE_SECONDS
                                from
                                    (
                                        select
                                            SCHEMA_NAME,
                                            TABLE_NAME,
                                            DELETE_REORG_SECONDS + DISTRIBUTE_SECONDS + INDEX_REBUILD_SECONDS + TABLE_SECONDS as OVERALL_SECONDS,
                                            DELETE_REORG_SECONDS,
                                            DISTRIBUTE_SECONDS,
                                            INDEX_REBUILD_SECONDS,
                                            TABLE_SECONDS
                                        from
                                            (
                                                select
                                                    coalesce(S1_SN, S2_SN, S3_SN, S4_SN) SCHEMA_NAME,
                                                    coalesce(S1_TN, S2_TN, S3_TN, S4_TN) TABLE_NAME,
                                                    zeroifnull(DELETE_REORG_SECONDS)     DELETE_REORG_SECONDS,
                                                    zeroifnull(DISTRIBUTE_SECONDS)       DISTRIBUTE_SECONDS,
                                                    zeroifnull(INDEX_REBUILD_SECONDS)    INDEX_REBUILD_SECONDS,
                                                    zeroifnull(TABLE_SECONDS)            TABLE_SECONDS
                                                from
                                                        (
                                                            select
                                                                ROOT_NAME as S1_SN,
                                                                OBJECT_NAME as S1_TN,
                                                                cast(
                                                                    zeroifnull(
                                                                        sum(RAW_OBJECT_SIZE) / 1024 / 1024 / 50 /&OLD_NODECOUNT.
                                                                    ) as dec(18, 1)
                                                                ) DELETE_REORG_SECONDS
                                                            from
                                                                EXA_DBA_OBJECT_SIZES
                                                            where
                                                                (ROOT_NAME, OBJECT_NAME) in (
                                                                    select
                                                                        TABLE_SCHEMA,
                                                                        TABLE_NAME
                                                                    from
                                                                        EXA_COMBINED_TABLES
                                                                    group by
                                                                        1,
                                                                        2
                                                                    having
                                                                        sum(DELETED_ROWS) / nullifzero(sum(GLOBAL_NUMBER_OF_ROWS)) > 0.01
                                                                )
                                                            group by
                                                                1,
                                                                2
                                                        ) S1
                                                    full outer join
                                                        (
                                                            select
                                                                ROOT_NAME as S2_SN,
                                                                OBJECT_NAME as S2_TN,
                                                                cast(
                                                                    zeroifnull(
                                                                        sum(RAW_OBJECT_SIZE) / 1024 / 1024 / 35 /&NEW_NODECOUNT.
                                                                    ) as dec(18, 1)
                                                                ) DISTRIBUTE_SECONDS
                                                            from
                                                                EXA_DBA_OBJECT_SIZES
                                                            where 
                                                                OBJECT_TYPE = 'TABLE'
                                                            group by
                                                                1,
                                                                2
                                                        ) S2
                                                    on
                                                        S1_SN = S2_SN and
                                                        S1_TN = S2_TN
                                                    full outer join
                                                        (
                                                            select
                                                                INDEX_SCHEMA as S3_SN,
                                                                INDEX_TABLE as S3_TN,
                                                                cast(
                                                                    zeroifnull(
                                                                        sum(RAW_OBJECT_SIZE) / 1024 / 1024 / 60 /&NEW_NODECOUNT.
                                                                    ) as dec(18, 1)
                                                                ) INDEX_REBUILD_SECONDS
                                                            from
                                                                "$EXA_INDICES"
                                                            group by
                                                                1,
                                                                2
                                                        ) S3
                                                    on
                                                        coalesce(S1_SN, S2_SN)= S3_SN and
                                                        coalesce(S1_TN, S2_TN)= S3_TN
                                                    full outer join
                                                        (
                                                            select
                                                                TABLE_SCHEMA as S4_SN,
                                                                TABLE_NAME as S4_TN,
                                                                1.0 as TABLE_SECONDS
                                                            from
                                                                EXA_DBA_TABLES
                                                        ) S4
                                                    on
                                                        coalesce(S1_SN, S2_SN, S3_SN)= S4_SN and
                                                        coalesce(S1_TN, S2_TN, S3_TN)= S4_TN
                                            )
                                    )
                                group by
                                    1,
                                    2,
                                    3
                            ) T2
                    ) T1
            ) T0
    )
order by
    STREAM,
    STREAM_TIME;

Please ensure to run the above statements right before enlarging the cluster as rebuild times of invalidated indices cannot be estimated thereafter.

Additional Notes

The generated SQL statements for single-table REORGANIZEs should be split into three streams "small tables", "big tables" and "indices" which will have all comparable overall stream durations. The REORGANIZE TABLE comments could help you to check overall progress (stream time estimate) and estimate the overall finishing time using EXA_DBA_SESSIONS or EXA_DBA_PROFILE_RUNNING.

...
REORGANIZE TABLE "TPCDS"."WEB_RETURNS";                                                        -- stream "big tables": table estimate 40 sec, stream time estimate 2679 sec                 
REORGANIZE TABLE "TPC"."CUSTOMER";                                                             -- stream "big tables": table estimate 90 sec, stream time estimate 2769 sec                 
...
REORGANIZE TABLE "TPCDS"."CATALOG_SALES";                                                      -- stream "indices": table estimate 105 sec, stream time estimate 1367 sec                  
REORGANIZE TABLE "TPCDS"."STORE_SALES";                                                        -- stream "indices": table estimate 134 sec, stream time estimate 1502 sec                  
REORGANIZE TABLE "TPC"."LINEITEM";                                                             -- stream "indices": table estimate 239 sec, stream time estimate 1741 sec                  
...
REORGANIZE TABLE "TPCDS"."HOUSEHOLD_DEMOGRAPHICS";                                             -- stream "small tables": table estimate 1 sec, stream time estimate 1554 sec               
REORGANIZE TABLE "TPCDS"."CATALOG_PAGE";                                                       -- stream "small tables": table estimate 1 sec, stream time estimate 1555 sec               
...

We do not recommend using more than three streams as index rebuild performance will deteriorate strongly if DBRAM gets heavily under stress by parallel big index rebuilds or big table redistributions.

As long as the REORGANIZE is not finished, we recommend avoiding running queries that access large tables and therefore may try to create big indices. As the creation of big indices needs a lot of memory, several index creations in parallel can easily interfere with each other (swapping). This restriction does not apply to DML statements as they normally don't create indices. Index maintenance is a part of DML, but won't be executed on invalidated indices.