Corrupt database after harddisk outage - manual intervention needed

kochjoe
Padawan

Hi there,

We are using Exasol Community Edition 6.0.8. After a hard disk failure on the underlying virtualization platform, we are no longer able to start the database. When opening the monitoring module we see:

2020-11-11 15:45:55.377158 Error EXAone Controller(0.0): Emergency shutdown: Process 1 requested emergency shutdown. Reason: A volume has been locked. Shut down system now.
2020-11-11 15:45:55.376518 Error EXAone pddserver(1.0): Database corrupt: Manual intervention is needed. A volume has been locked. Shut down system now.

 

Questions:

Is there any way to perform this manual intervention from within the web console?

Is there a way to connect with the maintenance user to see how the services and the filesystem are behaving?

An external backup is available, but without a running database we do not seem to be able to restore it.

 

Any help is appreciated

 


Charlie
Xpert

Hi,

Is the external backup you mention an Exasol backup or a VM backup?
If you have an Exasol backup:

* To restore the database add a new data volume in the EXAStorage view.

Charlie_1-1605168113010.png

Be sure to add your account to the allowed users. Otherwise you will not be able to select the volume in the next step.

* In the EXASolution view, edit your failed database and select the newly created data volume

Charlie_2-1605168402134.png

* Save the changes and select Backups:

Charlie_3-1605168440411.png

* Select your database backup and click restore

Charlie_4-1605168757535.png

 

This will take some time and will restore the backup into the new data volume.
The old, possibly corrupt data volume is still available and can be used to investigate the issue further.

 

If you don't have an Exasol database backup:

Can you post a screenshot of the EXAStorage view in EXAoperation?

If you click the data volume and open the detail view, what is the status of the volume?

 

 

 

 

kochjoe
Padawan

Hi Charlie,

Thanks for your answer and the detailed information. The problem is that the local storage is currently not available, as all volumes are locked and waiting for HDD recovery.

kochjoe_0-1605203348933.png

I could follow your idea and create a new backup volume, but then I don't know how to transfer the externally stored backup to that volume.

Thanks, Jörg

kochjoe
Padawan

I tried to run a filesystem check and rebooted the node, and now I see this in the logs. I cannot see the filesystem check being processed:

Log.png

Anything I can do from here?

Thx

Charlie
Xpert

Hi,

You can upload the backup files into the new archive volume using either FTP or curl:

 

curl --ftp-ssl --insecure --user admin:<password> --upload-file <downloaded backup file> ftp://<ip>:2021/<volume-name>/<downloaded backup file>

 

Example:

 

curl --ftp-ssl --insecure --user admin:admin --upload-file metadata_202011120856 ftp://10.1.112.11:2021/v0027/EXASolo/id_1/level_0/node_0/metadata_202011120856 	
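
If the backup consists of several files, the single upload above could be wrapped in a small loop. The following is only a rough sketch, not an official procedure: it assumes the local copy of the backup has the same directory layout as on the archive volume (as in the example path above), that the password is provided in a hypothetical environment variable EXA_PASSWORD, and that the FTP service accepts curl's --ftp-create-dirs option for missing directories. The function names build_upload_url and upload_backup_dir are made up for illustration.

```shell
#!/bin/sh
# Sketch: upload every file of a locally stored backup to an archive volume.
# Hypothetical names: build_upload_url, upload_backup_dir, EXA_PASSWORD.

# Print the FTP URL for one file, using the same URL scheme as the example above.
build_upload_url() {
    # $1 = node IP, $2 = archive volume name, $3 = path relative to the backup root
    printf 'ftp://%s:2021/%s/%s\n' "$1" "$2" "$3"
}

# Upload all regular files below a local directory, keeping their relative paths.
upload_backup_dir() {
    # $1 = local backup root, $2 = node IP, $3 = archive volume name
    (
        cd "$1" || exit 1
        find . -type f | sed 's|^\./||' | while IFS= read -r rel; do
            echo "uploading $rel"
            # Same curl options as in the manual example; --ftp-create-dirs is
            # an assumption in case the target directories do not exist yet.
            curl --ftp-ssl --insecure --ftp-create-dirs \
                 --user "admin:${EXA_PASSWORD}" \
                 --upload-file "$rel" \
                 "$(build_upload_url "$2" "$3" "$rel")" || exit 1
        done
    )
}
```

With the values from the example, this might be called as: EXA_PASSWORD=<password> upload_backup_dir <local backup directory> 10.1.112.11 v0027. The loop stops at the first failed upload so that a partially transferred backup does not go unnoticed.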

 


However, I am wondering why the node status is "waiting for HDD recovery".

 

As the VM is online I suspect that you were able to recover your VM environment after the outage?

 

Regards

 

 

exa-ThomasM
Moderator

Hi Jörg, hi Charlie,

In this situation, the status reads "LOCKED (waiting for HDD recovery)". This means that the HDD has been taken offline for some reason.

EXAStorage takes drives offline as soon as even a small error occurs, in order to prevent any data loss.

Below the volumes section, you will also see a section listing the nodes and their disks, written as "[UU]":

exa-ThomasM_1-1605780391143.png

In this case, we can see that n0013 is missing two disks. Every "logical" disk presented to EXAStorage is shown as one "U".

Taking this picture as an example, the volume that uses n0013 will be "LOCKED (waiting for HDD recovery)".

To solve this issue, these disks need to be enabled again. Please make sure the disks are OK before enabling them.

If you are sure that the disks are OK, click the node name (on the left side), mark the disks, and click "Enable devices":

exa-ThomasM_0-1605780372259.png

This will bring the disks online again. Afterwards, the lock is resolved.


Please keep in mind that your screenshot shows redundancy 1 only, which means that if the disk is broken and needs to be replaced, the data will be lost. In that case, restoring a backup will be necessary.


Hope this answers your question.