Deploy a single-node Exasol database as a Docker image for testing purposes
This blog will show you:
How to deploy a single-node Exasol database as a Docker image for testing purposes
Before we go into the step-by-step guide, please read through the following prerequisites and recommendations to make sure that you're prepared.
Currently, Exasol only supports Docker on Linux. It’s not possible to use Docker for Windows to deploy the Exasol database. Linux is required because Exasol needs O_DIRECT access.
A Linux machine with Docker installed:
In this article, I’m going to use a CentOS 7.6 virtual machine with the latest version of Docker (currently version 19.03).
Docker privileged mode is required for permissions management, UDF support, and environment configuration and validation (sysctl, hugepages, block-devices, etc.).
Memory requirements for the host environment:
Each database instance needs at least 2 GiB RAM. Exasol recommends that the host reserves at least 4 GiB RAM for each running Exasol container. Since I’m going to deploy a single-node container in this article, I will give the VM 6 GiB of RAM.
Service requirements for the host environment:
NTP should be configured on the host OS. Also, the RNG daemon must be running to provide enough entropy for the Exasol services in the container.
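A quick check of both services before starting the container could look like the sketch below. The unit names chronyd and rngd are assumptions (they are typical on CentOS 7); substitute the NTP and RNG services your host actually runs.

```shell
# Print whether each given systemd unit is active.
# The unit names passed at the bottom (chronyd, rngd) are assumptions.
check_services() {
    for svc in "$@"; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
        fi
    done
}

check_services chronyd rngd
```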
Exasol strongly recommends setting the CPU governor on the host to performance to avoid serious performance problems. You can either use the cpupower utility or write to the scaling_governor files directly.
Using the cpupower utility:
$ sudo cpupower -c all frequency-set -g performance
Changing the content of the scaling_governor files (as root):
$ for F in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance >$F; done
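To confirm that the change took effect, the governor files can be read back. The helper below is a minimal sketch; the optional directory argument is something I added for illustration so the check can be pointed at a different tree, and it defaults to the real sysfs path.

```shell
# check_governor [CPU_ROOT]
# Lists every CPU whose governor is not "performance" and returns
# non-zero if at least one was found. CPU_ROOT is an illustrative
# parameter; it defaults to /sys/devices/system/cpu.
check_governor() {
    root="${1:-/sys/devices/system/cpu}"
    bad=0
    for f in "$root"/cpu*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue        # no cpufreq support (common on VMs)
        gov=$(cat "$f")
        if [ "$gov" != "performance" ]; then
            echo "$f: $gov"
            bad=1
        fi
    done
    return "$bad"
}

if check_governor; then
    echo "no CPU reports a governor other than performance"
fi
```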
Exasol recommends enabling hugepages for hosts with at least 64 GB RAM. To do so, we have to set the Hugepages option in EXAConf to either auto, host, or the number of hugepages per container. If we set it to auto, the number of hugepages will be determined automatically, depending on the DB settings.
When setting it to host, the number of hugepages from the host system will be used (i.e. /proc/sys/vm/nr_hugepages will not be changed). However, /proc/sys/vm/hugetlb_shm_group will always be set to an internal value!
It's possible to limit the resources of the Exasol container with the following docker run options:
$ docker run --cpuset-cpus="1,2,3,4" --memory=20g --memory-swap=20g --memory-reservation=10g exasol/docker-db:<version>
This is especially recommended if we need multiple Exasol containers (or other services) on the same host. In that case, we should evenly distribute the available CPUs and memory across our Exasol containers.
Find more detailed information here: https://docs.docker.com/config/containers/resource_constraints/
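As a rough illustration of an even split, the sketch below prints one set of docker run limits per container. The function name and its arguments are hypothetical; pass only the CPUs and memory you actually want to hand to the containers, and note that CPUs left over by the integer division stay unassigned.

```shell
# split_resources TOTAL_CPUS TOTAL_MEM_GIB CONTAINERS
# Prints one line of docker run resource options per container.
# Hypothetical helper, not part of Docker or Exasol.
split_resources() {
    cpus=$1; mem=$2; n=$3
    per_cpu=$((cpus / n))
    per_mem=$((mem / n))
    i=0
    while [ "$i" -lt "$n" ]; do
        first=$((i * per_cpu))
        last=$((first + per_cpu - 1))
        echo "--cpuset-cpus=${first}-${last} --memory=${per_mem}g --memory-swap=${per_mem}g"
        i=$((i + 1))
    done
}

# Example: 8 CPUs and 40 GiB shared by two containers
split_resources 8 40 2
```

Each printed line can be pasted into the docker run command of one container.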
How to deploy a single-node Exasol database as a Docker image
Step 1 Create a directory to persistently store data from the container
To store all persistent data from the container I’m going to create a directory. I will name it “container_exa” and create it in the home folder of the Linux user.
$ mkdir $HOME/container_exa/
Set the CONTAINER_EXA variable to the folder:
$ echo 'export CONTAINER_EXA="$HOME/container_exa/"' >> ~/.bashrc && source ~/.bashrc
Step 2 Create a configuration file for Exasol database and docker container
The command for creating a configuration file is:
$ docker run -v "$CONTAINER_EXA":/exa --rm -i exasol/docker-db:<version> init-sc --template --num-nodes 1
Since I’m going to use the latest version of Exasol (currently 6.2.6), I will use the latest tag.
--num-nodes is the number of containers. We need to change this value if we want to deploy a cluster.
$ docker run -v "$CONTAINER_EXA":/exa --rm -i exasol/docker-db:latest init-sc --template --num-nodes 1
NOTE: You need to add the --privileged option if the host directory belongs to root.
After the command has finished, the directory $CONTAINER_EXA contains all subdirectories as well as an EXAConf template (in /etc).
Step 3 Complete a configuration file
The configuration has to be completed before the Exasol DB container can be started.
The configuration file is EXAConf and it’s stored in the “$CONTAINER_EXA/etc” folder. To be able to start a container, these options have to be configured:
A private network for all nodes (a public network is not mandatory in the Docker version of Exasol DB)
EXAStorage device(s)
EXAVolume configuration
Network port numbers
Different options can be configured in the EXAConf file. I will post articles about most of them.
1) A private network of the node
$ vim $CONTAINER_EXA/etc/EXAConf
[Node : 11]
PrivateNet = 10.10.10.11/24 # <-- replace with the real network
In this case, the IP address of the Linux virtual machine is 10.1.2.4/24.
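If you prefer not to edit the file by hand, the same change can be scripted. This is only a sketch: set_private_net is a hypothetical helper, it relies on GNU sed, and it rewrites every PrivateNet line in the file, so it only suits a single-node EXAConf like the one in this article.

```shell
# set_private_net EXACONF ADDRESS
# Replaces the value of every "PrivateNet = ..." line (any trailing
# comment on that line is dropped). Hypothetical helper; requires GNU sed.
set_private_net() {
    sed -i -E "s|^([[:space:]]*)PrivateNet[[:space:]]*=.*|\1PrivateNet = $2|" "$1"
}

# Example with the path and address used in this article:
# set_private_net "$CONTAINER_EXA/etc/EXAConf" 10.1.2.4/24
```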
2) EXAStorage device configuration
Use the dev.1 file as an EXAStorage device for Exasol DB and mount the LVM disk to it.
3) EXAVolume configuration
Configure the volume size for Exasol DB before starting the container. Volumes in Exasol serve three different purposes. You can find detailed information in https://docs.exasol.com/administration/on-premise/manage_storage/volumes.htm?Highlight=volumes
Since it’s recommended to use less disk space than the size of the LVM disk (Exasol will create a temporary volume, and there should be free disk space for it), I’d recommend using 20 GiB for the volume. The actual size of the volume increases or decreases depending on the data stored.
4) Network port numbers
Since you should use the host network mode (see "Start the cluster" below), you have to adjust the port numbers used by the Exasol services. The one that's most likely to collide is the SSH daemon, which uses the well-known port 22. I’m going to change it to 2222 in the EXAConf file:
The other Exasol services (e. g. Cored, BucketFS, and the DB itself) are using port numbers above 1024. However, you can change them all by editing EXAConf. In this example, I’m going to use the default ports.
Port 22 – SSH connection
Port 443 – for XMLRPC
Port 8888 – port of the Database
Port 6583 – port for bucketfs
We can define a comma-separated list of nameservers for this cluster in EXAConf under the [Global] section. In this example, I use Google's public DNS server, 8.8.8.8.
Set the checksum within EXAConf to 'COMMIT'. This is the EXAConf integrity check (introduced in version 6.0.7-d1) that protects EXAConf from accidental changes and detects file corruption.
It can be found in the 'Global' section, near the top of the file. Please also adjust the Timezone depending on your requirements.
Step 4 Create the EXAStorage device files
EXAStorage is a distributed storage engine. All data is stored inside volumes. It also provides a failover mechanism. I’d recommend using a 32 GB LVM disk for EXAStorage:
IMPORTANT: Each device should be slightly bigger (~1%) than the required space for the volume(s) because a part of it will be reserved for metadata and checksums.
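The 1% rule can be turned into a quick calculation. The helper below is a hypothetical sketch that only covers the overhead for the volume size you pass in; it does not account for the temporary volume Exasol creates, so treat the result as a lower bound.

```shell
# min_device_mib VOLUME_GIB
# Prints the volume size plus 1 % overhead in MiB, rounded up.
# Hypothetical helper; the temporary volume is NOT included.
min_device_mib() {
    echo $(( ($1 * 1024 * 101 + 99) / 100 ))
}

# Example: the 20 GiB data volume used in this article
min_device_mib 20
```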
Step 5 Start the cluster
The cluster is started by creating all containers individually and passing each of them its ID from the EXAConf. Since we’ll be deploying a single node Exasol DB the node ID will be n11 and the command would be:
$ docker run --name exasol-db --detach --network=host --privileged -v $CONTAINER_EXA:/exa -v /dev/mapper/db-storage:/exa/data/storage/dev.1 exasol/docker-db:latest init-sc --node-id 11
NOTE: This example uses the host network stack, i.e. the containers are directly accessing a host interface to connect. There is no need to expose ports in this mode: they are all accessible on the host.
Let’s use the “docker logs” command to check the log files.
$ docker logs -f exasol-db
We can see 5 different stages in the logs. Stage 5 is the last one. If the logs show that the node is online and the stage is finished, the container and the database have started successfully.
$ docker container ls
Let’s get a bash shell in the container and check the status of the database and volumes:
$ docker exec -it exasol-db bash
Inside the container, you can run some Exasol-specific commands to manage the database and services. You can find some of these commands below:
$ dwad_client shortlist: Prints the names of the databases.
$ dwad_client list: Prints the current status of the databases.
As we can see the name of the database is DB1 (this can be configured in EXAConf) and the state is running. The “Connection state: up” means we can connect to the database via port 8888.
$ csinfo -D: Prints HDD info.
$ csinfo -v: Prints information about one (or all) volume(s).
As we can see the size of the data volume is 20.00 GiB. You can also find information about the temporary volume in the output of the csinfo -v command.
Since the database is running and the connection state is up, let’s try to connect and run some example SQL queries. You can use any SQL client or the EXAplus CLI to connect.
I’m going to use DBeaver in this article. You can find more detailed information in https://docs.exasol.com/connect_exasol/sql_clients/dbeaver.htm
I’m using the public IP address of the virtual machine and port 8888, which is configured as the database port in EXAConf.
By default, the password of the sys user is “exasol”. Let's run an example query:
SELECT * FROM EXA_SYSCAT;
In this article, we deployed a single-node Exasol database in a docker container and went through the EXAConf file. In the future, I will be sharing new articles about running Exasol on docker and will analyze the EXAConf file and Exasol services in-depth.
Deploying a 2+1 Exasol Cluster on Amazon Web Services (AWS)
This post will show you:
How to deploy a 2+1 Exasol Cluster on Amazon Web Services (AWS)
Before we go into the step-by-step guide, please read through the following prerequisites and recommendations to make sure that you're prepared.
Make sure you have an AWS account with the relevant permissions. If you do not have an AWS account, you can create one from the Amazon Console.
AWS Key Pair:
You have a Key Pair created. AWS uses public-key cryptography to secure the log-in information for your instance. For more information on how to create a Key Pair, see Amazon EC2 Key Pairs in the AWS documentation.
Subscription on AWS Marketplace:
You must have subscribed to one of the following Exasol subscriptions on AWS Marketplace:
Exasol Analytic Database (Single Node / Cluster, Bring-Your-Own-License)
Exasol Analytic Database (Single Node / Cluster, Pay-As-You-Go)
How to deploy a 2+1 Exasol Cluster
Open https://cloudtools.exasol.com/ to access the cloud deployment wizard in your browser and choose your cloud provider. In this case, the Cloud Provider should be Amazon Web Services. Select your region from the drop-down list. I'm going to deploy our cluster in Frankfurt.
On the Configuration screen, by default, you see the Basic Configuration page. You can choose one of the existing configurations made by Exasol.
Basic Configuration: Shows a minimum specification for your data size.
Balanced Configuration: Shows an average specification for your data size for good performance.
High-Performance Configuration: Shows the best possible specification for your data size for high performance.
In this case, I'm going to choose the Advanced Configuration option.
If you are going to deploy a cluster for production purposes, we recommend discussing sizing options with the Exasol support team or using one of the existing configurations made by Exasol.
RAW Data Size (in TB):
You can add the required raw data size on your own, otherwise, it will be calculated automatically after setting Instance type and node count.
Pay as you go (PAYG)
Pay as you go (PAYG) license model is a flexible and scalable license model for Exasol's deployment on a cloud platform. In this mode, you pay for your cloud resources and Exasol software through the cloud platform's billing cycle. You can always change your setup later to scale up or down your system and the billing changes accordingly.
Bring your own license (BYOL)
The Bring your own license (BYOL) model lets you choose a static license for Exasol software and dynamic billing for the cloud resources. In this model, you need to purchase a license from Exasol and add it to your cloud instance. This way, you pay only for the cloud resources through the cloud platform's billing cycle and there is no billing for the software. You can always change your setup later to scale up or down your system and the billing changes accordingly. However, there is a limit for the maximum scaling based on your license type (DB RAM or raw data size).
You can find detailed information about licensing in https://docs.exasol.com/administration/aws/licenses.htm
You can choose one of the Exasol Single Node and Enterprise Cluster options. I'm going to choose the Enterprise Cluster option.
You can choose one of the instance types of AWS EC2 service to deploy virtual machines for Exasol nodes. You can find detailed information about instance types of AWS EC2 in https://aws.amazon.com/ec2/instance-types/
The number of DB Nodes:
We need to determine the total number of active data nodes in this section.
After finishing the configuration we can see the RAW data size calculated automatically for us. On the left side of the screen, we can see the details of our setup on AWS.
If you have a license from Exasol please choose the BYOL option in License Model, this will cause a decrease in Estimated Costs.
After clicking Continue to proceed with the deployment, we can see the Summary page. Here we can review the cluster configuration and choose a deployment option.
We have the option to create a new VPC or use an existing VPC for the CloudFormation stack.
Create New VPC creates a new VPC and provisions all resources within it.
Use Existing VPC provisions Exasol in an existing VPC subnet of your choice.
For more information on VPC, see Amazon Virtual Private Cloud.
Based on this VPC selection, the parameters in the stack creation page on AWS will change when you launch the stack. For more information on the stack parameters, see Template Parameters.
If you want to download the configuration files and upload them later to your AWS stack through the CloudFormation Console, you can click the CloudFormation Templates option on the left side.
Click Launch Stack. You will be redirected to the Quick create stack page on AWS.
After being redirected to the Quick create stack page on AWS, I'm going to review and fill in the required stack parameters:
Key Pair, SYS User Password, and ADMIN User Password.
In the VPC/Network/Security section, the Public IPs are set to false by default. I'm going to set this to true.
If you want to keep the Public IP address set to false, then you need to enable VPN or other methods to be able to access your instance.
(Optional) License is applicable if your subscription model is Bring-your-own-license. Paste the entire content of the license file you have in the space provided.
For more information about the stack parameters, please check the table here https://docs.exasol.com/cloud_platforms/aws/installation_cf_template.htm?Highlight=Template%20Parameters
After filling in the required parameters, I'm going to click Create Stack to continue deploying Exasol in the CloudFormation Console. We can view the stack created under AWS CloudFormation > Stacks, with the status CREATE_IN_PROGRESS. Once the stack is created successfully, the status changes to CREATE_COMPLETE.
Additionally, we can monitor the progress in the Events tab for the stack.
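If you have the AWS CLI configured, the same status can be followed from a terminal instead of the console. This is a sketch under that assumption; wait_for_stack is a helper name I made up, and the stack name in the usage line is a placeholder.

```shell
# wait_for_stack STACK_NAME
# Blocks until the stack reaches CREATE_COMPLETE (or the waiter gives up),
# then prints the final stack status. Hypothetical helper around the AWS CLI.
wait_for_stack() {
    aws cloudformation wait stack-create-complete --stack-name "$1" &&
    aws cloudformation describe-stacks --stack-name "$1" \
        --query 'Stacks[0].StackStatus' --output text
}

# Usage (placeholder stack name):
# wait_for_stack my-exasol-stack
```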
Determine the Public IP Address
We need the Public IP or DNS name displayed in the EC2 Console to connect to the database server or launch the instance. To find the Public IP or DNS name:
Open the EC2 Dashboard from the AWS Management Console.
Click on Running Instances. The Instances page is displayed with all the running instances.
Select the name of the instance you created. For a cluster, we need the IP address of the management node (in this case, exasol-cluster-management_node).
In the Description section, the IP address displayed for Public DNS(IPv4) is the IP address of the database server.
If the Public IP parameter for your stack is set to false, you need to enable VPN or other methods to connect to the database server via the private IP address of the instances.
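The addresses can also be looked up with the AWS CLI. The sketch below assumes the instances carry a Name tag equal to the name shown in the console; instance_public_ips is a hypothetical helper, not an AWS command.

```shell
# instance_public_ips NAME_TAG
# Prints the public IP addresses of all running instances whose Name tag
# matches NAME_TAG. Assumes the console name is stored in the Name tag.
instance_public_ips() {
    aws ec2 describe-instances \
        --filters "Name=tag:Name,Values=$1" "Name=instance-state-name,Values=running" \
        --query 'Reservations[].Instances[].PublicIpAddress' \
        --output text
}

# Usage:
# instance_public_ips exasol-cluster-management_node
```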
Access the Initialization page
Copy this IP address, prefix it with https://, and open it in a browser. In the case of an Exasol cluster deployment, I need to copy the IP address or DNS name of the management node.
After confirming the digital certificate the following screen is displayed.
Once the installation is complete, I will be redirected to the EXAoperation screen. It may take up to 45 minutes for the EXAoperation to be online after deployment.
You can login with the admin user name and password provided while creating your stack.
Connect to the database
In this case (a 2+1 cluster deployment), I need to use the Public IP address of a data node along with the admin user name and password to connect from the SQL client. I can also connect to all the data nodes by entering the public IP addresses of all the nodes separated by commas.
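A comma-separated host list is easy to build in the shell. The sketch below is an illustration only: the helper name is hypothetical, the addresses are documentation placeholders, and the port is an assumption, so use the database port of your own cluster.

```shell
# connection_string PORT HOST...
# Joins the hosts with commas and appends the port.
# Hypothetical helper; hosts and port in the example are placeholders.
connection_string() {
    port=$1
    shift
    hosts=$(IFS=,; echo "$*")
    echo "${hosts}:${port}"
}

connection_string 8563 203.0.113.11 203.0.113.12
```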
Connect to Exasol
After installing Exasol on AWS, you can do the following:
Install drivers required to connect to other tools.
Connect SQL clients to Exasol.
Connect Business Intelligence tools (BI tools) to Exasol.
Connect Data Integration - ETL tool to Exasol.
Connect Data Warehouse Automation tools to Exasol.
After you have connected your choice of tool to Exasol, you can load your data into Exasol and process further. To know more about loading data into Exasol, see Loading Data.
In this article, we deployed a 2+1 Exasol cluster on AWS. In the future, I will be sharing new articles about managing the Exasol cluster on AWS, using lambda functions to schedule the start/stop of a cluster, etc.
This article describes the calculation of the optimal (maximum) DB RAM on a:
4+1 system with one database (dedicated environment)
4+1 system with two databases (shared environment)
The calculation of the OS Memory per Node stays the same for both environments. Shared environments are not recommended for production systems.
The 4+1 cluster contains four active data nodes and one standby node. Each node has 384GiB of main memory.
How to calculate Database RAM
OS Memory per Node
It is vital for the database that there is enough memory allocatable through the OS. We recommend reserving at least 10% of the main memory on each node for the OS. This prevents the nodes from swapping under high load (many sessions).
Main Memory per Node * 0.1 = OS Memory per Node
384 * 0.1 = 38.4 -> 38GiB
In order to set this value, the database needs to be shut down. The setting is located in EXAoperation under 'Configuration > Network', in the field "OS Memory/Node (GiB)".
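The same calculation as a small shell sketch; os_memory_gib is a hypothetical helper that rounds down to whole GiB, as the example above does.

```shell
# os_memory_gib MAIN_MEMORY_GIB
# 10 % of the main memory, rounded down to whole GiB.
os_memory_gib() {
    echo $(( $1 / 10 ))
}

os_memory_gib 384
```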
Maximum DB RAM (dedicated environment)
(Main Memory per Node - OS Memory per Node) * Number of active Nodes = Maximum DB RAM
Example: 4 x data nodes with 384GiB (Main Memory per Node) - 38GiB (OS Memory per Node)
(384GiB - 38GiB) * 4 = 1384GiB
Maximum DB RAM (shared environment)
Database "one" on four data nodes (exa_db1)
Database "two" on two data nodes (exa_db2)
As before, the "Maximum DB RAM" is 1384GiB. With two databases sharing the Maximum DB RAM, we need to recalculate and redistribute it.
Maximum DB RAM / Number of Databases = Maximum DB RAM per database
1384GiB / 2 = 692GiB
For database "one" (exa_db1), which is running on all four nodes, 692GiB DB RAM can be configured. The smaller database "two" (exa_db2) is running on two nodes, therefore "Maximum DB RAM per database" needs to be divided by the number of data nodes it's running on (2).
Maximum DB RAM per database / Number of data nodes of the smaller database = Maximum DB RAM of the smaller database
692GiB / 2 = 346GiB
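For other cluster sizes, the formulas can be evaluated with shell arithmetic. This sketch follows the steps of this article literally, using the example's inputs (384GiB per node, 38GiB OS memory, four active nodes, two databases); max_db_ram is a hypothetical helper name.

```shell
# max_db_ram MAIN_MEMORY_GIB OS_MEMORY_GIB ACTIVE_NODES
# (Main Memory per Node - OS Memory per Node) * Number of active Nodes
max_db_ram() {
    echo $(( ($1 - $2) * $3 ))
}

total=$(max_db_ram 384 38 4)    # dedicated environment
per_db=$(( total / 2 ))         # two databases share the cluster
small_db=$(( per_db / 2 ))      # exa_db2 runs on two data nodes

echo "dedicated: ${total} GiB"
echo "per database (shared): ${per_db} GiB"
echo "exa_db2: ${small_db} GiB"
```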
Enlarge EXAStorage disk(s) after changing disk size of the ec2 instances
To complete these steps, you need access to the AWS Management Console and the permissions to perform these actions in EXAoperation.
Please ensure you have a valid backup before proceeding. The below approach works only with the cluster installation.
How to enlarge disk space in AWS
Stop all databases and stop EXAStorage in EXAoperation
Stop your EC2 instances, except the license node (ensure they don’t get terminated on shutdown; check shutdown behavior http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expand-volume.html)
Modify the disk on AWS console (Select Volume -> Actions -> Modify -> Enter the new size -> Click Modify)
Ensure the storage disk size is set to “Rest” in the EXAoperation node settings. If d03_storage/d04_storage is not set to "Rest", set the INSTALL flag for all nodes, adjust the setting, and then set the ACTIVE flag for all nodes again. Otherwise the nodes will be reinstalled during boot (data loss)!
Enlarge each node device using the “Enlarge Button” in EXAoperation/EXAStorage/n00xx/h000x/
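The AWS-side steps (checking the shutdown behavior, stopping the instances, and modifying the volume) can also be done with the AWS CLI instead of the console. The helpers below are a sketch under that assumption; their names are made up and all IDs in the usage lines are placeholders. The EXAoperation steps still have to be done in EXAoperation.

```shell
# shutdown_behavior INSTANCE_ID
# Prints "stop" or "terminate"; it should be "stop" before you shut down.
shutdown_behavior() {
    aws ec2 describe-instance-attribute --instance-id "$1" \
        --attribute instanceInitiatedShutdownBehavior \
        --query 'InstanceInitiatedShutdownBehavior.Value' --output text
}

# stop_instances INSTANCE_ID...
# Stops the given instances and waits until they are stopped.
stop_instances() {
    aws ec2 stop-instances --instance-ids "$@" &&
    aws ec2 wait instance-stopped --instance-ids "$@"
}

# grow_volume VOLUME_ID NEW_SIZE_GIB
# Requests the new size for one EBS volume.
grow_volume() {
    aws ec2 modify-volume --volume-id "$1" --size "$2"
}

# Usage (placeholder IDs, do not include the license node):
# shutdown_behavior i-0123456789abcdef0
# stop_instances i-0123456789abcdef0 i-0fedcba9876543210
# grow_volume vol-0123456789abcdef0 200
```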