Deploying a 2+1 Exasol Cluster on Amazon Web Services (AWS)
This post shows you how to deploy a 2+1 Exasol Cluster on Amazon Web Services (AWS). Before we go into the step-by-step guide, please read through the following prerequisites and recommendations to make sure that you're prepared.
Make sure you have an AWS account with the relevant permissions. If you do not have an AWS account, you can create one from the Amazon Console.
AWS Key Pair:
You have a Key Pair created. AWS uses public-key cryptography to secure the log-in information for your instance. For more information on how to create a Key Pair, see Amazon EC2 Key Pairs in the AWS documentation.
Subscription on AWS Marketplace:
You must have subscribed to one of the following Exasol subscriptions on AWS Marketplace:
Exasol Analytic Database (Single Node / Cluster, Bring-Your-Own-License)
Exasol Analytic Database (Single Node / Cluster, Pay-As-You-Go)
How to deploy a 2+1 Exasol Cluster
Open https://cloudtools.exasol.com/ to access the cloud deployment wizard in your browser and choose your cloud provider. In this case, the Cloud Provider should be Amazon Web Services. Select your region from the drop-down list. I'm going to deploy our cluster in Frankfurt.
On the Configuration screen, by default, you see the Basic Configuration page. You can choose one of the existing configurations made by Exasol.
Basic Configuration: Shows a minimum specification for your data size.
Balanced Configuration: Shows an average specification for your data size for good performance.
High-Performance Configuration: Shows the best possible specification for your data size for high performance.
In this case, I'm going to choose the Advanced Configuration option.
If you are going to deploy a cluster for production purposes, we recommend discussing sizing options with the Exasol support team or using one of the existing configurations made by Exasol.
RAW Data Size (in TB):
You can enter the required raw data size yourself; otherwise, it is calculated automatically after you set the instance type and node count.
Pay as you go (PAYG)
Pay as you go (PAYG) license model is a flexible and scalable license model for Exasol's deployment on a cloud platform. In this mode, you pay for your cloud resources and Exasol software through the cloud platform's billing cycle. You can always change your setup later to scale up or down your system and the billing changes accordingly.
Bring your own license (BYOL)
The Bring your own license (BYOL) license model lets you choose a static license for Exasol software and dynamic billing for the cloud resources. In this model, you need to purchase a license from Exasol and add it to your cloud instance. This way, you pay only for the cloud resources through the cloud platform's billing cycle and there is no billing for the software. You can always change your setup later to scale up or down your system and the billing changes accordingly. However, there is a limit for the maximum scaling based on your license type (DB RAM or raw data size).
You can find detailed information about licensing in https://docs.exasol.com/administration/aws/licenses.htm
You can choose either the Exasol Single Node or the Enterprise Cluster option. I'm going to choose the Enterprise Cluster option.
You can choose one of the instance types of AWS EC2 service to deploy virtual machines for Exasol nodes. You can find detailed information about instance types of AWS EC2 in https://aws.amazon.com/ec2/instance-types/
The number of DB Nodes:
We need to determine the total number of active data nodes in this section.
After finishing the configuration we can see the RAW data size calculated automatically for us. On the left side of the screen, we can see the details of our setup on AWS.
If you have a license from Exasol, choose the BYOL option under License Model. This lowers the Estimated Costs.
After clicking Continue to proceed with the deployment, we see the Summary page, where we can review the cluster configuration and choose a deployment option.
We can either create a new VPC or use an existing VPC for the CloudFormation stack.
Create New VPC creates a new VPC and provisions all resources within it.
Use Existing VPC provisions Exasol in an existing VPC subnet of your choice.
For more information on VPC, see Amazon Virtual Private Cloud.
Based on this VPC selection, the parameters in the stack creation page on AWS will change when you launch the stack. For more information on the stack parameters, see Template Parameters.
If you want to download the configuration files and upload them later to your AWS stack through the CloudFormation Console, you can click the CloudFormation Templates option on the left side.
Click Launch Stack. You will be redirected to the Quick create stack page on AWS.
After being redirected to the Quick create stack page on AWS, I'm going to review and fill in the required stack parameters:
Key Pair, SYS User Password, and ADMIN User Password.
In the VPC/Network/Security section, the Public IPs are set to false by default. I'm going to set this to true.
If you want to keep the Public IP address set to false, then you need to enable VPN or other methods to be able to access your instance.
(Optional) License is applicable if your subscription model is Bring-your-own-license. Paste the entire content of the license file you have in the space provided.
For more information about the stack parameters, please check the table here https://docs.exasol.com/cloud_platforms/aws/installation_cf_template.htm?Highlight=Template%20Parameters
After filling in the required parameters, I'm going to click Create Stack to continue deploying Exasol in the CloudFormation Console. We can view the stack created under AWS CloudFormation > Stacks, with the status CREATE_IN_PROGRESS. Once the stack is created successfully, the status changes to CREATE_COMPLETE.
Additionally, we can monitor the progress in the Events tab for the stack.
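If you prefer to watch the stack from a script rather than the console, the status check can be sketched as a small polling loop. This is only an illustration: the function name wait_for_stack and its parameters are hypothetical, and the get_status callable is an assumption standing in for whatever returns the current stack status (for example a thin wrapper around the AWS SDK's describe_stacks call).

```python
import time
from typing import Callable

# CloudFormation statuses that indicate stack creation did not succeed.
FAILURE_STATUSES = {
    "CREATE_FAILED",
    "ROLLBACK_IN_PROGRESS",
    "ROLLBACK_COMPLETE",
    "ROLLBACK_FAILED",
}


def wait_for_stack(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the stack reaches CREATE_COMPLETE.

    get_status is a hypothetical callable supplied by the caller; it
    must return the current stack status as a string.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == "CREATE_COMPLETE":
            return status
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"Stack creation failed with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Stack still in status {status} after timeout")
        # Status is still CREATE_IN_PROGRESS (or similar): wait and retry.
        sleep(poll_seconds)
```

Passing the status lookup in as a callable keeps the loop independent of any particular SDK, so it can be tried out with a fake status sequence first.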
Determine the Public IP Address
We need the Public IP or DNS name displayed in the EC2 Console to connect to the database server. To find the Public IP or DNS name:
Open the EC2 Dashboard from the AWS Management Console.
Click on Running Instances. The Instances page is displayed with all the running instances.
Select the name of the instance you created (in this case, exasol-cluster-management_node). We need the IP address of the management node.
In the Description section, the IP address displayed for Public DNS(IPv4) is the IP address of the database server.
If the Public IP parameter for your stack is set to false, you need to enable VPN or other methods to connect to the database server via the private IP address of the instances.
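The same lookup can be scripted. The sketch below assumes you already have a response dictionary in the shape returned by the EC2 DescribeInstances API (Reservations, Instances, Tags, PublicIpAddress, PublicDnsName); the function name find_public_addresses is hypothetical, and the addresses in the sample data are documentation placeholders, not real hosts.

```python
from typing import Any, Dict, List, Tuple


def find_public_addresses(response: Dict[str, Any], name: str) -> List[Tuple[str, str]]:
    """Return (public IP, public DNS name) pairs for instances whose
    Name tag equals `name`. Instances without a public IP are skipped."""
    matches = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # Tags arrive as a list of {"Key": ..., "Value": ...} entries.
            tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
            public_ip = instance.get("PublicIpAddress")
            if tags.get("Name") == name and public_ip:
                matches.append((public_ip, instance.get("PublicDnsName", "")))
    return matches


# Sample data with placeholder values, shaped like a DescribeInstances response.
sample_response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "PublicIpAddress": "203.0.113.10",
                    "PublicDnsName": "management.example.invalid",
                    "Tags": [{"Key": "Name", "Value": "exasol-cluster-management_node"}],
                }
            ]
        }
    ]
}
print(find_public_addresses(sample_response, "exasol-cluster-management_node"))
```

If the stack was created with Public IPs set to false, the list comes back empty, which matches the note above about needing a VPN or another access path.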
Access to Initialization page
Copy and paste this IP address prefixed with https in a browser. In the case of an Exasol cluster deployment, I need to copy the IP address or DNS name of the management node.
After confirming the digital certificate the following screen is displayed.
Once the installation is complete, I will be redirected to the EXAoperation screen. It may take up to 45 minutes for the EXAoperation to be online after deployment.
You can log in with the admin user name and password provided while creating your stack.
Connect to the database
In this case (a 2+1 cluster deployment), I need to use the Public IP address of a data node along with the admin user name and password to connect with the SQL client. I can also connect to all the data nodes by entering the public IP addresses of all the nodes separated by commas.
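As a small illustration of the comma-separated form, the helper below joins the data node addresses into one connection string. The function name is hypothetical, the addresses are placeholders, and the port 8563 is an assumption (it is not stated in this article), so check the port configured for your database before using it.

```python
from typing import Iterable


def exasol_connection_string(node_ips: Iterable[str], port: int = 8563) -> str:
    """Join data node IP addresses with commas and append the port.

    The default port is an assumption; pass your own value if your
    database listens elsewhere.
    """
    ips = [ip.strip() for ip in node_ips if ip.strip()]
    if not ips:
        raise ValueError("At least one data node IP address is required")
    return f"{','.join(ips)}:{port}"


# Placeholder addresses for the two active data nodes of a 2+1 cluster.
print(exasol_connection_string(["203.0.113.11", "203.0.113.12"]))
```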
Connect to Exasol
After installing Exasol on AWS, you can do the following:
Install drivers required to connect to other tools.
Connect SQL clients to Exasol.
Connect Business Intelligence tools (BI tools) to Exasol.
Connect Data Integration - ETL tool to Exasol.
Connect Data Warehouse Automation tools to Exasol.
After you have connected your choice of tool to Exasol, you can load your data into Exasol and process it further. To learn more about loading data into Exasol, see Loading Data.
In this article, we deployed a 2+1 Exasol cluster on AWS. In the future, I will be sharing new articles about managing the Exasol cluster on AWS, using lambda functions to schedule the start/stop of a cluster, etc.
This article describes the calculation of the optimal (maximum) DB RAM on a:
4+1 system with one database (dedicated environment)
4+1 system with two databases (shared environment)
The calculation of the OS Memory per Node stays the same for both environments. Shared environments are not recommended for production systems.
The 4+1 cluster contains four active data nodes and one standby node. Each node has 384GiB of main memory.
How to calculate Database RAM
OS Memory per Node
It is vital for the database that there is enough memory allocatable through the OS. We recommend reserving at least 10% of the main memory on each node for the OS. This prevents the nodes from swapping under high load (many sessions).
Main Memory per Node * 0.1 = OS Memory per Node
384 * 0.1 = 38.4 -> 38GiB
In order to set this value, the database needs to be shut down. EXAoperation 'Configuration > Network' - "OS Memory/Node (GiB)"
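The OS memory rule above can be sketched in a few lines of Python. The function name is hypothetical, and rounding down to whole GiB is an assumption taken from the example (38.4 becomes 38).

```python
import math


def os_memory_per_node(main_memory_gib: float, fraction: float = 0.10) -> int:
    """OS memory per node in whole GiB: a fixed fraction of main memory,
    rounded down (assumed from the worked example)."""
    return math.floor(main_memory_gib * fraction)


print(os_memory_per_node(384))
```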
Maximum DB RAM (dedicated environment)
(Main Memory per Node - OS Memory per Node) * Number of active Nodes = Maximum DB RAM
Example: 4 x data nodes with 384GiB (Main Memory per Node) - 38GiB (OS Memory per Node)
(384GiB - 38GiB) * 4 = 1384GiB
Maximum DB RAM (shared environment)
Database "one" on four data nodes (exa_db1)
Database "two" on two data nodes (exa_db2)
As before, the "Maximum DB RAM" is 1384GiB. With two databases sharing the Maximum DB RAM, we need to recalculate and redistribute it.
Maximum DB RAM / Number of Databases = Maximum DB RAM per database
1384GiB / 2 = 692GiB
For database "one" (exa_db1), which is running on all four nodes, 692GiB DB RAM can be configured. The smaller database "two" (exa_db2) is running on two nodes, therefore "Maximum DB RAM per database" needs to be divided by the number of data nodes it's running on (2).
Maximum DB RAM per database / Number of active Nodes = Maximum DB RAM for database "two"
692GiB / 2 = 346GiB
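The two calculations can also be written as a short script. This is a sketch with hypothetical function names. For the shared environment it reads the steps above as "every database gets an equal share of each node's DB RAM", which is an interpretation that gives the same per-database split for this 4+1 example, not an official formula.

```python
def max_db_ram(main_memory_gib: float, os_memory_gib: float, active_nodes: int) -> float:
    """(Main Memory per Node - OS Memory per Node) * Number of active Nodes."""
    return (main_memory_gib - os_memory_gib) * active_nodes


def db_ram_share(max_db_ram_gib: float, cluster_nodes: int, databases: int, db_nodes: int) -> float:
    """DB RAM for one database in a shared environment.

    Assumed reading: each node's DB RAM is split evenly between the
    databases, and a database receives that share once per node it runs on.
    """
    per_node_per_db = max_db_ram_gib / cluster_nodes / databases
    return per_node_per_db * db_nodes


# Example values from the article: four active nodes, 384GiB main memory,
# 38GiB OS memory per node, two databases.
total = max_db_ram(384, 38, 4)
print("dedicated:", total)
print("exa_db1 (4 nodes):", db_ram_share(total, 4, 2, 4))
print("exa_db2 (2 nodes):", db_ram_share(total, 4, 2, 2))
```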
Enlarge EXAStorage disk(s) after changing disk size of the EC2 instances
To complete these steps, you need access to the AWS Management Console and the permissions to perform these actions in EXAoperation.
Please ensure you have a valid backup before proceeding. The below approach works only with the cluster installation.
How to enlarge disk space in AWS
Stop all databases and stop EXAStorage in EXAoperation
Stop your EC2 instances, except the license node (ensure they don’t get terminated on shutdown; check shutdown behavior http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expand-volume.html)
Modify the disk on AWS console (Select Volume -> Actions -> Modify -> Enter the new size -> Click Modify)
Ensure the Storage disk size is set to “Rest” (EXAoperation node setting). If d03_storage/d04_storage is not set to "Rest", set the INSTALL flag for all nodes, adjust the setting, and then set the ACTIVE flag for all nodes again. Otherwise the nodes will be reinstalled during boot (data loss)!
Enlarge each node device using the “Enlarge Button” in EXAoperation/EXAStorage/n00xx/h000x/
The datadog-agent has one dependency, which is '/bin/sh'. It is safe to just install it, also with regard to future updates of Exasol.
For CentOS 7.x just run on each machine (as user root):
DD_API_KEY=<Your-API-Key> bash -c "$(curl -L https://raw.githubusercontent.com/DataDog/datadog-agent/master/cmd/agent/install_script.sh)"
The hostname can be changed in '/etc/datadog-agent/datadog.yaml'. Afterward, restart the agent as user root with 'systemctl restart datadog-agent'.
In versions prior to 5.0.15, EXASOL cluster deployments only supported CIDR block 22.214.171.124/16 and subnet 126.96.36.199/16. It is now possible to use custom CIDR blocks, with some restrictions, because the CIDR block is managed automatically by our cluster operating system.
VPC CIDR block netmask must be between /16 (255.255.0.0) and /24 (255.255.255.0)
The first ten IP addresses of the cluster's subnet are reserved and cannot be used
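These two restrictions can be checked with Python's standard ipaddress module before creating the stack. The sketch below uses hypothetical function names and assumes that "the first ten IP addresses" means the subnet base address plus the nine addresses that follow it.

```python
import ipaddress

RESERVED_ADDRESSES = 10  # first ten addresses of the cluster's subnet


def check_vpc_cidr(vpc_cidr: str) -> ipaddress.IPv4Network:
    """Return the parsed VPC network, or raise ValueError if the
    netmask is not between /16 and /24."""
    vpc = ipaddress.IPv4Network(vpc_cidr)
    if not 16 <= vpc.prefixlen <= 24:
        raise ValueError(f"VPC netmask /{vpc.prefixlen} is outside /16 to /24")
    return vpc


def is_reserved(subnet_cidr: str, address: str) -> bool:
    """True if `address` is one of the first ten addresses of the subnet
    (counting from the subnet base address, which is an assumption)."""
    subnet = ipaddress.IPv4Network(subnet_cidr)
    ip = ipaddress.IPv4Address(address)
    offset = int(ip) - int(subnet.network_address)
    return ip in subnet and offset < RESERVED_ADDRESSES


print(check_vpc_cidr("192.168.20.0/22"))
print(is_reserved("192.168.21.0/24", "192.168.21.9"))
```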
Getting the right VPC / subnet configuration:
The subnet used for installation of the EXASOL cluster is calculated according to the VPC CIDR range:
1. For VPCs with 16 to 23 Bit netmasks, the subnet will have a 24 Bit mask. For a 24 Bit VPC, the subnet will have a 26 Bit mask.
2. For the EXASOL subnet, the VPC's second available subnet is automatically used. The tool sipcalc (http://sipcalc.tools.uebi.net/) is helpful here, e.g.
Example 1: The VPC is 192.168.20.0/22 (255.255.252.0) -> A .../24 subnet is used (255.255.255.0). 'sipcalc 192.168.20.0/24' calculates a network range of 192.168.20.0 - 192.168.20.255, which is the VPC's first subnet. => EXASOL uses the subsequent subnet, which is 192.168.21.0/24
Example 2: The VPC is 192.168.20.0/24 (255.255.255.0) -> A .../26 subnet is used (255.255.255.192). 'sipcalc 192.168.20.0/26' calculates a network range of 192.168.20.0 - 192.168.20.63, which is the VPC's first subnet. => EXASOL uses the subsequent subnet, which is 192.168.20.64/26
3. The first 10 IP addresses of the subnet are reserved.
The license server therefore gets the subnet base + 10, and the other nodes follow.
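The three rules above can be reproduced with Python's standard ipaddress module instead of sipcalc. This is a minimal sketch with hypothetical function names; it only mirrors the rules as described here (second subnet of the VPC, /24 or /26 mask, license server at base + 10) and does not query AWS.

```python
import ipaddress
from itertools import islice


def exasol_subnet(vpc_cidr: str) -> ipaddress.IPv4Network:
    """Return the second subnet of the VPC: a /26 for a /24 VPC,
    a /24 for VPCs with /16 to /23 netmasks."""
    vpc = ipaddress.IPv4Network(vpc_cidr)
    if not 16 <= vpc.prefixlen <= 24:
        raise ValueError("VPC netmask must be between /16 and /24")
    new_prefix = 26 if vpc.prefixlen == 24 else 24
    # Skip the first subnet and take the one that follows it.
    return next(islice(vpc.subnets(new_prefix=new_prefix), 1, None))


def license_server_ip(subnet: ipaddress.IPv4Network) -> ipaddress.IPv4Address:
    """The first ten addresses are reserved, so the license server
    gets the subnet base address + 10."""
    return subnet.network_address + 10


for vpc_cidr in ("192.168.20.0/22", "192.168.20.0/24"):
    subnet = exasol_subnet(vpc_cidr)
    print(vpc_cidr, "->", subnet, "license server:", license_server_ip(subnet))
```

Run against the two examples above, this yields the same subnets, 192.168.21.0/24 and 192.168.20.64/26.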
This table shows some example configurations:
VPC CIDR block | License Server IP address | IPMI network host addresses | First additional VLAN address
This article describes the process of setting up nodes with multiple EXAStorage disks in order to maximize throughput. For best performance, it is recommended that at least one 1TB GP2 SSD (3000 IOPS and ~150MB/s) is used. Please be aware that this process requires the re-installation of the cluster nodes, which means that all data will be lost. A remote backup is therefore mandatory before proceeding.
Calculating the optimal amount of disks
Example: Instance type m4.10xlarge
Provides a maximum disk throughput of 4000 Mbps (~500MB/s)
Optimal disk setup 3x1TB GP2 SSDs (3x150MB/s -> 450MB/s)
The remaining 50MB/s are used for the operating system
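The example above can be generalized into a small calculation. The rule used here (fit as many 150MB/s disks as the instance throughput allows and leave the remainder to the operating system) is inferred from this one example and is an assumption, as is the function name.

```python
from typing import Tuple


def optimal_disk_count(instance_mbps: int, disk_mb_per_s: int = 150) -> Tuple[int, int]:
    """Return (number of disks, remaining MB/s for the operating system).

    instance_mbps is the instance's maximum disk throughput in megabits
    per second; disk_mb_per_s is the throughput of one disk in megabytes
    per second (about 150MB/s for a 1TB GP2 SSD, per the article).
    """
    instance_mb_per_s = instance_mbps // 8  # megabits -> megabytes
    disks = instance_mb_per_s // disk_mb_per_s
    remaining = instance_mb_per_s - disks * disk_mb_per_s
    return disks, remaining


# m4.10xlarge from the example: 4000 Mbps of disk throughput.
print(optimal_disk_count(4000))
```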
a. Stop database instances
b. Stop EXAStorage service
c. Set Install flag for all data nodes by selecting all the nodes from the Nodes tab, then select Set Install Flag from the Actions tab and click Execute
d. Edit Disks of each node by going to Nodes - n0011 - Disks and add additional disks
Select d04_storage and click Edit
Click Add, each device requires one separate field
Fill in additional device names, e.g. “/dev/xvdd, /dev/xvde, …” and click Apply
Repeat steps for all data nodes
e. Create additional EC2 volumes (same size 1TB) using EC2 console, pay attention to the Availability Zones
f. Attach volumes to the data nodes using EC2 console, use the very same device names as before in EXAoperation, e.g. “/dev/xvdd, /dev/xvde, …”
g. Reboot nodes through EC2 console
h. During reboot, data nodes will be reinstalled with the new disks
i. Once the nodes are up and running, set the Active flag by selecting all the nodes from the Nodes tab, then select Set Active Flag from the Actions tab and click Execute
j. Remove EXAStorage Metadata and start EXAStorage
k. In EXAStorage, add unused disks on all nodes by selecting all nodes and then clicking Add Unused Disks
l. Recreate data volume(s)
m. Recreate database (optional)
n. Restore database backup (optional)