Deploying 2+1 Exasol Cluster on Amazon Web Service (AWS)
This post will show you how to deploy a 2+1 Exasol cluster on Amazon Web Services (AWS). Before we go into the step-by-step guide, please read through the following prerequisites and recommendations to make sure that you're prepared.
Make sure you have an AWS account with the relevant permissions. If you do not have an AWS account, you can create one from the Amazon Console.
AWS Key Pair:
You must have a Key Pair created. AWS uses public-key cryptography to secure the log-in information for your instance. For more information on how to create a Key Pair, see Amazon EC2 Key Pairs in the AWS documentation.
Subscription on AWS Marketplace:
You must have subscribed to one of the following Exasol subscriptions on AWS Marketplace:
Exasol Analytic Database (Single Node / Cluster, Bring-Your-Own-License)
Exasol Analytic Database (Single Node / Cluster, Pay-As-You-Go)
How to deploy a 2+1 Exasol Cluster
Open https://cloudtools.exasol.com/ to access the cloud deployment wizard in your browser and choose your cloud provider. In this case, the Cloud Provider should be Amazon Web Services. Select your region from the drop-down list. I'm going to deploy our cluster in Frankfurt.
On the Configuration screen, by default, you see the Basic Configuration page. You can choose one of the existing configurations made by Exasol.
Basic Configuration: Shows a minimum specification for your data size.
Balanced Configuration: Shows an average specification for your data size for good performance.
High-Performance Configuration: Shows the best possible specification for your data size for high performance.
In this case, I'm going to choose the Advanced Configuration option.
If you are going to deploy a cluster for production purposes, we recommend discussing sizing options with the Exasol support team or using one of the existing configurations made by Exasol.
RAW Data Size (in TB):
You can enter the required raw data size yourself; otherwise, it will be calculated automatically after you set the instance type and node count.
Pay as you go (PAYG)
The pay as you go (PAYG) license model is a flexible and scalable license model for Exasol's deployment on a cloud platform. In this model, you pay for your cloud resources and the Exasol software through the cloud platform's billing cycle. You can always change your setup later to scale your system up or down, and the billing changes accordingly.
Bring your own license (BYOL)
The bring your own license (BYOL) model lets you choose a static license for the Exasol software and dynamic billing for the cloud resources. In this model, you need to purchase a license from Exasol and add it to your cloud instance. This way, you pay only for the cloud resources through the cloud platform's billing cycle, and there is no billing for the software. You can always change your setup later to scale your system up or down, and the billing changes accordingly. However, there is a limit on the maximum scaling based on your license type (DB RAM or raw data size).
You can find detailed information about licensing in https://docs.exasol.com/administration/aws/licenses.htm
You can choose one of the Exasol Single Node and Enterprise Cluster options. I'm going to choose the Enterprise Cluster option.
You can choose one of the instance types of AWS EC2 service to deploy virtual machines for Exasol nodes. You can find detailed information about instance types of AWS EC2 in https://aws.amazon.com/ec2/instance-types/
The number of DB Nodes:
We need to determine the total number of active data nodes in this section.
After finishing the configuration we can see the RAW data size calculated automatically for us. On the left side of the screen, we can see the details of our setup on AWS.
If you have a license from Exasol, choose the BYOL option in License Model; this will decrease the Estimated Costs.
After clicking Continue to proceed with the deployment, we can see the Summary page, where we can review the cluster configuration and choose a deployment option.
We have the option to create a new VPC or use an existing VPC for the CloudFormation stack.
Create New VPC will create a new VPC and provision all resources within it.
Use Existing VPC will provision Exasol to use an existing VPC subnet of your choice.
For more information on VPC, see Amazon Virtual Private Cloud.
Based on this VPC selection, the parameters on the stack creation page on AWS will change when you launch the stack. For more information on the stack parameters, see Template Parameters.
If you want to download the configuration file and upload it later to your AWS stack through the CloudFormation Console, you can click the CloudFormation Templates option on the left side.
Click Launch Stack. You will be redirected to the Quick create stack page on AWS.
After being redirected to the Quick create stack page on AWS, I'm going to fill in the required stack parameters: Key Pair, SYS User Password, and ADMIN User Password.
In the VPC/Network/Security section, the Public IPs are set to false by default. I'm going to set this to true.
If you want to keep the Public IP address set to false, then you need to enable VPN or other methods to be able to access your instance.
(Optional) License is applicable if your subscription model is Bring-your-own-license. Paste the entire content of the license file you have in the space provided.
Click Create Stack to continue deploying Exasol in the CloudFormation Console. You can view the stack you created under AWS CloudFormation > Stacks, with the status CREATE_IN_PROGRESS. Once the stack is created successfully, the status changes to CREATE_COMPLETE. Additionally, you can monitor the progress in the Events tab for the stack.
For more information about the stack parameters, please check the table here https://docs.exasol.com/cloud_platforms/aws/installation_cf_template.htm?Highlight=Template%20Parameters
Determine the Public IP Address
We need the Public IP or DNS name displayed in the EC2 Console to connect to the database server or to launch the instance. To find the Public IP or DNS name:
Open the EC2 Dashboard from the AWS Management Console.
Click on Running Instance. The Instances page is displayed with all the running instances.
Select the name of the instance you created (in this case, the cluster's data nodes and exasol-cluster-management_node). We need the IP address of the management node.
In the Description section, the IP address displayed for Public DNS(IPv4) is the IP address of the database server.
If the Public IP parameter for your stack is set to false, you need to enable VPN or other methods to connect to the database server via the private IP address of the instances.
Access the Initialization page
Copy this IP address and paste it into a browser, prefixed with https://. In the case of an Exasol cluster deployment, I need to copy the IP address or DNS name of the management node.
After confirming the digital certificate the following screen is displayed.
Once the installation is complete, I will be redirected to the EXAoperation screen. It may take up to 45 minutes for EXAoperation to be online after deployment.
You can log in with the admin user name and password provided while creating your stack.
Connect to the database
In this case (a 2+1 cluster deployment), I need to use the Public IP address of a data node along with the admin user name and password to connect an SQL client. I can also connect to all the data nodes by entering the public IP addresses of all the nodes separated by commas.
Connect to Exasol
After installing Exasol on AWS, you can do the following:
Install drivers required to connect to other tools.
Connect SQL clients to Exasol.
Connect Business Intelligence tools (BI tools) to Exasol.
Connect Data Integration - ETL tool to Exasol.
Connect Data Warehouse Automation tools to Exasol.
After you have connected your choice of tool to Exasol, you can load your data into Exasol and process further. To know more about loading data into Exasol, see Loading Data.
In this article, we deployed a 2+1 Exasol cluster on AWS. In the future, I will be sharing new articles about managing the Exasol cluster on AWS, using lambda functions to schedule the start/stop of a cluster, etc.
This article describes the calculation of the optimal (maximum) DB RAM on a:
4+1 system with one database (dedicated environment)
4+1 system with two databases (shared environment)
The calculation of the OS Memory per Node stays the same for both environments. Shared environments are not recommended for production systems.
The 4+1 cluster contains four active data nodes and one standby node. Each node has 384GiB of main memory.
How to calculate Database RAM
OS Memory per Node
It is vital for the database that there is enough memory allocatable through the OS. We recommend using at least 10% of the main memory on each node. This prevents the nodes from swapping on high load (many sessions).
Main Memory per Node * 0.1 = OS Memory per Node
384 * 0.1 = 38.4 -> 39GiB (rounded up, since at least 10% should be reserved)
In order to set this value, the database needs to be shut down. In EXAoperation, go to Configuration > Network and set "OS Memory/Node (GiB)".
Maximum DB RAM (dedicated environment)
(Main Memory per Node - OS Memory per Node) * Number of active Nodes = Maximum DB RAM
Example: 4 x data nodes with 384GiB (Main Memory per Node) - 39GiB (OS Memory per Node)
(384GiB - 39GiB) * 4 = 1380GiB
Maximum DB RAM (shared environment)
Database "one" on four data nodes (exa_db1)
Database "two" on two data nodes (exa_db2)
As before the "Maximum DB RAM" is 1380GiB. With two databases sharing the Maximum DB RAM, we need to recalculate and redistribute it.
Maximum DB RAM / Number of Databases = Maximum DB RAM per database
1380GiB / 2 = 690GiB
For database "one" (exa_db1), which is running on all four nodes, 690GiB DB RAM can be configured. The smaller database "two" (exa_db2) is running on two nodes, so "Maximum DB RAM per database" needs to be divided by the number of data nodes it's running on (2).
Maximum DB RAM per database / Number of active Nodes = Maximum DB RAM for exa_db2
690GiB / 2 = 345GiB
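As a minimal sketch, the DB RAM calculation above can be written in a few lines of Python. The values come from the 4+1 example; the 10% OS reservation is assumed to be rounded up to whole GiB so that the article's figures come out.

```python
import math

def os_memory_per_node(main_memory_gib, reserve=0.1):
    # Reserve at least 10% of main memory for the OS (rounded up to whole GiB)
    return math.ceil(main_memory_gib * reserve)

def max_db_ram(main_memory_gib, active_nodes):
    # Maximum DB RAM in a dedicated environment
    return (main_memory_gib - os_memory_per_node(main_memory_gib)) * active_nodes

total = max_db_ram(384, 4)   # 1380 GiB for the 4+1 example
per_db = total // 2          # 690 GiB per database in a shared environment
exa_db2 = per_db // 2        # 345 GiB for exa_db2, which runs on 2 of 4 nodes
print(total, per_db, exa_db2)
```

The function names are my own; EXAoperation itself only asks for the final "DB RAM (GiB)" value per database.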
This article shows how to calculate the available database disk space.
To calculate the available database disk space, we first need some information:
the available disk sizes on the nodes
how many volumes exist
the size of these volumes
which nodes are used by these volumes
the redundancy of these volumes
Let's explain the calculation with an example:
Available disk space on the "d03_storage" partition: 1786 GiB on each node (n0011 - n0014)
existing volumes, their sizes, and redundancies (in this example, v0000 is a 1024 GiB volume on 3 nodes with redundancy 2):
Calculation of the free disk space
The first step is to divide the size of the volume by the number of nodes it uses to get the segment size (example for v0000):
Size / Number of Nodes = Segment Size
1024 GiB / 3 Nodes = 341.3 GiB/Node
The next step is to multiply the segment size by the redundancy of the volume:
Segment Size * Redundancy = Used Disk Space per Node
341.3 GiB/Node * 2 = 682.6 GiB/Node
This has to be done for every volume. After that we're able to fill a table with the used disk space per node like this:
Now we can simply subtract the used sizes from the available disk size per node:
n0011: 1786 GiB - 213 GiB - 20 GiB = 1553 GiB
n0012: 1786 GiB - 683 GiB - 213 GiB - 20 GiB = 870 GiB
n0013: 1786 GiB - 683 GiB - 213 GiB - 20 GiB - 120 GiB - 7 GiB = 743 GiB
n0014: 1786 GiB - 683 GiB - 120 GiB - 7 GiB = 976 GiB
The minimum value over all nodes gives us the free available space: 743 GiB with a redundancy of 1. The reason for taking the minimum is that all segments of a volume need to have the same size.
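The per-node bookkeeping above can be sketched in Python. The 1786 GiB partition size, the v0000 volume, and the rounded per-node used sizes are taken from the article's example; the node-to-volume assignment is read off the subtraction lines.

```python
def used_disk_per_node(volume_size_gib, num_nodes, redundancy):
    # Segment size is the volume size split evenly across its nodes;
    # redundancy multiplies the data actually stored on each node.
    segment = volume_size_gib / num_nodes
    return segment * redundancy

v0000 = used_disk_per_node(1024, 3, 2)   # ~682.6 GiB per node, rounded to 683

available = 1786  # GiB on the d03_storage partition of every node
free = {
    "n0011": available - 213 - 20,
    "n0012": available - 683 - 213 - 20,
    "n0013": available - 683 - 213 - 20 - 120 - 7,
    "n0014": available - 683 - 120 - 7,
}
# All segments of a volume must have the same size, so the minimum over
# all nodes is the usable free space (with redundancy 1)
usable = min(free.values())  # 743 GiB
print(free, usable)
```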
Calculation of the available space from the point of view of the database instance
The database instance is able to control the size of its own data volume: data volumes can grow, and they can be shrunk. Shrinking a data volume is an expensive operation and creates a high amount of disk and network usage. To limit this usage, the process will only shrink a few blocks after a defined number of COMMIT statements. That is why data volumes won't shrink immediately when data in the database has been deleted.
As a result, the data volumes are usually not completely used by the database, and there is a certain amount of free space:
Database volumes v0001 + v0002:

v0001 (redundancy 2): 200 GiB used, 120 GiB unused
2 * 120 GiB = 240 GiB Free
2 * 200 GiB = 400 GiB Used

v0002 (redundancy 1): 30 GiB used, 20 GiB unused
1 * 20 GiB = 20 GiB Free
1 * 30 GiB = 30 GiB Used

Total: 260 GiB Free, 430 GiB Used
Now we can calculate the available space for the database which is using the volumes v0001 and v0002:
Free = available space for volumes + available space inside the DB volume
Free = 743 GiB + 260 GiB = 1003 GiB Free (with a redundancy of 1)
Usage = (1 - (free space / (free space + used space))) * 100%
Usage = (1 - (1003 GiB / (1003 GiB + 430 GiB))) * 100% = 30%
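The two formulas above can be sketched as one small Python helper (the function name is my own; the numbers are from the example):

```python
def db_usage_percent(partition_free_gib, volume_free_gib, volume_used_gib):
    # Free space = free space on the storage partition + unused space
    # inside the database volumes (all with redundancy 1)
    free = partition_free_gib + volume_free_gib
    return (1 - free / (free + volume_used_gib)) * 100

usage = db_usage_percent(743, 260, 430)  # ~30%
print(usage)
```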
How to get the necessary data for monitoring the free space
To monitor the free space of an EXASolution database instance we need the following information:
the available disk space of the storage partition
all EXAStorage volumes
all sizes of these volumes
the redundancy of this volume
the data volumes used by the database instance we want to check (data + temp)
the usage of those data volumes
All of this data is provided by the EXAoperation XMLRPC interface since EXASuite 4.2. You can use the following functions:
information about the available space of the storage partition
volumes and their usage by the database
volume sizes and redundancies
Please check the EXAoperation user manual for a full description of how to use those functions. You can find this manual on our user portal: https://www.exasol.com/portal/display/DOWNLOAD/6.0
The datadog-agent has only one dependency, '/bin/sh'. It is safe to simply install it, also with regard to future updates of Exasol.
For CentOS 7.x, just run the following on each machine (as user root):
DD_API_KEY=<Your-API-Key> bash -c "$(curl -L https://raw.githubusercontent.com/DataDog/datadog-agent/master/cmd/agent/install_script.sh)"
The hostname can be changed in '/etc/datadog-agent/datadog.yaml'; afterward, restart the agent as user root with 'systemctl restart datadog-agent'.
With versions prior to 5.0.15, EXASOL cluster deployments only supported the CIDR block 126.96.36.199/16 and subnet 188.8.131.52/16. Now it is possible to use custom CIDR blocks, with some restrictions, because the CIDR block is automatically managed by our cluster operating system.
VPC CIDR block netmask must be between /16 (255.255.0.0) and /24 (255.255.255.0)
The first ten IP addresses of the cluster's subnet are reserved and cannot be used
Getting the right VPC / subnet configuration:
The subnet used for installation of the EXASOL cluster is calculated according to the VPC CIDR range:
1. For VPCs with 16 to 23 bit netmasks, the subnet will have a 24 bit mask. For a 24 bit VPC, the subnet will have a 26 bit mask.
2. For the EXASOL subnet, the VPC's second available subnet is automatically used. The tool sipcalc (http://sipcalc.tools.uebi.net/) is helpful here, e.g.
Example 1: The VPC is 192.168.20.0/22 (255.255.252.0) -> a /24 subnet is used (255.255.255.0). 'sipcalc 192.168.20.0/24' calculates a network range of 192.168.20.0 - 192.168.20.255, which is the VPC's first subnet. => EXASOL uses the subsequent subnet, which is 192.168.21.0/24
Example 2: The VPC is 192.168.20.0/24 (255.255.255.0) -> a /26 subnet is used (255.255.255.192). 'sipcalc 192.168.20.0/26' calculates a network range of 192.168.20.0 - 192.168.20.63, which is the VPC's first subnet. => EXASOL uses the subsequent subnet, which is 192.168.20.64/26
3. The first 10 IP addresses of the subnet are reserved.
The license server therefore gets the subnet base + 10, and the other nodes follow.
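The subnet rules above can also be checked with Python's standard ipaddress module instead of sipcalc. This sketch reproduces both examples; the function names are my own.

```python
import ipaddress

def exasol_subnet(vpc_cidr):
    # VPCs with /16 - /23 masks get a /24 subnet; a /24 VPC gets a /26
    vpc = ipaddress.ip_network(vpc_cidr)
    new_prefix = 24 if vpc.prefixlen < 24 else 26
    # EXASOL uses the VPC's second available subnet
    return list(vpc.subnets(new_prefix=new_prefix))[1]

def license_server_ip(subnet):
    # The first 10 addresses are reserved; the license server gets base + 10
    return subnet.network_address + 10

print(exasol_subnet("192.168.20.0/22"))   # 192.168.21.0/24
print(exasol_subnet("192.168.20.0/24"))   # 192.168.20.64/26
```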
The original article included a table of example configurations, listing for each VPC CIDR block the License Server IP address, the IPMI network host addresses, and the first additional VLAN address.
This article explains how to activate a new license.
Scenario: License Upgrade with DB RAM expansion
The valid license file (XML)
Short Downtime to stop and start the database
EXAoperation User with privilege Level "Master"
Step 1: Upload License file to EXAoperation
In EXAoperation navigate to "Software"
On the software page, click on the "License" Tab
Click on the "Browse" button to open a file upload dialog. Select the new license file and confirm by clicking the "Upload" button
Refresh the "License" page and review new license information
Step 2: Stop all databases
Click on left navigation pane "EXASolution"
Select all checkboxes of the listed database instances
Click on the "Shutdown" button and wait for all database instances to shut down (Monitoring->Logservice)
Step 3: Adjust DB RAM (optional)
Click on the DB name
Click on "Edit"
Adjust "DB RAM (GiB)" according to your license and click "Apply"
Step 4: Start all databases
Click on left navigation pane "EXASolution"
Select all checkboxes of the listed database instances
Start all databases and wait for all instances to be up and running (Monitoring->Logservice)
This article describes how to improve the speed of your SMB share by disabling the policy "Microsoft network server: Digitally sign communications (always)". You may be affected if:
creating backups takes unusually long
performance of the remote archive volumes is poor (only a few MiB/s)
the remote share is a Microsoft Windows server
there are no performance problems when using "smbclient" on other Linux clients
Open the "Local Group Policy Editor" on your Windows server and go to "Windows Settings > Security Settings > Local Policies > Security Options". To improve the speed of your share, you have to disable the policy "Microsoft network server: Digitally sign communications (always)". After changing the policy, you should be able to read and write with normal speed again.
This article describes the process of setting up nodes with multiple EXAStorage disks in order to maximize throughput. For best performance, it is recommended that at least one 1TB GP2 SSD (3000 IOPS and ~150 MB/s) is used. Please be aware that this process requires the re-installation of the cluster nodes, which implies that all data will be lost; a remote backup is therefore mandatory before proceeding.
Calculating the optimal amount of disks
Example: Instance type m4.10xlarge
Provides a maximum disk throughput of 4,000 Mbps (~500 MB/s)
Optimal disk setup: 3 x 1TB GP2 SSDs (3 x 150 MB/s -> 450 MB/s)
The remaining 50 MB/s are reserved for the operating system
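The disk-count reasoning above can be written as a small helper (a sketch; the 150 MB/s per GP2 SSD and the 50 MB/s OS reserve are the assumptions stated in the example):

```python
def optimal_disk_count(instance_throughput_mbs, disk_throughput_mbs=150,
                       os_reserve_mbs=50):
    # How many disks can be driven at full speed while leaving
    # headroom for the operating system
    return (instance_throughput_mbs - os_reserve_mbs) // disk_throughput_mbs

print(optimal_disk_count(500))  # 3 disks for an m4.10xlarge (~500 MB/s)
```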
a. Stop database instances
b. Stop the EXAStorage service
c. Set the Install flag for all data nodes by selecting all the nodes from the Nodes tab, then select Set Install Flag from the Actions tab and click Execute
d. Edit the disks of each node by going to Nodes > n0011 > Disks and add additional disks:
Select d04_storage and click Edit
Click Add, each device requires one separate field
Fill in additional device names, e.g. “/dev/xvdd, /dev/xvde, …” and click Apply
Repeat steps for all data nodes
e. Create additional EC2 volumes (same size, 1TB) using the EC2 console; pay attention to the Availability Zones
f. Attach the volumes to the data nodes using the EC2 console; use the very same device names as before in EXAoperation, e.g. “/dev/xvdd, /dev/xvde, …”
g. Reboot the nodes through the EC2 console
h. During reboot, the data nodes will be reinstalled with the new disks
i. Once the nodes are up and running, set the Active flag by selecting all the nodes from the Nodes tab, then select Set Active Flag from the Actions tab and click Execute
j. Remove the EXAStorage metadata and start EXAStorage
k. In EXAStorage, add unused disks on all nodes by selecting all nodes and then clicking Add Unused Disks
l. Recreate the data volume(s)
m. Recreate the database (optional)
n. Restore the database backup (optional)