Remote Node Management for Exasol on AWS

mmzyk
Contributor

Hello Exasol Community,

Despite many attempts, we have been unable to implement a manual, remote, command-line-based way of starting and stopping Exasol nodes as an alternative to the Cloud UI / Cloud plug-in for our Exasol platform (cloud-based Exasol on AWS, version 6.2.6). The main requirement is to avoid the use of the Cloud UI / Cloud plug-in for the more than 30 users of our training platform, because the Cloud UI contains further administration options such as "Scale Up" or "Scale Out" that should not be visible or available to the majority of users.

Our preferred solution design would be a bot built in Amazon Lex (which can then be called from external applications such as Slack) containing, among other commands, two "Intents", each calling an AWS Lambda function (written in Python or any other programming language):

- startnodes --> calling an AWS Lambda function which remotely starts up all nodes (start nodes/log into Exaoperation/start storage/start database)

- stopnodes --> calling an AWS Lambda function for shutting down all nodes (shut down database, storage, data nodes)
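To make the two intents above more concrete, here is a minimal sketch of a Lex fulfillment handler in Python. It assumes the Lex V2 event and response shapes (please verify them against the Lex documentation for your bot version), and the intent names, the `INTENT_TO_METHOD` mapping and the `run_method` hook are hypothetical names chosen for illustration only.

```python
# Hypothetical mapping from Lex intent names to Cloud UI backend methods.
INTENT_TO_METHOD = {
    "startnodes": "start_cluster",
    "stopnodes": "stop_cluster",
}


def close(intent_name, state, text):
    """Build a 'Close' response in the (assumed) Lex V2 format."""
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": intent_name, "state": state},
        },
        "messages": [{"contentType": "PlainText", "content": text}],
    }


def handler(event, context, run_method=None):
    """Translate a Lex intent into a cluster method and report back to the user.

    run_method is an optional callable (hypothetical hook) that actually
    triggers the start or stop; it is left out here so the sketch stays
    free of AWS calls.
    """
    intent_name = event["sessionState"]["intent"]["name"]
    method = INTENT_TO_METHOD.get(intent_name)
    if method is None:
        return close(intent_name, "Failed", "Unknown command: " + intent_name)
    if run_method is not None:
        run_method(method)
    return close(intent_name, "Fulfilled", "Requested " + method)
```

Keeping the intent-to-method mapping in one dictionary means users can only ever trigger the two whitelisted operations, which matches the goal of hiding the other administration options.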

Exasol already provides documentation for a command-line-based way of manually starting and shutting down the nodes remotely, as described here:

https://docs.exasol.com/administration/aws/nodes/stop_start_nodes.htm#Stop_Start_Using_Cloud_plugin

According to this documentation, these commands would have to be executed from the license server:

-startnodes:   /opt/ui-backend/handle_cloudui_request.py -d '{"method":"start_cluster","credentials":"'$(echo -n 'admin:EXAOPERATION_PASSWORD' | base64)'"}'

-stopnodes:  /opt/ui-backend/handle_cloudui_request.py -d '{"method":"stop_cluster","credentials":"'$(echo -n 'admin:EXAOPERATION_PASSWORD' | base64)'"}'
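For reference, the JSON argument passed with `-d` in both commands can be built in Python as follows. This is only a sketch of what the shell snippet `$(echo -n 'admin:EXAOPERATION_PASSWORD' | base64)` does; the function name `build_cloudui_request` is my own and not part of any Exasol tooling.

```python
import base64
import json


def build_cloudui_request(method, user, password):
    """Return the JSON string expected after -d, with base64 'user:password' credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # Compact separators reproduce the spacing used in the documented commands.
    return json.dumps({"method": method, "credentials": token}, separators=(",", ":"))


if __name__ == "__main__":
    print(build_cloudui_request("start_cluster", "admin", "EXAOPERATION_PASSWORD"))
```

Using `json.dumps` instead of string concatenation avoids broken payloads if the method name or credentials ever contain characters that need escaping.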

However, we lack the programming knowledge to execute / call this Python code directly from an AWS Lambda function, which also would have to establish the connection to the license server (possibly requiring further Python packages) before even being able to execute this code.

Any help with implementing our requirement, or any experience with similar solutions that remotely start/stop all Exasol nodes, would be greatly appreciated. If you can sketch or provide a solution based on SSH clients such as PuTTY (i.e., not using Amazon Lex / Lambda, and instead having users call a script via PuTTY that starts/stops the nodes), this would help us just as much.
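As a rough idea of the SSH-based variant, the sketch below only assembles the `ssh` command line that would run the documented script on the license server. It assumes key-based SSH access; the target `ec2-user@10.0.0.10` and the function name `build_ssh_argv` are hypothetical placeholders, not values from our setup.

```python
import base64
import json
import shlex


def build_ssh_argv(ssh_target, method, user, password):
    """Return the argv list for running the Cloud UI request script over SSH."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    payload = json.dumps({"method": method, "credentials": token}, separators=(",", ":"))
    # shlex.quote protects the JSON payload from the remote shell.
    remote_command = "/opt/ui-backend/handle_cloudui_request.py -d " + shlex.quote(payload)
    return ["ssh", ssh_target, remote_command]


if __name__ == "__main__":
    # Hypothetical target; pass the list to subprocess.run(argv, check=True) to execute it.
    print(build_ssh_argv("ec2-user@10.0.0.10", "stop_cluster", "admin", "EXAOPERATION_PASSWORD"))
```

Note that the credentials end up on the remote command line in this form, exactly as in the documented commands, so access to the calling script would need to be restricted accordingly.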

Thanks in advance & Best regards,

Michael

 

PS: We also tried to use the XML-RPC interface as described under:

https://community.exasol.com/t5/environment-management/starting-and-stopping-clusters-using-xml-rpc/...

or

https://github.com/exasol/exaoperation-xmlrpc/tree/master/tools-and-examples/startstop

However, we ran into various Python issues related to authentication, deprecated functions, etc., and even suffered a node failure as a result of those scripts.

1 ACCEPTED SOLUTION

exa-Fagani
Team Exasol

Hi Michael,

We have a CloudFormation template that can be used to schedule the start/stop of the Exasol cluster on AWS via Lambda functions triggered by CloudWatch Events. Please find the CFN template below:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'AWS CloudFormation Template - Shutdown/Restart all cluster nodes via AWS Lambda Function and CloudWatch Event'
Parameters:
  ExaoperationUser:
    Description: ExaOperation User
    Type: String
    ConstraintDescription: must be valid user
  ExaoperationPassword:
    Description: Password for ExaOperation User
    Type: String
    NoEcho: true
    ConstraintDescription: must be a valid password
  VPCId:
    Type: AWS::EC2::VPC::Id    
  SubnetId:
    Type: AWS::EC2::Subnet::Id
  SecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
  ScheduleShutDown:
    Type: String
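    Description: 'Schedule expression for shutting down the cluster, e.g. cron(30 09 * * ? *)'
  ScheduleStartUp:
    Type: String
    Description: 'Schedule expression for starting up the cluster, e.g. cron(45 09 * * ? *)'
  MgmtNodeIp:
    Type: String
    Description: Private IP address or DNS name of the management node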
Resources:
  ScheduledRuleShutDown: 
    Type: AWS::Events::Rule
    Properties: 
      Description: Scheduled rule for shutting down the cluster
      ScheduleExpression: 
        Ref: ScheduleShutDown
      State: ENABLED
      Targets: 
        - 
          Arn: 
            Fn::GetAtt: 
              - LambdaFunctionShutDown
              - Arn
          Id: ShutDown
  PermissionForEventsToInvokeLambdaFunctionShutDown: 
    Type: AWS::Lambda::Permission
    Properties: 
      FunctionName: 
        Ref: LambdaFunctionShutDown
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: 
        Fn::GetAtt: 
          - ScheduledRuleShutDown
          - Arn
  ScheduledRuleStartUp: 
    Type: AWS::Events::Rule
    Properties: 
      Description: Scheduled rule starting up the cluster
      ScheduleExpression: 
        Ref: ScheduleStartUp
      State: ENABLED
      Targets: 
        - 
          Arn: 
            Fn::GetAtt: 
              - LambdaFunctionStartUp
              - Arn
          Id: StartUp
  PermissionForEventsToInvokeLambdaFunctionStartUp: 
    Type: AWS::Lambda::Permission
    Properties: 
      FunctionName: 
        Ref: LambdaFunctionStartUp
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: 
        Fn::GetAtt: 
          - ScheduledRuleStartUp
          - Arn
  HelperRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - lambda.amazonaws.com
          Action:
          - sts:AssumeRole
      Path: /
      ManagedPolicyArns:
      - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
      - PolicyName: LambdaPolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - "ec2:CreateNetworkInterface"
            - "ec2:DescribeNetworkInterfaces"
            - "ec2:DeleteNetworkInterface"
            Resource: "*"
          - Effect: Allow
            Action:
            - "ec2:StopInstances"
            - "ec2:StartInstances"
            Resource: 
              -  !Join [ '', [ 'arn:aws:ec2:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':instance/*'] ]
          - Effect: Allow
            Action:
            - "ssm:GetParameter"
            Resource: 
            - !Join [ '', [ 'arn:aws:ssm:', !Ref 'AWS::Region', ':',!Ref 'AWS::AccountId', ':parameter/', !Join [ '', [ 'exasol-timed-shutdown-restart-', !Ref 'AWS::StackName']] ]]             
  LambdaFunctionShutDown: 
    Type: AWS::Lambda::Function
    Properties: 
      Handler: index.handler
      Role: 
        Fn::GetAtt: 
          - HelperRole
          - Arn
      Runtime: python3.6
      Timeout: 900
      VpcConfig:
        SecurityGroupIds:
          - Ref: SecurityGroupId
        SubnetIds:
          - Ref: SubnetId
      Environment:
        Variables:
          ExaoperationUser: !Ref ExaoperationUser
          SSMParameter: !Join [ '', [ 'exasol-timed-shutdown-restart-', !Ref 'AWS::StackName']]
          MgmtNodeIp: !Ref MgmtNodeIp
      Code:
        ZipFile: |
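          import base64
          import os
          import ssl
          import urllib.parse
          import xmlrpc.client

          import boto3

          # Read the EXAoperation password from SSM Parameter Store at cold start.
          exaoperation_password = boto3.client('ssm').get_parameter(Name=os.environ['SSMParameter'])['Parameter']['Value']

          def handler(event, context):
            user = os.environ['ExaoperationUser']
            # The Cloud UI backend takes base64-encoded "user:password" credentials in the request payload.
            credentials = base64.b64encode((user + ':' + exaoperation_password).encode('utf-8')).decode('utf-8')
            # Percent-encode user and password so that special characters do not break the URL.
            url = 'https://' + urllib.parse.quote(user, safe='') + ':' + urllib.parse.quote(exaoperation_password, safe='') + '@' + os.environ['MgmtNodeIp'] + '/cluster1'
            # Note: certificate verification is disabled, so the server is not authenticated.
            cluster_service = xmlrpc.client.ServerProxy(url, allow_none=True, context=ssl._create_unverified_context())
            # Plugin name/version and node ('n0010') are specific to the installation; adjust if yours differ.
            response = cluster_service.callPlugin('Cloud.UIBackend-1.1.4', 'n0010', 'CLOUDUI_REQUEST', '{"method":"stop_cluster","credentials":"' + credentials + '"}')
            return response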
  LambdaFunctionStartUp: 
    Type: AWS::Lambda::Function
    Properties: 
      Handler: index.handler
      Role: 
        Fn::GetAtt: 
          - HelperRole
          - Arn
      Runtime: python3.6
      Timeout: 900
      VpcConfig:
        SecurityGroupIds:
          - Ref: SecurityGroupId
        SubnetIds:
          - Ref: SubnetId
      Environment:
        Variables:
          ExaoperationUser: !Ref ExaoperationUser
          SSMParameter: !Join [ '', [ 'exasol-timed-shutdown-restart-', !Ref 'AWS::StackName']]
          MgmtNodeIp: !Ref MgmtNodeIp
      Code:
        ZipFile: |
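          import base64
          import os
          import ssl
          import urllib.parse
          import xmlrpc.client

          import boto3

          # Read the EXAoperation password from SSM Parameter Store at cold start.
          exaoperation_password = boto3.client('ssm').get_parameter(Name=os.environ['SSMParameter'])['Parameter']['Value']

          def handler(event, context):
            user = os.environ['ExaoperationUser']
            # The Cloud UI backend takes base64-encoded "user:password" credentials in the request payload.
            credentials = base64.b64encode((user + ':' + exaoperation_password).encode('utf-8')).decode('utf-8')
            # Percent-encode user and password so that special characters do not break the URL.
            url = 'https://' + urllib.parse.quote(user, safe='') + ':' + urllib.parse.quote(exaoperation_password, safe='') + '@' + os.environ['MgmtNodeIp'] + '/cluster1'
            # Note: certificate verification is disabled, so the server is not authenticated.
            cluster_service = xmlrpc.client.ServerProxy(url, allow_none=True, context=ssl._create_unverified_context())
            # Plugin name/version and node ('n0010') are specific to the installation; adjust if yours differ.
            response = cluster_service.callPlugin('Cloud.UIBackend-1.1.4', 'n0010', 'CLOUDUI_REQUEST', '{"method":"start_cluster","credentials":"' + credentials + '"}')
            return response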
  BasicParameter:
    Type: "AWS::SSM::Parameter"
    Properties:
      Name: !Join [ '', [ 'exasol-timed-shutdown-restart-', !Ref 'AWS::StackName']]
      Type: "String"
      Value: !Ref ExaoperationPassword
      Description: "SSM parameter holding the EXAoperation password"
  SSMEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:       
      ServiceName: !Join [ '', [ 'com.amazonaws.', !Ref 'AWS::Region', '.ssm']]  
      VpcId: !Ref VPCId
      SubnetIds:
        - !Ref SubnetId
      SecurityGroupIds:
        - !Ref SecurityGroupId
      VpcEndpointType: Interface
      PrivateDnsEnabled: true

We will post a new article about this template with usage examples soon.
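Until that article is available, here is a rough sketch of how the template could be deployed with boto3. All parameter values below are placeholders, the helper names `build_stack_parameters` and `deploy` are my own, and the CloudFormation client is passed in so that the snippet itself makes no AWS calls.

```python
def build_stack_parameters(values):
    """Convert a plain dict into the Parameters list expected by create_stack."""
    return [{"ParameterKey": key, "ParameterValue": value} for key, value in values.items()]


def deploy(cfn_client, stack_name, template_body, values):
    """Create the stack; CAPABILITY_IAM is needed because the template creates an IAM role."""
    return cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=build_stack_parameters(values),
        Capabilities=["CAPABILITY_IAM"],
    )


# Placeholder values for the eight template parameters (all hypothetical).
EXAMPLE_VALUES = {
    "ExaoperationUser": "admin",
    "ExaoperationPassword": "CHANGE_ME",
    "VPCId": "vpc-0123456789abcdef0",
    "SubnetId": "subnet-0123456789abcdef0",
    "SecurityGroupId": "sg-0123456789abcdef0",
    # Schedule expressions for scheduled rules are evaluated in UTC.
    "ScheduleShutDown": "cron(30 09 * * ? *)",
    "ScheduleStartUp": "cron(45 09 * * ? *)",
    "MgmtNodeIp": "10.0.0.10",
}

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   with open("template.yaml") as f:
#       deploy(boto3.client("cloudformation"), "exasol-schedule", f.read(), EXAMPLE_VALUES)
```

The stack name and template file name in the usage comment are likewise only examples.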

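The Python code that the template uses to stop and start the Exasol services and data nodes via XML-RPC is shown separately below. This is the stop variant; the start variant differs only in the method name (start_cluster instead of stop_cluster).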

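          import base64
          import os
          import ssl
          import urllib.parse
          import xmlrpc.client

          import boto3

          # Read the EXAoperation password from SSM Parameter Store at cold start.
          exaoperation_password = boto3.client('ssm').get_parameter(Name=os.environ['SSMParameter'])['Parameter']['Value']

          def handler(event, context):
            user = os.environ['ExaoperationUser']
            # The Cloud UI backend takes base64-encoded "user:password" credentials in the request payload.
            credentials = base64.b64encode((user + ':' + exaoperation_password).encode('utf-8')).decode('utf-8')
            # Percent-encode user and password so that special characters do not break the URL.
            url = 'https://' + urllib.parse.quote(user, safe='') + ':' + urllib.parse.quote(exaoperation_password, safe='') + '@' + os.environ['MgmtNodeIp'] + '/cluster1'
            # Note: certificate verification is disabled, so the server is not authenticated.
            cluster_service = xmlrpc.client.ServerProxy(url, allow_none=True, context=ssl._create_unverified_context())
            # Plugin name/version and node ('n0010') are specific to the installation; adjust if yours differ.
            response = cluster_service.callPlugin('Cloud.UIBackend-1.1.4', 'n0010', 'CLOUDUI_REQUEST', '{"method":"stop_cluster","credentials":"' + credentials + '"}')
            return response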
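Since the original question was about starting and stopping on demand rather than on a schedule, the deployed functions could presumably also be triggered directly. The sketch below uses the Lambda `invoke` API asynchronously; the function name in the usage comment is hypothetical, because CloudFormation generates the physical names of the two functions.

```python
import json


def trigger_cluster_function(lambda_client, function_name):
    """Asynchronously invoke one of the start/stop functions and return the HTTP status code."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        # "Event" queues the invocation and returns immediately, so the caller
        # does not have to wait for the cluster operation to finish.
        InvocationType="Event",
        Payload=json.dumps({}).encode("utf-8"),
    )
    return response["StatusCode"]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   status = trigger_cluster_function(boto3.client("lambda"), "my-stack-LambdaFunctionShutDown-ABC123")
#   # A status of 202 means the invocation was accepted.
```

An asynchronous call seems the safer choice here because the functions are configured with a 900 second timeout, which is far longer than a typical client would want to block.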

Please feel free to copy this code and use it. If you have any questions please don't hesitate to let us know.

Regards,

@exa-Fagani 


4 REPLIES

exa-Chris
Community Manager

Hi Mmzyk,

Let me check internally who can best help you with this topic.
Regards,

@exa-Chris 


mmzyk
Contributor

Hi Exa-Chris,

Thanks. Any help would be appreciated. We are also open to other approaches that lead to the same result.

Cheers,

Michael

mmzyk
Contributor

Hi @exa-Chris 

Is there any update on this topic? 

Thanks & Regards

Michael
