This section explains how to deploy Cloud Native Qumulo (CNQ) on AWS by creating the persistent storage and the cluster compute and cache resources with Terraform. It also provides recommendations for Terraform deployments and information about post-deployment actions and optimization.

For an overview of CNQ on AWS, its prerequisites, and limits, see How Cloud Native Qumulo Works.

The aws-terraform-cnq-<x.y>.zip file (the version in the file name corresponds to the provisioning scripts, not the version of Qumulo Core) contains comprehensive Terraform configurations that let you deploy S3 buckets and then create a CNQ cluster with 4 to 24 instances that adhere to the AWS Well-Architected Framework and have fully elastic compute and capacity.

Prerequisites

This section explains the prerequisites to deploying CNQ on AWS.

  • To allow your Qumulo instance to report metrics to Qumulo, your AWS VPC must have outbound Internet connectivity through a NAT gateway or a firewall. Your instance shares no file data during this process.

  • The following features require specific versions of Qumulo Core:

    Feature                                                           Minimum Qumulo Core Version
    Adding S3 buckets to increase persistent storage capacity         7.2.1.1
    Increasing the soft capacity limit for an existing CNQ cluster    7.2.0.2
    Creating persistent storage                                       7.1.3 with version 4.0 of the deployment scripts

  • Before you configure your Terraform environment, you must sign in to the AWS CLI.

    A custom IAM role or user must have the following permissions:

    • cloudformation:*
    • ec2:*
    • elasticloadbalancing:*
    • iam:*
    • kms:*
    • lambda:*
    • logs:*
    • resource-groups:*
    • route53:*
    • s3:*
    • secretsmanager:*
    • sns:*
    • ssm:*
    • sts:*
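If you create the custom IAM role or user yourself, the following Python sketch turns the permission list above into an identity-based IAM policy document that allows every action for each listed service. The Sid value and the function name are hypothetical. Because these wildcards are broad, review them against your organization's security requirements before you use them.

```python
import json
from typing import Dict, List

# Service prefixes from the prerequisites list above.
CNQ_SERVICES: List[str] = [
    "cloudformation", "ec2", "elasticloadbalancing", "iam", "kms", "lambda",
    "logs", "resource-groups", "route53", "s3", "secretsmanager", "sns", "ssm", "sts",
]


def cnq_policy_document(services: List[str] = CNQ_SERVICES) -> Dict[str, object]:
    """Build an IAM policy that allows <service>:* for each listed service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CnqTerraformDeployment",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [f"{service}:*" for service in services],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so that you can save it to a file.
    print(json.dumps(cnq_policy_document(), indent=2))
```

For example, you can redirect the printed JSON to cnq-policy.json and attach it with aws iam put-role-policy --role-name <role> --policy-name <name> --policy-document file://cnq-policy.json.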

How the CNQ Provisioner Works

The CNQ Provisioner is an m5.large EC2 instance that configures your Qumulo cluster and any additional AWS environment requirements.

The Provisioner stores all necessary state information in the Parameter Store and shuts down automatically when it completes its tasks. To view the Provisioner's status, in AWS Systems Manager, click Application Management > Parameter Store > /qumulo/<my-stack-name>/last-run-status. On the History tab, click ⚙️, and then in the Preferences dialog box, click Parameter history properties > Value > Confirm.
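If you prefer to check the Provisioner's progress from a script, the following Python sketch reads the recorded values of the last-run-status parameter. It assumes an SSM client that provides get_parameter_history(), such as the one that boto3.client("ssm") returns. The helper names are hypothetical, and the sketch doesn't interpret the values because their format isn't described here.

```python
from typing import Any, Dict, List


def status_parameter_name(stack_name: str) -> str:
    """Build the Parameter Store path that the Provisioner updates."""
    return f"/qumulo/{stack_name}/last-run-status"


def last_run_status_history(ssm_client: Any, stack_name: str) -> List[str]:
    """Return every recorded value of last-run-status.

    ssm_client is assumed to expose get_parameter_history(), as the boto3
    SSM client does. The loop follows NextToken until all pages are read,
    and the values keep the order in which the API returns them.
    """
    values: List[str] = []
    kwargs: Dict[str, str] = {"Name": status_parameter_name(stack_name)}
    while True:
        response = ssm_client.get_parameter_history(**kwargs)
        values.extend(p["Value"] for p in response.get("Parameters", []))
        token = response.get("NextToken")
        if not token:
            return values
        kwargs["NextToken"] = token
```

For example, last_run_status_history(boto3.client("ssm", region_name="us-west-2"), "my-stack-name") returns the parameter's history. Passing the client in, rather than creating it inside the function, lets you test the sketch without AWS credentials.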

Step 1: Deploying Cluster Persistent Storage

This section explains how to deploy the S3 buckets that act as persistent storage for your Qumulo cluster.

  1. Log in to Nexus and click Downloads > Cloud Native Qumulo Downloads.

  2. On the AWS tab, in the Download the required files section, select the Qumulo Core version that you want to deploy and then download the corresponding Terraform configuration, Debian package, and host configuration file.

  3. Create a new S3 bucket and, within your S3 bucket prefix, create the qumulo-core-install directory.

  4. Within this directory, create another directory with the Qumulo Core version as its name. For example:

    my-s3-bucket-name/my-s3-bucket-prefix/qumulo-core-install/7.2.3.2
    
  5. Copy qumulo-core.deb and host_configuration.tar.gz into the directory named after the Qumulo Core version (in this example, it is 7.2.3.2).

  6. Copy aws-terraform-cnq-<x.y>.zip to your Terraform environment and decompress it.

  7. Navigate to the persistent-storage directory and take the following steps:

    1. Run the terraform init command.

      Terraform prepares the environment and displays the message Terraform has been successfully initialized!

    2. Review the terraform.tfvars file.

      • Specify the deployment_name and the correct aws_region for your cluster’s persistent storage.

      • Leave the soft_capacity_limit at 500.

    3. Use the AWS CLI to authenticate to your AWS account.

    4. Run the terraform apply command.

      Terraform displays the execution plan.

    5. Review the Terraform execution plan and then enter yes.

      Terraform creates resources according to the execution plan and displays:

      • The Apply complete! message with a count of added resources

      • The names of the created S3 buckets

      • Your deployment’s unique name

      For example:

      Apply complete! Resources: 15 added, 0 changed, 0 destroyed.
            
      Outputs:
      
      bucket_names = [
        "ab5cdefghij-my-deployment-klmnopqr9st-qps-1",
        "ab4cdefghij-my-deployment-klmnopqr8st-qps-2",
        "ab3cdefghij-my-deployment-klmnopqr7st-qps-3",
        "ab2cdefghij-my-deployment-klmnopqr6st-qps-4",
      ]
      deployment_unique_name = "my-deployment-ABCDEFGH1IJ"
      ...
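The bucket layout from steps 3 to 5 can also be scripted. In S3, directories are key prefixes, so the following Python sketch computes the object key for each installer file and uploads it. It assumes an S3 client that provides upload_file(Filename, Bucket, Key), such as boto3.client("s3"), and that both files are in local_dir. The function names are hypothetical, and the bucket name and prefix are placeholders.

```python
import os
import posixpath
from typing import Any, Dict

# The two files that step 5 copies into the version-named directory.
INSTALL_FILES = ("qumulo-core.deb", "host_configuration.tar.gz")


def install_keys(prefix: str, core_version: str) -> Dict[str, str]:
    """Map each installer file name to its S3 object key.

    The layout is <prefix>/qumulo-core-install/<version>/<file>.
    """
    base = posixpath.join(prefix.strip("/"), "qumulo-core-install", core_version)
    return {name: posixpath.join(base, name) for name in INSTALL_FILES}


def upload_install_files(
    s3_client: Any, bucket: str, prefix: str, core_version: str, local_dir: str = "."
) -> None:
    """Upload both installer files from local_dir to the expected keys."""
    for name, key in install_keys(prefix, core_version).items():
        s3_client.upload_file(Filename=os.path.join(local_dir, name), Bucket=bucket, Key=key)
```

For example, upload_install_files(boto3.client("s3"), "my-s3-bucket-name", "my-s3-bucket-prefix", "7.2.3.2") produces the same layout as the example in step 4.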
      

Step 2: Deploying Cluster Compute and Cache Resources

This section explains how to deploy compute and cache resources for a Qumulo cluster by using an Ubuntu AMI and the Qumulo Core .deb installer.

  1. Configure your VPC to use the gateway VPC endpoint for S3.

  2. Navigate to the aws-terraform-cnq-<x.y> directory and then run the terraform init command.

    Terraform prepares the environment and displays the message Terraform has been successfully initialized!

  3. Choose config-standard.tfvars or config-advanced.tfvars and fill in the values for all variables.

    For more information, see README.pdf in aws-terraform-cnq-<x.y>.zip.

  4. Run the terraform apply -var-file config-standard.tfvars command. (If you use config-advanced.tfvars, specify that file instead.)

    Terraform displays the execution plan.

  5. Review the Terraform execution plan and then enter yes.

    Terraform creates resources according to the execution plan and displays:

    • The Apply complete! message with a count of added resources

    • Your deployment’s unique name

    • The names of the created S3 buckets

    • The floating IP addresses for your Qumulo cluster

    • The primary (static) IP addresses for your Qumulo cluster

    • The Qumulo Core Web UI endpoint

    For example:

    Apply complete! Resources: 62 added, 0 changed, 0 destroyed.
      
    Outputs:
      
    cluster_provisioned = "Success"
    deployment_unique_name = "my-deployment-ABCDEFGH1IJ"
    ...
    persistent_storage_bucket_names = tolist([
      "ab5cdefghij-my-deployment-klmnopqr9st-qps-1",
      "ab4cdefghij-my-deployment-klmnopqr8st-qps-2",
      "ab3cdefghij-my-deployment-klmnopqr7st-qps-3",
      "ab2cdefghij-my-deployment-klmnopqr6st-qps-4",
    ])
    qumulo_floating_ips = [
      "203.0.113.42",
      "203.0.113.84",
      ...
    ]
    ...
    qumulo_primary_ips = [
      "203.0.113.0",
      "203.0.113.1",
      "203.0.113.2",
      "203.0.113.3"
    ]
    ...
    qumulo_private_url_node1 = "https://203.0.113.10"
    
  6. To log in to your cluster’s Web UI, use the endpoint from the Terraform output and the username and password that you have configured.

    You can use the Web UI to create and manage NFS exports, SMB shares, snapshots, and continuous replication relationships. You can also join your cluster to Active Directory, configure LDAP, and perform many other operations.

  7. Mount your Qumulo file system by using NFS or SMB and your cluster’s DNS name or IP address.
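To reuse the deployment outputs in scripts instead of copying them from the console, you can read them with terraform output -json, which prints each output as an object with a "value" field. The following Python sketch assumes that terraform is on your PATH and that each node has exactly one primary IP, as the examples in this section show. The function names are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict


def terraform_outputs(workdir: str = ".") -> str:
    """Return the JSON text that `terraform output -json` prints in workdir."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def check_cluster_outputs(output_json: str, expected_nodes: int) -> Dict[str, Any]:
    """Sanity-check the compute and cache deployment outputs.

    Each top-level key in the JSON is an output name whose "value" field holds
    the data. Raises ValueError if provisioning didn't report "Success" or if
    the number of primary IPs differs from the q_node_count that you set.
    """
    outputs = {name: entry["value"] for name, entry in json.loads(output_json).items()}
    status = outputs.get("cluster_provisioned")
    if status != "Success":
        raise ValueError(f"cluster_provisioned is {status!r}")
    primary_ips = outputs.get("qumulo_primary_ips", [])
    if len(primary_ips) != expected_nodes:
        raise ValueError(f"expected {expected_nodes} primary IPs, got {len(primary_ips)}")
    return {
        "bucket_names": outputs.get("persistent_storage_bucket_names", []),
        "floating_ips": outputs.get("qumulo_floating_ips", []),
        "primary_ips": primary_ips,
        "web_ui": outputs.get("qumulo_private_url_node1"),
    }
```

For example, check_cluster_outputs(terraform_outputs(), 4) returns the Web UI endpoint and IP lists for a four-node cluster, or raises an error that you can act on before you mount the file system.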

Step 3: Performing Post-Deployment Actions

This section describes the common actions you can perform on a CNQ cluster after deploying it.

Adding a Node to an Existing Cluster

  1. Edit config-standard.tfvars or config-advanced.tfvars and change the value of q_node_count to a higher value.
  2. Run the terraform apply -var-file config-standard.tfvars command.

    Terraform displays the execution plan.

  3. Review the Terraform execution plan and then enter yes.

    Terraform changes resources according to the execution plan and displays an additional primary (static) IP for the new node. For example:

    qumulo_primary_ips = [
      "203.0.113.0",
      "203.0.113.1",
      "203.0.113.2",
      "203.0.113.3",
      "203.0.113.4"   
    ]
    
  4. To ensure that the Provisioner shuts down automatically, monitor the /qumulo/<my-stack-name>/last-run-status parameter for the Provisioner. (In AWS Systems Manager, click Application Management > Parameter Store > /qumulo/<my-stack-name>/last-run-status. On the History tab, click ⚙️, and then in the Preferences dialog box, click Parameter history properties > Value > Confirm.)
  5. To check that the cluster is healthy, log in to the Web UI.
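If you manage several deployments, you might prefer to change q_node_count from a script rather than by hand. The following Python sketch assumes that the variable appears on its own line as q_node_count = <number> in your .tfvars file. It refuses to lower the count, because removing nodes requires the separate procedure below, and it enforces the 24-instance maximum stated at the start of this section. The function names are hypothetical.

```python
import re
from pathlib import Path

MAX_NODES = 24  # upper bound of the 4 to 24 instance range for CNQ on AWS


def set_tfvar_number(text: str, name: str, value: int) -> str:
    """Return tfvars text with the single `name = <number>` line updated."""
    pattern = re.compile(rf"^([ \t]*{re.escape(name)}[ \t]*=[ \t]*)\d+", re.MULTILINE)
    new_text, count = pattern.subn(rf"\g<1>{value}", text)
    if count != 1:
        raise KeyError(f"expected one numeric assignment for {name}, found {count}")
    return new_text


def raise_node_count(path: Path, new_count: int) -> int:
    """Increase q_node_count in a .tfvars file and return the previous value."""
    text = path.read_text()
    match = re.search(r"^[ \t]*q_node_count[ \t]*=[ \t]*(\d+)", text, re.MULTILINE)
    if match is None:
        raise KeyError("q_node_count not found")
    current = int(match.group(1))
    if not current < new_count <= MAX_NODES:
        raise ValueError(f"new count must be above {current} and at most {MAX_NODES}")
    path.write_text(set_tfvar_number(text, "q_node_count", new_count))
    return current
```

After the file is updated, run terraform apply as described in the steps above; the sketch only edits the variable.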

Removing a Node from an Existing Cluster

Removing a node from an existing cluster is a two-step process:

  1. Remove the node from your cluster’s quorum.
  2. Tidy up the AWS resources for the removed nodes.

Step 1: Remove the Node from the Cluster’s Quorum

You must perform this step while the cluster is running.

  1. Copy the remove-nodes.sh script from the aws-terraform-cnq-<x.y>/utilities directory to a machine running in your VPC that has the AWS CLI tools installed (for example, an Amazon Linux 2 AMI).

  2. Run the remove-nodes.sh script and specify the AWS region, the unique deployment name, the current node count, and the final node count.

    In the following example, we reduce a cluster from 6 to 4 nodes.

    ./remove-nodes.sh \
      --region us-west-2 \
      --qstackname my-unique-deployment-name \
      --currentnodecount 6 \
      --finalnodecount 4
    
  3. Review the nodes to be removed and then enter y.

  4. Enter the administrator password for your cluster.

    The script removes the nodes and displays:

    • Confirmation that your cluster formed a new quorum

    • Confirmation that the new quorum is active

    • The new total number of nodes in the quorum

    • The EC2 identifiers for the removed nodes

    • The endpoint for your cluster’s Web UI

    {"monitor_uri": "/v1/node/state"}
         --Waiting for new quorum
         --New quorum formed
         --Quorum is ACTIVE
         --Validating quorum
         --4 Nodes in Quorum
         --REMOVED: EC2 ID=i-0ab12345678c9012d >> Qumulo node_id=5
         --REMOVED: EC2 ID=i-9dc87654321b0987a >> Qumulo node_id=6
    **Verify the cluster is healthy in the Qumulo UI at https://203.0.113.10
    ...
    
  5. To check that the cluster is healthy, log in to the Web UI.
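Because the script needs consistent node counts, it can help to validate them before you run it. The following Python sketch builds the same remove-nodes.sh invocation as the example above and rejects a final count outside the 4 to 24 instance range stated at the start of this section, or one that isn't lower than the current count. The function name is hypothetical; the flags are the ones that the example uses.

```python
import shlex
from typing import List

MIN_NODES, MAX_NODES = 4, 24  # instance range for a CNQ on AWS cluster


def remove_nodes_command(region: str, stack: str, current: int, final: int) -> List[str]:
    """Build the remove-nodes.sh arguments after checking the node counts."""
    if not MIN_NODES <= final < current <= MAX_NODES:
        raise ValueError(
            f"need {MIN_NODES} <= final ({final}) < current ({current}) <= {MAX_NODES}"
        )
    return [
        "./remove-nodes.sh",
        "--region", region,
        "--qstackname", stack,
        "--currentnodecount", str(current),
        "--finalnodecount", str(final),
    ]


if __name__ == "__main__":
    # Print a shell-safe command line that you can review before running it.
    print(shlex.join(remove_nodes_command("us-west-2", "my-unique-deployment-name", 6, 4)))
```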

Step 2: Tidy Up AWS Resources for Removed Nodes

To avoid incurring additional costs, we recommend tidying up the AWS resources for the removed nodes.

  1. Navigate to the aws-terraform-cnq-<x.y> directory.
  2. Edit config-standard.tfvars or config-advanced.tfvars and change the value of q_node_count to a lower value (for example, 4).
  3. Run the terraform apply -var-file config-standard.tfvars command.

    Review the Terraform execution plan and then enter yes.

    Terraform removes the resources for the removed nodes according to the execution plan and displays the primary (static) IPs for the remaining nodes. For example:

    qumulo_primary_ips = [
      "203.0.113.0",
      "203.0.113.1",
      "203.0.113.2",
      "203.0.113.3"
    ]

    The removed nodes and their associated infrastructure are deleted.

  4. To check that the cluster is healthy, log in to the Web UI.

Increasing the Soft Capacity Limit for an Existing Cluster

Increasing the soft capacity limit for an existing cluster is a two-step process:

  1. Configure new persistent storage parameters.
  2. Update your existing compute and cache resource deployment.

Step 1: Set New Persistent Storage Parameters

  1. Edit the terraform.tfvars file in the persistent-storage directory and set the soft_capacity_limit variable to a higher value.
  2. Run the terraform apply command.

    Review the Terraform execution plan and then enter yes.

    Terraform creates new S3 buckets as necessary and displays:

    • The Apply complete! message with a count of changed resources

    • The names of the created S3 buckets

    • Your deployment’s unique name

    • The new soft capacity limit

    Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
    
    Outputs:
    
    bucket_names = [
      "ab5cdefghij-my-deployment-klmnopqr9st-qps-1",
      "ab4cdefghij-my-deployment-klmnopqr8st-qps-2",
      "ab3cdefghij-my-deployment-klmnopqr7st-qps-3",
      "ab2cdefghij-my-deployment-klmnopqr6st-qps-4",
    ]
    deployment_unique_name = "my-deployment-ABCDEFGH1IJ"
    ...
    soft_capacity_limit = "1000 TB"
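If you script this step, you can confirm the new limit from the soft_capacity_limit output. The following Python sketch assumes that the output always has the form "<number> TB", as in the example above, and that the number in terraform.tfvars is also in TB. The default of 500 and the example output of 1000 TB suggest this, but treat it as an assumption. The function names are hypothetical.

```python
import re


def parse_capacity_tb(output_value: str) -> int:
    """Convert a soft_capacity_limit output such as "1000 TB" to an integer."""
    match = re.fullmatch(r"\s*(\d+)\s*TB\s*", output_value)
    if match is None:
        raise ValueError(f"unexpected capacity format: {output_value!r}")
    return int(match.group(1))


def confirm_increase(previous_tb: int, output_value: str) -> int:
    """Check that the applied limit is higher than the previous tfvars value."""
    new_tb = parse_capacity_tb(output_value)
    if new_tb <= previous_tb:
        raise ValueError(f"limit did not increase: {previous_tb} TB -> {new_tb} TB")
    return new_tb
```

For example, confirm_increase(500, "1000 TB") returns 1000, while confirm_increase(1000, "1000 TB") raises an error because the limit didn't change.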
    

Step 2: Update Existing Compute and Cache Resource Deployment

  1. Navigate to the root directory of the aws-terraform-cnq-<x.y> repository.
  2. Run the terraform apply -var-file config-standard.tfvars command.

    Review the Terraform execution plan and then enter yes.

    Terraform updates the necessary IAM roles and S3 bucket policies, adds S3 buckets to the persistent storage list for the cluster, increases the soft capacity limit, and displays the Apply complete! message.

    When the Provisioner shuts down automatically, this process is complete.

Scaling Your Existing CNQ on AWS Cluster

You can scale an existing CNQ on AWS cluster by changing the EC2 instance type. This is a three-step process:

  1. Create a new deployment in a new Terraform workspace and join the new EC2 instances to a quorum.
  2. Remove the existing EC2 instances.
  3. Clean up your S3 bucket policies.

Step 1: Create a New Deployment in a New Terraform Workspace

  1. To create a new Terraform workspace, run the terraform workspace new my-new-workspace-name command.
  2. To initialize the workspace, run the terraform init command.
  3. Edit config-standard.tfvars or config-advanced.tfvars:

    1. Specify the value for the q_instance_type variable.
    2. Set the value of the q_replacement_cluster variable to true.
    3. Set the value of the q_existing_deployment_unique_name variable to the current deployment's unique name.
    4. (Optional) To change the number of nodes, specify the value for the q_node_count variable.
  4. Run the terraform apply -var-file config-standard.tfvars command.

    Review the Terraform execution plan and then enter yes.

    Terraform creates resources according to the execution plan and displays:

    • The Apply complete! message with a count of added resources

    • Your deployment’s unique name

    • The names of the created S3 buckets

    • The same floating IP addresses for your Qumulo cluster

    • New primary (static) IP addresses for your Qumulo cluster

    • The Qumulo Core Web UI endpoint

    Apply complete! Resources: 66 added, 0 changed, 0 destroyed.
    
    Outputs:
    
    cluster_provisioned = "Success"
    deployment_unique_name = "my-deployment-ABCDEFGH1IJ"
    ...
    persistent_storage_bucket_names = tolist([
      "ab5cdefghij-my-deployment-klmnopqr9st-qps-1",
      "ab4cdefghij-my-deployment-klmnopqr8st-qps-2",
      "ab3cdefghij-my-deployment-klmnopqr7st-qps-3",
      "ab2cdefghij-my-deployment-klmnopqr6st-qps-4",
    ])
    qumulo_floating_ips = [
      "203.0.113.42",
      "203.0.113.84",
      ...
    ]
    ...
    qumulo_primary_ips = [
      "203.0.113.4",
      "203.0.113.5",
      "203.0.113.6",
      "203.0.113.7"
    ]
    ...
    qumulo_private_url_node1 = "https://203.0.113.10"
    
  5. To ensure that the Provisioner shuts down automatically, monitor the /qumulo/<my-stack-name>/last-run-status parameter for the Provisioner. (In AWS Systems Manager, click Application Management > Parameter Store > /qumulo/<my-stack-name>/last-run-status. On the History tab, click ⚙️, and then in the Preferences dialog box, click Parameter history properties > Value > Confirm.)
  6. To check that the cluster is healthy, log in to the Web UI.
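To avoid typos in the variables from step 3, you can generate them. The following Python sketch renders the required assignments as HCL lines that you can use to replace the existing definitions of those variables in config-standard.tfvars or config-advanced.tfvars. The function names are hypothetical, and the sketch handles only string, integer, and boolean values.

```python
from typing import Dict, Optional, Union

Value = Union[str, int, bool]


def hcl_literal(value: Value) -> str:
    """Render a Python value as an HCL literal."""
    # Check bool before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def replacement_overrides(
    instance_type: str, existing_name: str, node_count: Optional[int] = None
) -> str:
    """Build the tfvars lines that a replacement cluster deployment needs."""
    values: Dict[str, Value] = {
        "q_instance_type": instance_type,
        "q_replacement_cluster": True,
        "q_existing_deployment_unique_name": existing_name,
    }
    if node_count is not None:  # optional step 3.4
        values["q_node_count"] = node_count
    return "\n".join(f"{name} = {hcl_literal(v)}" for name, v in values.items()) + "\n"
```

For example, replacement_overrides("<new-instance-type>", "my-deployment-ABCDEFGH1IJ") prints the three required assignments with q_replacement_cluster set to true.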

Step 2: Remove the Existing EC2 Instances

  1. To select the previous Terraform workspace (for example, default), run the terraform workspace select default command.
  2. To ensure that the correct workspace is selected, run the terraform workspace show command.
  3. Run the terraform destroy -var-file config-standard.tfvars command.

    Review the Terraform execution plan and then enter yes.

    Terraform deletes resources according to the execution plan and displays the Destroy complete! message with a count of destroyed resources.

    The previous EC2 instances are deleted.
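Because destroying the wrong workspace would delete your new cluster, a script can verify the workspace before it destroys anything. The following Python sketch assumes that terraform is on your PATH and that the previous workspace is named default. The function names are hypothetical. It doesn't capture the output of terraform destroy, so you can still review the plan and enter yes.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def capture_terraform(args: List[str]) -> str:
    """Run a non-interactive terraform subcommand and return its stdout."""
    result = subprocess.run(["terraform", *args], capture_output=True, text=True, check=True)
    return result.stdout


def require_workspace(expected: str, run: Runner = capture_terraform) -> None:
    """Raise unless `terraform workspace show` reports the expected workspace."""
    current = run(["workspace", "show"]).strip()
    if current != expected:
        raise RuntimeError(f"selected workspace is {current!r}, expected {expected!r}")


def destroy_previous_cluster(
    expected: str = "default", var_file: str = "config-standard.tfvars"
) -> None:
    """Select the old workspace, verify it, and then run an interactive destroy."""
    subprocess.run(["terraform", "workspace", "select", expected], check=True)
    require_workspace(expected)
    # No output capture: terraform destroy prompts you to review the plan.
    subprocess.run(["terraform", "destroy", "-var-file", var_file], check=True)
```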

Step 3: Clean Up S3 Bucket Policies

  1. To list your Terraform workspaces, run the terraform workspace list command.
  2. To select your new Terraform workspace, run the terraform workspace select <my-new-workspace-name> command.
  3. Edit the config-standard.tfvars or config-advanced.tfvars file and set the q_replacement_cluster variable to false.
  4. Run the terraform apply -var-file config-standard.tfvars command. This ensures that the S3 bucket policies have least privilege.

    Review the Terraform execution plan and then enter yes.

    Terraform deletes resources according to the execution plan and displays the Apply complete! message with a count of destroyed resources.

Deleting an Existing Cluster

Deleting a cluster is a two-step process:

  1. Delete your Cloud Native Qumulo resources.
  2. Delete your persistent storage.

Step 1: Delete Your Cluster’s Cloud Native Qumulo Resources

  1. After you back up your data safely, edit your config-standard.tfvars or config-advanced.tfvars file and set the term_protection variable to false.
  2. Run the terraform apply -var-file config-standard.tfvars command.

    Review the Terraform execution plan and then enter yes.

    Terraform marks resources for deletion according to the execution plan and displays the Apply complete! message with a count of changed resources.

  3. Run the terraform destroy -var-file config-standard.tfvars command.

    Review the Terraform execution plan and then enter yes.

    Terraform deletes all of your cluster’s CNQ resources and displays the Destroy complete! message and a count of destroyed resources.

Step 2: Delete Your Cluster’s Persistent Storage

  1. Navigate to the persistent-storage directory.
  2. Edit your terraform.tfvars file and set the prevent_destroy parameter to false.
  3. Run the terraform apply command.

    Review the Terraform execution plan and then enter yes.

    Terraform marks resources for deletion according to the execution plan and displays the Apply complete! message with a count of changed resources.

  4. Run the terraform destroy command.

    Review the Terraform execution plan and then enter yes.

    Terraform deletes all of your cluster’s persistent storage and displays the Destroy complete! message and a count of destroyed resources.
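Both deletion steps depend on a protection flag that you set to false and apply before you run the destroy command. The following Python sketch checks the two flags in your .tfvars files before you run terraform destroy. It assumes that the persistent-storage directory is inside the decompressed aws-terraform-cnq-<x.y> directory and that each flag appears as name = true or name = false. The function names are hypothetical. A True result only means that the files are edited: you must still run terraform apply for the changes to take effect.

```python
import re
from pathlib import Path


def bool_tfvar(text: str, name: str) -> bool:
    """Read a boolean assignment such as `term_protection = false` from tfvars text."""
    match = re.search(
        rf"^[ \t]*{re.escape(name)}[ \t]*=[ \t]*(true|false)\b", text, re.MULTILINE
    )
    if match is None:
        raise KeyError(f"{name} is not set")
    return match.group(1) == "true"


def ready_to_delete(root: Path, var_file: str = "config-standard.tfvars") -> bool:
    """Return True only if both protection flags in the .tfvars files are false."""
    cluster = (root / var_file).read_text()
    storage = (root / "persistent-storage" / "terraform.tfvars").read_text()
    return not bool_tfvar(cluster, "term_protection") and not bool_tfvar(
        storage, "prevent_destroy"
    )
```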