Deploying Cloud Native Qumulo on AWS with Terraform

This section explains how to deploy Cloud Native Qumulo (CNQ) on AWS by creating the persistent storage and the cluster compute and cache resources with Terraform. It also provides recommendations for Terraform deployments and information about post-deployment actions and optimization.

For an overview of CNQ on AWS, its prerequisites, and limits, see How Cloud Native Qumulo Works.

The aws-terraform-cnq-<x.y>.zip file (the version in the file name corresponds to the provisioning scripts, not the version of Qumulo Core) contains comprehensive Terraform configurations that let you deploy S3 buckets and then create a CNQ cluster with 1 or 3–24 instances that adhere to the AWS Well-Architected Framework and have fully elastic compute and capacity.

Prerequisites

This section explains the prerequisites to deploying CNQ on AWS.

To allow your Qumulo cluster to report metrics to Qumulo, your AWS VPC must have outbound Internet connectivity through a NAT gateway or a firewall. Your instance shares no file data during this process.
Important
Connectivity to the following endpoints is required for a successful deployment of a Qumulo instance and quorum formation:
- api.missionq.qumulo.com
- api.nexus.qumulo.com
To deploy your Qumulo cluster with a VPC S3 gateway, you must configure your VPC to use the S3 gateway VPC endpoint.

The following features require specific versions of Qumulo Core:

Feature	Minimum Qumulo Core Version
Adding S3 buckets to increase persistent storage capacity Increasing the soft capacity limit for an existing CNQ cluster	7.2.1.1
S3 Intelligent-Tiering storage class Infrequent Access S3 access tier	7.2.0.3
Creating persistent storage Important You must create persistent storage by using a separate Terraform deployment before you deploy the compute and cache resources for your cluster.	7.1.3 with version 4.0 of the deployment scripts

Before you configure your Terraform environment, you must sign in to the AWS CLI.
Important
- Unless you use the AdministratorAccess managed IAM policy for your user or role, you can run the iam_tester.py script in the utilities directory to validate your IAM role.
- For an explicit list of privileges recommended for least-privilege access, see the IAM documentation in the utilities directory.
A custom IAM role or user must include the following AWS services:
- cloudformation:*
- ec2:*
- elasticloadbalancing:*
- iam:*
- kms:*
- lambda:*
- logs:*
- resource-groups:*
- route53:*
- route53resolver:*
- s3:*
- secretsmanager:*
- sns:*
- ssm:*
- sts:*
Note
Although the AdministratorAccess managed IAM policy provides sufficient permissions, your organization might use a custom policy with more restrictions.

How the CNQ Provisioner Works

The CNQ Provisioner is an m5.large EC2 instance that configures your Qumulo cluster and any additional AWS environment requirements.

To Monitor the Provisioner’s Status

The Provisioner stores all necessary state information in the Parameter Store and shuts down automatically when it completes its tasks.

In AWS Systems Manager, click Application Management > Parameter Store > /qumulo/<my-unique-deployment-name>/last-run-status.
On the History tab, click ⚙️.
In the Preferences dialog box, click Parameter history properties > Value > Confirm.

Step 1: Deploying Cluster Persistent Storage

This section explains how to deploy the S3 buckets that act as persistent storage for your Qumulo cluster.

Part 1: Prepare the Required Files

Before you can deploy the persistent storage for your cluster, you must download and prepare the required files.

Log in to Qumulo Nexus and click Downloads > Cloud Native Qumulo Downloads.
On the AWS tab, in the Download the required files section, select the Qumulo Core version that you want to deploy and then download the corresponding Terraform configuration and Debian or RPM package.
In a new or existing S3 bucket, within your S3 bucket prefix, create the qumulo-core-install directory.
Within this directory, create another directory with the Qumulo Core version as its name. For example:
```
my-s3-bucket-name/my-s3-bucket-prefix/qumulo-core-install/7.5.0
```
Tip
Make a new subdirectory for every new release of Qumulo Core.
Copy qumulo-core.deb or qumulo-core.rpm into the directory named after the Qumulo Core version (in this example, it is 7.5.0).
Copy aws-terraform-cnq-<x.y>.zip to your Terraform environment and then decompress the file.

Part 2: Create the Necessary Resources

Navigate to the persistent-storage directory.
Edit the provider.tf file:
- To store the Terraform state remotely, add the name of an S3 bucket to the sections that begin with backend "s3" {.
- To store the Terraform state locally, comment the sections that begin with backend "s3" {.
  
  Important
  We don’t recommend storing the Terraform state locally for production deployments.
Run the terraform init command.

Terraform prepares the environment and displays the message Terraform has been successfully initialized!
Edit the terraform.tfvars file.
- Specify the deployment_name and the correct aws_region for your cluster’s persistent storage.
- Set the soft_capacity_limit to 500 (or higher).
  
  Note
  This value specifies the initial capacity limit of your Qumulo clusters (in TB). It is possible to increase this limit at any time.

Use the aws CLI to authenticate to your AWS account.

Run the terraform apply command.

Review the Terraform execution plan and then enter yes.

Terraform displays:

The Apply complete! message with a count of added resources
The names of the created S3 buckets
Your deployment’s unique name

For example:

Apply complete! Resources: 15 added, 0 changed, 0 destroyed.
      
Outputs:

persistent_storage_bucket_names = tolist([
  "ab5cdefghij-my-deployment-klmnopqr9st-qps-1",
  "ab4cdefghij-my-deployment-klmnopqr8st-qps-2",
  "ab3cdefghij-my-deployment-klmnopqr7st-qps-3",
  ...
  "ab2cdefghij-my-deployment-klmnopqr6st-qps-16"
])
deployment_unique_name = "myname-deployment-ABCDE01EG2H"
...

Step 2: Deploying Cluster Compute and Cache Resources

This section explains how to deploy compute and cache resources for a Qumulo cluster by using a Ubuntu AMI and the Qumulo Core .deb installer.

Recommendations

Important
We strongly recommend reviewing the following recommendations before beginning this process.

Provisioning completes successfully when the Provisioner shuts down automatically. If the Provisioner doesn’t shut down, the provisioning cycle has failed and you must troubleshoot it. To monitor the Provisioner’s status, you can watch the Terraform operations in your terminal or monitor the Provisioner in AWS Systems Manager.
The first variable in the example configuration files in the aws-terraform-cnq repository is deployment_name. To help avoid conflicts between Network Load Balancers (NLBs), resource groups, cross-region CloudWatch views, and other deployment components, Terraform ignores the deployment_name value and any changes to it. Terraform generates the additional deployment_unique_name variable; appends a random, 11-digit alphanumeric value to it; and then tags all future resources with this variable, which never changes during subsequent Terraform deployments.
If you plan to deploy multiple Qumulo clusters, give the q_cluster_name variable a unique name for each cluster.
We recommend forwarding DNS queries to Qumulo Authoritative DNS (QDNS). For a single-AZ deployment, to allow Qumulo Core to create an Amazon Route 53 outbound resolver, specify values for the q_cluster_fqdn and second_private_subnet_id variables. The resolver uses the q_cluster_fqdn variable to forward DNS requests to your cluster, where Qumulo Core resolves DNS for your floating IP addresses.

To Deploy the Cluster Compute and Cache Resources

Configure your VPC to use the gateway VPC endpoint for S3.
Edit the provider.tf file:
- To store the Terraform state remotely, add the name of an S3 bucket to the sections that begin with backend "s3" { and data "terraform_remote_state" "persistent_storage" {.
- To store the Terraform state locally, comment the sections that begin with backend "s3" { and data "terraform_remote_state" "persistent_storage" { and uncomment the section that contains backend = "local".
  
  Important
  We don’t recommend storing the Terraform state locally for production deployments.
Navigate to the aws-terraform-cnq-<x.y> directory and then run the terraform init command.

Terraform prepares the environment and displays the message Terraform has been successfully initialized!
Edit the terraform.tfvars file, specifying the values for all variables.

For more information, see README.pdf in aws-terraform-cnq-<x.y>.zip.
Run the terraform apply command.

Review the Terraform execution plan and then enter yes.

Terraform displays:

The Apply complete! message with a count of added resources
Your deployment’s unique name
The names of the created S3 buckets
The floating IP addresses for your Qumulo cluster
The primary (static) IP addresses for your Qumulo cluster
The Qumulo Core Web UI endpoint

For example:

Apply complete! Resources: 62 added, 0 changed, 0 destroyed.
  
Outputs:
  
cluster_provisioned = "Success"
deployment_unique_name = "myname-deployment-ABCDE01EG2H"
...
persistent_storage_bucket_names = tolist([
  "ab5cdefghij-my-deployment-klmnopqr9st-qps-1",
  "ab4cdefghij-my-deployment-klmnopqr8st-qps-2",
  "ab3cdefghij-my-deployment-klmnopqr7st-qps-3",
  ...
  "ab2cdefghij-my-deployment-klmnopqr6st-qps-16",
])
qumulo_floating_ips = tolist([
  "203.0.113.42",
  "203.0.113.84",
  ...
])
...
qumulo_primary_ips = tolist([
  "203.0.113.5",
  "203.0.113.6",
  "203.0.113.7"
])
...
qumulo_private_url_node1 = "https://203.0.113.5"

To log in to your cluster’s Web UI, use the endpoint from the Terraform output and the username and password that you have configured.

Important
If you change the administrative password for your cluster by using the Qumulo Core Web UI, qq CLI, or REST API after deployment, you must update your password in AWS Secrets Manager.

You can use the Qumulo Core Web UI to create and manage NFS exports, SMB shares, snapshots, and continuous replication relationships You can also join your cluster to Active Directory, configure LDAP, and perform many other operations.
Mount your Qumulo file system by using NFS or SMB and your cluster’s DNS name or IP address.

Step 3: Performing Post-Deployment Actions

This section describes the common actions you can perform on a CNQ cluster after deploying it.

Adding Nodes to an Existing Cluster

Important
To add nodes to an existing cluster, the total node count must be greater than that of the current deployment.

Edit terraform.tfvars and change the value of q_node_count to a new value.
Run the terraform apply command.
Review the Terraform execution plan and then enter yes.

Terraform displays an additional primary (static) IP for the new node. For example:
```
qumulo_primary_ips = tolist([
  "203.0.113.5",
  "203.0.113.6",
  "203.0.113.7",
  "203.0.113.8",
  "203.0.113.9"   
])
```
To ensure that the Provisioner shut downs automatically, monitor the /qumulo/my-deployment-name/last-run-status parameter for the Provisioner. To monitor the Provisioner’s status, you can watch the Terraform operations in your terminal or monitor the Provisioner in AWS Systems Manager.
To check that the cluster is healthy and has the needed number of nodes, log in to the Qumulo Core Web UI.

Removing Nodes from an Existing Cluster

Removing nodes from an existing cluster is a two-step process:

Remove the nodes from your cluster’s quorum.
Tidy up the AWS resources for the removed nodes.

Step 1: Remove Nodes from the Cluster’s Quorum

You must perform this step while the cluster is running.

Edit the terraform.tfvars file, setting the value of q_target_node_count to a reduced number of nodes in the cluster.
Run the terraform apply command.

Review the nodes to be removed and then enter yes.

Terraform removes the nodes and displays:

The Apply complete! message with a count of removed resources
Your deployment’s unique name
The remaining S3 buckets for your Qumulo cluster
The primary (static) IP addresses for the node removed from your Qumulo cluster
The Qumulo Core Web UI endpoint

For example:

Apply complete! Resources: 0 added, 0 changed, 1 destroyed.

Outputs:

cluster_provisioned = "Success"
deployment_unique_name = "myname-deployment-ABCDE01EG2H"
...
persistent_storage_bucket_names = tolist([
  "ab5cdefghij-my-deployment-klmnopqr9st-qps-1",
  "ab4cdefghij-my-deployment-klmnopqr8st-qps-2",
  "ab3cdefghij-my-deployment-klmnopqr7st-qps-3",
  ...
  "ab2cdefghij-my-deployment-klmnopqr6st-qps-16"
])
qumulo_floating_ips = tolist([
  "203.0.113.42",
  "203.0.113.84",
  ...
])
...
qumulo_primary_ips_removed_nodes = "203.0.113.24",
...
qumulo_private_url_node1 = "https://203.0.113.10"

Step 2: Tidy Up AWS Resources for Removed Nodes

Edit the terraform.tfvars file:
1. Set the value of the q_node_count variable to a reduced number of nodes in the cluster.
2. Set the value of the q_target_node_count to null.
Run the terraform apply command.
Review the resources to be removed and then enter yes.

To check that the cluster is healthy and has the needed number of nodes, log in to the Qumulo Core Web UI.

Terraform tidies up the resources for removed nodes and displays:

The Apply complete! message with a count of removed resources
Your deployment’s unique name
The remaining S3 buckets for your Qumulo cluster
The remaining floating IP addresses for your Qumulo cluster
The remaining primary (static) IP addresses for your Qumulo cluster
The Qumulo Core Web UI endpoint

For example:

Apply complete! Resources: 0 added, 0 changed, 66 destroyed.

Outputs:

cluster_provisioned = "Success"
deployment_unique_name = "myname-deployment-ABCDE01EG2H"
...
persistent_storage_bucket_names = tolist([
  "ab5cdefghij-my-deployment-klmnopqr9st-qps-1",
  "ab4cdefghij-my-deployment-klmnopqr8st-qps-2",
  "ab3cdefghij-my-deployment-klmnopqr7st-qps-3",
  ...
  "ab2cdefghij-my-deployment-klmnopqr6st-qps-16"
])
qumulo_floating_ips = tolist([
  "203.0.113.42",
  "203.0.113.84",
  ...
])
...
qumulo_primary_ips = tolist([
  "203.0.113.4",
  "203.0.113.5",
  "203.0.113.6",
  "203.0.113.7"
])
...
qumulo_private_url_node1 = "https://203.0.113.10"

Increasing the Soft Capacity Limit for an Existing Cluster

Increasing the soft capacity limit for an existing cluster is a two-step process:

Configure new persistent storage parameters.
Configure new compute and cache deployment parameters.

Step 1: Set New Persistent Storage Parameters

Edit the terraform.tfvars file in the persistent-storage directory and set the soft_capacity_limit variable to a higher value.

Run the terraform apply command.

Review the Terraform execution plan and then enter yes.

Terraform creates new S3 buckets as necessary and displays:

The Apply complete! message with a count of changed resources
The names of the created S3 buckets
Your deployment’s unique name
The new soft capacity limit

For example:

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

Outputs:

persistent_storage_bucket_names = tolist([
  "ab5cdefghij-my-deployment-klmnopqr9st-qps-1",
  "ab4cdefghij-my-deployment-klmnopqr8st-qps-2",
  "ab3cdefghij-my-deployment-klmnopqr7st-qps-3",
  ...
  "ab2cdefghij-my-deployment-klmnopqr6st-qps--16"
])
deployment_unique_name = "myname-deployment-ABCDE01EG2H"
...
soft_capacity_limit = "1000 TB"

Step 2: Update Existing Compute and Cache Resource Deployment

Navigate to the root directory of the aws-terraform-cnq-<x.y> repository.
Run the terraform apply command.

Review the Terraform execution plan and then enter yes.

Terraform updates the necessary IAM roles and S3 bucket policies, adds S3 buckets to the persistent storage list for the cluster, increases the soft capacity limit, and displays the Apply complete! message.

When the Provisioner shuts down automatically, this process is complete.

Changing the EC2 Instance Type of Your CNQ on AWS Cluster

You can change the EC2 instance type, node count, and to convert your cluster from single-AZ to multi-AZ, or the other way around.

Important

To minimize potential availability interruptions, you must perform the cluster replacement procedure as a two-quorum event. For example, if you stop the existing EC2 instances by using the AWS Management Console and change the EC2 instance types, two quorum events occur for each node and the read and write cache isn't optimized for the EC2 instance type.
Performing the cluster replacement procedure ensures that the required EC2 instance types are available in advance.

Changing the EC2 instance type of your CNQ on AWS cluster is a three-step process:

Create a new deployment in a new Terraform workspace and join the new EC2 instances to a quorum.
Remove the existing EC2 instances.
Clean up your S3 bucket policies.

Step 1: Create a New Deployment in a New Terraform Workspace

To create a new Terraform workspace, run the terraform workspace new my-new-workspace-name command.
Edit the terraform.tfvars file:
1. Specify the value for the private_subnet_id variable.
  
  Note
  For multi-AZ deployments, specify values as a comma-delimited list.
2. Specify the value for the q_instance_type variable.
3. Set the value of the q_replacement_cluster variable to true.
4. Set the value of the q_existing_deployment_unique_name variable to the current deployment’s name.
5. (Optional) To change the number of nodes, specify the value for the q_node_count variable.
Important
Leave the other variables unchanged.

Run the terraform apply command.

Review the Terraform execution plan and then enter yes.

Terraform displays:

The Apply complete! message with a count of added resources
Your deployment’s unique name
The names of the created S3 buckets
The same floating IP addresses for your Qumulo cluster
New primary (static) IP addresses for your Qumulo cluster
The Qumulo Core Web UI endpoint

For example:

Apply complete! Resources: 66 added, 0 changed, 0 destroyed.

Outputs:

cluster_provisioned = "Success"
deployment_unique_name = "myname-deployment-ABCDE01EG2H"
...
persistent_storage_bucket_names = tolist([
  "ab5cdefghij-my-deployment-klmnopqr9st-qps-1",
  "ab4cdefghij-my-deployment-klmnopqr8st-qps-2",
  "ab3cdefghij-my-deployment-klmnopqr7st-qps-3",
  ...
  "ab2cdefghij-my-deployment-klmnopqr6st-qps--16"
])
qumulo_floating_ips = tolist([
  "203.0.113.42",
  "203.0.113.84",
  ...
])
...
qumulo_primary_ips = tolist([
  "203.0.113.4",
  "203.0.113.5",
  "203.0.113.6",
  "203.0.113.7"
])
...
qumulo_private_url_node1 = "https://203.0.113.10"

To ensure that the Provisioner shut downs automatically, monitor the /qumulo/my-deployment-name/last-run-status parameter for the Provisioner. To monitor the Provisioner’s status, you can watch the Terraform operations in your terminal or monitor the Provisioner in AWS Systems Manager.
To check that the cluster is healthy and has the needed number of nodes, log in to the Qumulo Core Web UI.

Step 2: Remove the Previous Deployment

To select the previous Terraform workspace (for example, default), run the terraform workspace select default command.
To ensure that the correct workspace is selected, run the terraform workspace show command.
Run the terraform destroy command.

Review the Terraform execution plan and then enter yes.

Terraform displays the Destroy complete! message with a count of destroyed resources.

The previous deployment is deleted.

Note
The persistent storage deployment remains in its original Terraform workspace. You can perform the next cluster replacement procedure in the default workspace.

Step 3: Clean Up S3 Bucket Policies

To list your Terraform workspaces, run the terraform workspace list command.
To select your new Terraform workspace, run the terraform workspace select <my-new-workspace-name> command.
Edit the terraform.tfvars file and set the q_replacement_cluster variable to false.
Run the terraform apply command. This ensures that the S3 bucket policies have least privilege.

Review the Terraform execution plan and then enter yes.

Terraform displays the Apply complete! message with a count of destroyed resources.

Deleting an Existing Cluster

Deleting a cluster is a two-step process:

Delete your cluster’s compute and cache resources.
Delete your persistent storage.

Caution

When you no longer need your cluster, you must back up all important data on the cluster safely before deleting the cluster.
When you delete your cluster's cache and computer resources, it isn't possible to access your persistent storage anymore.

Step 1: To Delete Your Cluster’s Compute and Cache Resources

After you back up your data safely, edit your terraform.tfvars file and set the term_protection variable to false.
Run the terraform apply command.

Review the Terraform execution plan and then enter yes.

Terraform displays the Apply complete! message with a count of changed resources.
Run the terraform destroy command.

Review the Terraform execution plan and then enter yes.

Terraform deletes all of your cluster’s compute and cache resources and displays the Destroy complete! message and a count of destroyed resources.

Step 2: To Delete Your Cluster’s Persistent Storage

Navigate to the persistent-storage directory.
Edit your terraform.tfvars file and set the prevent_destroy parameter to false.
Run the terraform apply command.

Review the Terraform execution plan and then enter yes.

Terraform displays the Apply complete! message with a count of changed resources.
Run the terraform destroy command.

Review the Terraform execution plan and then enter yes.

Terraform deletes all of your cluster’s persistent storage and displays the Destroy complete! message and a count of destroyed resources.

Deploying Cloud Native Qumulo on AWS with Terraform

Prerequisites

How the CNQ Provisioner Works

To Monitor the Provisioner’s Status

Step 1: Deploying Cluster Persistent Storage

Part 1: Prepare the Required Files

Part 2: Create the Necessary Resources

Step 2: Deploying Cluster Compute and Cache Resources

Recommendations

To Deploy the Cluster Compute and Cache Resources

Step 3: Performing Post-Deployment Actions

Adding Nodes to an Existing Cluster

Removing Nodes from an Existing Cluster

Step 1: Remove Nodes from the Cluster’s Quorum

Step 2: Tidy Up AWS Resources for Removed Nodes

Increasing the Soft Capacity Limit for an Existing Cluster

Step 1: Set New Persistent Storage Parameters

Step 2: Update Existing Compute and Cache Resource Deployment

Changing the EC2 Instance Type of Your CNQ on AWS Cluster

Step 1: Create a New Deployment in a New Terraform Workspace

Step 2: Remove the Previous Deployment

Step 3: Clean Up S3 Bucket Policies

Deleting an Existing Cluster

Step 1: To Delete Your Cluster’s Compute and Cache Resources

Step 2: To Delete Your Cluster’s Persistent Storage

Related Topics