This section explains how to deploy Cloud Native Qumulo (CNQ) on AWS by creating the persistent storage and the cluster compute and cache resources with CloudFormation. It also provides information about post-deployment actions and optimization.
For an overview of CNQ on AWS, its prerequisites, and limits, see How Cloud Native Qumulo Works.
The aws-cloudformation-cnq-<x.y>.zip file (the version in the file name corresponds to the provisioning scripts, not the version of Qumulo Core) contains comprehensive CloudFormation templates that let you deploy S3 buckets and then create a CNQ cluster with 1 or 3–24 instances that adheres to the AWS Well-Architected Framework and has fully elastic compute and capacity.
Prerequisites
This section explains the prerequisites for deploying CNQ on AWS.
-
To allow your Qumulo cluster to report metrics to Qumulo, your AWS VPC must have outbound Internet connectivity through a NAT gateway or a firewall. Your instance shares no file data during this process.
Important
Connectivity to the following endpoints is required for a successful deployment of a Qumulo instance and quorum formation:
api.missionq.qumulo.com
api.nexus.qumulo.com
-
To deploy your Qumulo cluster with a VPC S3 gateway, you must configure your VPC to use the S3 gateway VPC endpoint.
-
The following features require specific versions of Qumulo Core:
Feature Minimum Qumulo Core Version - Adding S3 buckets to increase persistent storage capacity
- Increasing the soft capacity limit for an existing CNQ cluster
7.2.1.1 7.2.0.3 Creating persistent storage Important
You must create persistent storage by using a separate CloudFormation stack before you deploy the compute and cache resources for your cluster.
7.1.3 with version 4.0 of the deployment scripts
-
Before you configure your CloudFormation template, you must sign in to the AWS Management Console.
Important
- Unless you use the AdministratorAccess managed IAM policy for your user or role, you can run the iam_tester.py script in the utilities directory to validate your IAM role.
- For an explicit list of privileges recommended for least-privilege access, see the IAM documentation in the utilities directory.
A custom IAM role or user must include permissions for the following AWS services:
cloudformation:*
ec2:*
elasticloadbalancing:*
iam:*
kms:*
lambda:*
logs:*
resource-groups:*
route53:*
route53resolver:*
s3:*
secretsmanager:*
sns:*
ssm:*
sts:*
Note
Although the AdministratorAccess managed IAM policy provides sufficient permissions, your organization might use a custom policy with more restrictions.
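As a rough illustration, the service list above can be expressed as an IAM policy document. The following Python sketch builds one as JSON. The single Allow statement and the "Resource": "*" scope are assumptions made for illustration; this is not Qumulo's recommended least-privilege policy (for that, see the IAM documentation in the utilities directory).

```python
import json

# Service-level actions copied from the list above.
CNQ_ACTIONS = [
    "cloudformation:*", "ec2:*", "elasticloadbalancing:*", "iam:*", "kms:*",
    "lambda:*", "logs:*", "resource-groups:*", "route53:*", "route53resolver:*",
    "s3:*", "secretsmanager:*", "sns:*", "ssm:*", "sts:*",
]


def build_policy(actions=CNQ_ACTIONS):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                # Assumption: all resources. Narrow this for a real deployment.
                "Resource": "*",
            }
        ],
    }


def policy_as_json(actions=CNQ_ACTIONS):
    """Serialize the policy so that it can be pasted into the IAM console."""
    return json.dumps(build_policy(actions), indent=2)
```

Calling policy_as_json() returns text that you could paste into the JSON editor for a custom policy and then tighten to match your organization's restrictions.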
How the CNQ Provisioner Works
The CNQ Provisioner is an m5.large EC2 instance that configures your Qumulo cluster and any additional AWS environment requirements.
Don’t delete the CNQ Provisioner’s EC2 instance. It is necessary for EC2 updates.
To Monitor the Provisioner’s Status
The Provisioner stores all necessary state information in the Parameter Store and shuts down automatically when it completes its tasks.
- In AWS Systems Manager, click Application Management > Parameter Store > /qumulo/<my-unique-deployment-name>/last-run-status.
- On the History tab, click ⚙️.
- In the Preferences dialog box, click Parameter history properties > Value > Confirm.
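The same check can be scripted instead of clicked through. The following is a minimal sketch that reads the current value of the last-run-status parameter with the boto3 SSM client's get_parameter call. It assumes that boto3 is installed and that AWS credentials and a region are already configured; the helper names are hypothetical, and the set of possible status values isn't documented here, so the code returns the value without interpreting it.

```python
def status_parameter_name(deployment_name):
    """Build the Parameter Store name that holds the Provisioner's last run status."""
    return f"/qumulo/{deployment_name}/last-run-status"


def read_last_run_status(deployment_name, ssm_client=None):
    """Return the current value of the last-run-status parameter as a string."""
    if ssm_client is None:
        # Imported lazily so that status_parameter_name() works without boto3.
        import boto3
        ssm_client = boto3.client("ssm")
    response = ssm_client.get_parameter(Name=status_parameter_name(deployment_name))
    return response["Parameter"]["Value"]
```

For example, read_last_run_status("my-unique-deployment-name") returns whatever the Provisioner last wrote to the parameter. To see earlier values, the SSM client's get_parameter_history call plays the role of the History tab.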
Step 1: Deploying Cluster Persistent Storage
This section explains how to deploy the S3 buckets that act as persistent storage for your Qumulo cluster.
Part 1: Prepare the Required Files
Before you can deploy the persistent storage for your cluster, you must download and prepare the required files.
- Log in to Qumulo Nexus and click Downloads > Cloud Native Qumulo Downloads.
- On the AWS tab, in the Download the required files section, select the Qumulo Core version that you want to deploy and then download the corresponding CloudFormation template and Debian or RPM package.
- In a new or existing S3 bucket, within your S3 bucket prefix, create the qumulo-core-install directory.
- Within this directory, create another directory with the Qumulo Core version as its name. For example:
my-s3-bucket-name/my-s3-bucket-prefix/qumulo-core-install/7.5.0
Tip
Make a new subdirectory for every new release of Qumulo Core.
- Copy qumulo-core.deb or qumulo-core.rpm into the directory named after the Qumulo Core version (in this example, 7.5.0).
- Decompress aws-cloudformation-cnq-<x.y>.zip locally and copy its contents to your S3 bucket prefix.
- Find the URL to templates/persistent-storage.template.yaml. For example:
https://my-bucket.s3.us-west-2.amazonaws.com/my-s3-bucket-prefix/templates/persistent-storage.template.yaml
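The directory layout and template URL from the steps above follow a fixed pattern, which the following Python sketch reproduces. The function names are hypothetical, and the URL form is the virtual-hosted-style S3 URL used in the example above.

```python
def install_directory(prefix, core_version):
    """Return the S3 key prefix that holds the Qumulo Core package for one release."""
    return f"{prefix.strip('/')}/qumulo-core-install/{core_version}"


def template_url(bucket, region, prefix, template="persistent-storage.template.yaml"):
    """Return the HTTPS URL to a template under <prefix>/templates/ in the bucket."""
    return (
        f"https://{bucket}.s3.{region}.amazonaws.com/"
        f"{prefix.strip('/')}/templates/{template}"
    )
```

With the values from the example, template_url("my-bucket", "us-west-2", "my-s3-bucket-prefix") yields the persistent storage template URL shown in the last step, and passing template="cnq-standard.template.yaml" yields the URL for the compute and cache template.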
Part 2: Create the CloudFormation Stack
- Log in to the AWS CloudFormation console.
- On the Stacks page, in the upper right, click Create stack > With new resources (standard).
- On the Create stack page, in the Specify template section, click Amazon S3 URL, enter the URL to persistent-storage.template.yaml, and then click Next.
- On the Specify stack details page, take the following steps:
  - Enter a Stack name, for example my-storage-stack.
  - For S3 bucket name, enter the name of the S3 bucket that you used to prepare your files.
  - For S3 key prefix, enter your S3 bucket prefix.
  - For S3 bucket region, enter the same AWS region as the one for your S3 bucket.
  - Click Next.
- On the Configure stack options page, read and accept the two acknowledgements, and then click Next.
- On the Review and create page, click Submit.
CloudFormation creates resources for the stack and displays the CREATE_COMPLETE status for each resource.
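For repeatable deployments, the console steps above could be scripted. The following is a minimal sketch that uses the boto3 CloudFormation client's create_stack call and stack_create_complete waiter. The parameter keys in the usage note are hypothetical placeholders (check the template for the real names), and mapping the two console acknowledgements to CAPABILITY_NAMED_IAM and CAPABILITY_AUTO_EXPAND is an assumption.

```python
def to_cfn_parameters(values):
    """Convert a plain dict into the Parameters list that CloudFormation expects."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


def create_storage_stack(cfn_client, stack_name, template_url, values):
    """Create the persistent storage stack and block until creation completes."""
    cfn_client.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=to_cfn_parameters(values),
        # Assumption: these correspond to the two acknowledgements in the console.
        Capabilities=["CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"],
    )
    # The waiter polls the stack until it reaches CREATE_COMPLETE or fails.
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
```

A call might look like create_storage_stack(boto3.client("cloudformation"), "my-storage-stack", url, {"S3BucketName": "my-bucket", "S3KeyPrefix": "my-s3-bucket-prefix", "S3BucketRegion": "us-west-2"}), where the three keys stand in for whatever the template actually names these parameters.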
Step 2: Deploying Cluster Compute and Cache Resources
This section explains how to deploy compute and cache resources for a Qumulo cluster by using an Ubuntu AMI and the Qumulo Core .deb installer.
Recommendations
We strongly recommend reviewing the following points before you begin this process.
- You can begin to monitor the Provisioner only after the CloudFormation stack finishes running. (In AWS Systems Manager, click Application Management > Parameter Store > /qumulo/<my-unique-deployment-name>/last-run-status. On the History tab, click ⚙️, and then in the Preferences dialog box, click Parameter history properties > Value > Confirm.) Until the Provisioner shuts down automatically, the provisioning process isn’t complete and the Qumulo cluster isn’t yet functional.
- If you plan to deploy multiple Qumulo clusters, give the q_cluster_name variable a unique value for each cluster.
- We recommend forwarding DNS queries to Qumulo Authoritative DNS (QDNS). For a single-AZ deployment, to allow Qumulo Core to create an Amazon Route 53 outbound resolver, specify values for the q_cluster_fqdn and second_private_subnet_id variables. The resolver uses the q_cluster_fqdn variable to forward DNS requests to your cluster, where Qumulo Core resolves DNS for your floating IP addresses.
To Deploy the Cluster Compute and Cache Resources
- Configure your VPC to use the gateway VPC endpoint for S3.
- In the S3 bucket that hosts your deployment files, find the URL to templates/cnq-standard.template.yaml. For example:
https://my-bucket.s3.us-west-2.amazonaws.com/my-s3-bucket-prefix/templates/cnq-standard.template.yaml
- Log in to the AWS CloudFormation console.
- On the Stacks page, in the upper right, click Create stack > With new resources (standard).
- On the Create stack page, in the Specify template section, click Amazon S3 URL, enter the URL to cnq-standard.template.yaml, and then click Next.
- On the Specify stack details page, take the following steps:
  - In the Provide a stack name section, enter a Stack name, for example my-compute-cache-stack.
  - In the Parameters section, under Cloud Native Qumulo, take the following steps:
    - For S3 bucket name, enter the name of the S3 bucket that you used to prepare your files.
    - For S3 key prefix, enter your S3 bucket prefix.
    - For S3 bucket region, enter the same AWS region as the one for your S3 bucket.
    - Select an EC2 key pair.
    - For Environment type, select either Dev or Prod.
  - Under AWS network configuration, take the following steps:
    - Select a VPC ID.
    - Enter CIDR #1 for the Qumulo security group.
    - (Optional) Enter CIDR #2 for the Qumulo security group.
    - Select the Private subnet ID(s).
  - Under Qumulo file data platform configuration, take the following steps:
    - For the Stack name from the persistent storage CloudFormation deployment, enter the name of the stack that you used to create your persistent storage.
    - For Hot or Cold cluster, select an S3 storage class.
    - Select the Qumulo EC2 instance type.
    - Enter the Number of Qumulo EC2 instances.
      This number determines the number of nodes in your Qumulo cluster.
    - Enter the Total number of Floating IPs for the Qumulo Cluster.
      Tip
      If you intend to scale out your Qumulo cluster, enter 6 floating IP addresses for each EC2 instance.
    - Enter the Qumulo software version, Qumulo cluster name, and the Qumulo cluster administrator password.
  - Click Next.
- On the Configure stack options page, read and accept the two acknowledgements, and then click Next.
- On the Review and create page, click Submit.
CloudFormation creates resources for the stack and displays the CREATE_COMPLETE status for each resource.
- To log in to your cluster’s Web UI, use the endpoint from the QumuloPrivateIP key on the Outputs tab for this stack and the username and password that you have configured.
Important
If you change the administrative password for your cluster by using the Qumulo Core Web UI, qq CLI, or REST API after deployment, you must update your password in AWS Secrets Manager.
You can use the Qumulo Core Web UI to create and manage NFS exports, SMB shares, snapshots, and continuous replication relationships. You can also join your cluster to Active Directory, configure LDAP, and perform many other operations.
- Mount your Qumulo file system by using NFS or SMB and your cluster’s DNS name or IP address.
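Two of the parameters in the procedure above, Number of Qumulo EC2 instances and Total number of Floating IPs, can be sanity-checked before you submit the stack. The following Python sketch assumes that the supported cluster sizes are the ones stated at the top of this page (1 or 3–24 instances) and reads the tip as six floating IP addresses multiplied by the number of instances; the function names are hypothetical.

```python
FLOATING_IPS_PER_INSTANCE = 6  # From the tip for clusters that will scale out.


def is_supported_instance_count(count):
    """Return True for the cluster sizes this page lists: 1 or 3 to 24 instances."""
    return count == 1 or 3 <= count <= 24


def floating_ip_total(instance_count):
    """Return the total number of floating IPs for a cluster that will scale out."""
    if not is_supported_instance_count(instance_count):
        raise ValueError(f"unsupported instance count: {instance_count}")
    return instance_count * FLOATING_IPS_PER_INSTANCE
```

For a four-instance cluster, floating_ip_total(4) gives 24, while a count of 2 raises ValueError because two-instance clusters aren't in the supported range.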
Step 3: Performing Post-Deployment Actions
This section describes the common actions you can perform on a CNQ cluster after deploying it.
Adding Nodes to an Existing Cluster
To add nodes to an existing cluster, specify a total node count that is greater than that of the current deployment.
- Log in to the AWS CloudFormation console.
- On the Stacks page, select your compute and cache deployment stack and then, in the upper right, click Update.
- On the Update stack page, click Use existing template and then click Next.
- On the Specify stack details page, enter a new value for Number of Qumulo EC2 instances and then click Next.
- On the Configure stack options page, in the Stack failure options section, click Roll back all stack resources, read and accept the two acknowledgements, and then click Next.
- On the Review <my-unique-deployment-name> page, click Submit.
CloudFormation creates resources for the stack and displays the CREATE_COMPLETE status for each resource.
- To ensure that the Provisioner shuts down automatically, monitor the /qumulo/<my-unique-deployment-name>/last-run-status parameter for the Provisioner. To monitor the Provisioner’s status, you can watch the Terraform operations in your terminal or monitor the Provisioner in AWS Systems Manager.
- To check that the cluster is healthy and has the needed number of nodes, log in to the Qumulo Core Web UI.
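The update above changes one parameter and keeps everything else, which maps onto the boto3 CloudFormation client's update_stack call with UsePreviousTemplate and UsePreviousValue. The following is a minimal sketch under stated assumptions: the default parameter key QNodeCount is a hypothetical placeholder for the template's real key, the capabilities are assumed to match the two console acknowledgements, and the caller supplies the list of existing parameter keys (for example, from describe_stacks).

```python
def update_parameters(existing_keys, changes):
    """Keep the previous value for every key except those listed in changes."""
    unknown = set(changes) - set(existing_keys)
    if unknown:
        raise KeyError(f"not a parameter of this stack: {sorted(unknown)}")
    parameters = []
    for key in existing_keys:
        if key in changes:
            parameters.append({"ParameterKey": key, "ParameterValue": str(changes[key])})
        else:
            parameters.append({"ParameterKey": key, "UsePreviousValue": True})
    return parameters


def add_nodes(cfn_client, stack_name, existing_keys, current_count, new_count,
              count_key="QNodeCount"):  # Hypothetical key name.
    """Update the stack with a larger node count, reusing the existing template."""
    if new_count <= current_count:
        raise ValueError("the new node count must be greater than the current one")
    cfn_client.update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,
        Parameters=update_parameters(existing_keys, {count_key: new_count}),
        # Assumption: these correspond to the two acknowledgements in the console.
        Capabilities=["CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"],
    )
```

The guard mirrors the requirement that the new total must exceed the current deployment. After the update completes, the Provisioner still has to finish and shut down before the new nodes are usable.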
Removing Nodes from an Existing Cluster
Removing nodes from an existing cluster is a two-step process:
- Remove the nodes from your cluster’s quorum.
- Tidy up the AWS resources for the removed nodes.
Step 1: Remove Nodes from the Cluster’s Quorum
You must perform this step while the cluster is running.
- Edit the terraform.tfvars file, setting the value of q_target_node_count to a reduced number of nodes in the cluster.
- Run the terraform apply command.
- Review the nodes to be removed and then enter yes.
Terraform removes the nodes and displays:
  - The Apply complete! message with a count of removed resources
  - Your deployment’s unique name
  - The remaining S3 buckets for your Qumulo cluster
  - The primary (static) IP addresses for the nodes removed from your Qumulo cluster
  - The Qumulo Core Web UI endpoint
For example:
Apply complete! Resources: 0 added, 0 changed, 1 destroyed.

Outputs:

cluster_provisioned = "Success"
deployment_unique_name = "myname-deployment-ABCDE01EG2H"
...
persistent_storage_bucket_names = tolist([
  "ab5cdefghij-my-deployment-klmnopqr9st-qps-1",
  "ab4cdefghij-my-deployment-klmnopqr8st-qps-2",
  "ab3cdefghij-my-deployment-klmnopqr7st-qps-3",
  ...
  "ab2cdefghij-my-deployment-klmnopqr6st-qps-16"
])
qumulo_floating_ips = tolist([
  "203.0.113.42",
  "203.0.113.84",
  ...
])
...
qumulo_primary_ips_removed_nodes = "203.0.113.24",
...
qumulo_private_url_node1 = "https://203.0.113.10"
Step 2: Tidy Up AWS Resources for Removed Nodes
- Edit the terraform.tfvars file:
  - Set the value of the q_node_count variable to a reduced number of nodes in the cluster.
  - Set the value of the q_target_node_count variable to null.
- Run the terraform apply command.
- Review the resources to be removed and then enter yes.
Terraform tidies up the resources for removed nodes and displays:
  - The Apply complete! message with a count of removed resources
  - Your deployment’s unique name
  - The remaining S3 buckets for your Qumulo cluster
  - The remaining floating IP addresses for your Qumulo cluster
  - The remaining primary (static) IP addresses for your Qumulo cluster
  - The Qumulo Core Web UI endpoint
For example:
Apply complete! Resources: 0 added, 0 changed, 66 destroyed.

Outputs:

cluster_provisioned = "Success"
deployment_unique_name = "myname-deployment-ABCDE01EG2H"
...
persistent_storage_bucket_names = tolist([
  "ab5cdefghij-my-deployment-klmnopqr9st-qps-1",
  "ab4cdefghij-my-deployment-klmnopqr8st-qps-2",
  "ab3cdefghij-my-deployment-klmnopqr7st-qps-3",
  ...
  "ab2cdefghij-my-deployment-klmnopqr6st-qps-16"
])
qumulo_floating_ips = tolist([
  "203.0.113.42",
  "203.0.113.84",
  ...
])
...
qumulo_primary_ips = tolist([
  "203.0.113.4",
  "203.0.113.5",
  "203.0.113.6",
  "203.0.113.7"
])
...
qumulo_private_url_node1 = "https://203.0.113.10"
- To check that the cluster is healthy and has the needed number of nodes, log in to the Qumulo Core Web UI.
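The two terraform.tfvars edits in this procedure are easy to mix up, so the following Python sketch spells out the variable values for each phase. It assumes that q_node_count keeps its current value during the first phase (the procedure only mentions q_target_node_count there); the function names are hypothetical.

```python
def removal_phases(current_count, reduced_count):
    """Return the tfvars values for each of the two node-removal phases."""
    if not 0 < reduced_count < current_count:
        raise ValueError("the reduced count must be positive and below the current count")
    remove_from_quorum = {
        # Assumption: the node count is left unchanged while nodes leave the quorum.
        "q_node_count": current_count,
        "q_target_node_count": reduced_count,
    }
    tidy_up_resources = {
        "q_node_count": reduced_count,
        "q_target_node_count": None,
    }
    return [remove_from_quorum, tidy_up_resources]


def to_tfvars(values):
    """Render the values as terraform.tfvars lines, writing None as null."""
    return "\n".join(
        f"{name} = {'null' if value is None else value}"
        for name, value in values.items()
    )
```

For a cluster that shrinks from five nodes to four, the first phase renders as q_node_count = 5 and q_target_node_count = 4, and the second as q_node_count = 4 and q_target_node_count = null, with a terraform apply after each edit.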
Increasing the Soft Capacity Limit for an Existing Cluster
Increasing the soft capacity limit for an existing cluster is a two-step process:
- Configure new persistent storage parameters.
- Configure new compute and cache deployment parameters.
Step 1: Set New Persistent Storage Parameters
- On the Stacks page, select your persistent storage stack and then, in the upper right, click Update.
- On the Update stack page, click Use existing template and then click Next.
- On the Specify stack details page, select a higher value for Soft Capacity Limit and then click Next.
- On the Configure stack options page, in the Stack failure options section, click Roll back all stack resources, read and accept the two acknowledgements, and then click Next.
- On the Review <my-unique-deployment-name> page, click Submit.
CloudFormation updates resources for the stack and displays the CREATE_COMPLETE status for each resource.
- To check that the cluster is healthy and has the needed number of nodes, log in to the Qumulo Core Web UI.
Step 2: Update Existing Compute and Cache Resource Deployment
- On the Stacks page, select your compute and cache deployment stack and then, in the upper right, click Update.
- On the Update stack page, click Use existing template and then click Next.
- On the Specify stack details page, click Next.
- On the Configure stack options page, in the Stack failure options section, click Roll back all stack resources, read and accept the two acknowledgements, and then click Next.
- On the Review <my-unique-deployment-name> page, click Submit.
CloudFormation updates resources for the stack and displays the CREATE_COMPLETE status for each resource.
When the Provisioner shuts down automatically, this process is complete.
Changing the EC2 Instance Type of Your CNQ on AWS Cluster
You can change the EC2 instance type and node count, and you can convert your cluster from single-AZ to multi-AZ or the other way around.
- To minimize potential availability interruptions, you must perform the cluster replacement procedure as a two-quorum event. By contrast, if you stop the existing EC2 instances by using the AWS Management Console and change the EC2 instance types, two quorum events occur for each node and the read and write cache isn't optimized for the EC2 instance type.
- Performing the cluster replacement procedure ensures that the required EC2 instance types are available in advance.
Changing the EC2 instance type of your CNQ on AWS cluster is a three-step process:
- Create a new deployment in a new CloudFormation stack and join the new EC2 instances to a quorum.
- Remove the existing EC2 instances.
- Clean up your S3 bucket policies.
Step 1: Create a New CloudFormation Stack
- Log in to the AWS CloudFormation console.
- On the Stacks page, in the upper right, click Create stack > With new resources (standard).
- On the Create stack page, in the Specify template section, click Amazon S3 URL, enter the URL to your CloudFormation template, and then click Next.
- On the Specify stack details page, take the following steps:
  - In the Provide a stack name section, enter a Stack name, for example my-compute-cache-replacement-stack.
  - In the Parameters section, under Cloud Native Qumulo, take the following steps:
    - For S3 bucket name, enter the name of the S3 bucket that you used to prepare your files.
    - For S3 key prefix, enter your S3 bucket prefix.
    - For S3 bucket region, enter the same AWS region as the one for your S3 bucket.
    - Select an EC2 key pair.
    - For Environment type, select either Dev or Prod.
  - Under AWS network configuration, take the following steps:
    - Select a VPC ID.
    - Enter CIDR #1 for the Qumulo security group.
    - (Optional) Enter CIDR #2 for the Qumulo security group.
    - Select the Private subnet ID(s).
  - Under Qumulo file data platform configuration, take the following steps:
    - For the Stack name from the persistent storage CloudFormation deployment, enter the name of the stack that you used to create your persistent storage.
    - For Hot or Cold cluster, select an S3 storage class.
    - Select the Qumulo EC2 instance type.
    - Enter the Number of Qumulo EC2 instances.
      This number determines the number of nodes in your Qumulo cluster.
    - Enter the Total number of Floating IPs for the Qumulo Cluster.
      Tip
      If you intend to scale out your Qumulo cluster, enter 6 floating IP addresses for each EC2 instance.
    - For Replacement Cluster, select Yes.
    - For Existing Deployment CloudFormation Stack Name, enter the current stack name, for example, my-compute-cache-stack.
    - Enter the Qumulo software version, Qumulo cluster name, and the Qumulo cluster administrator password.
  - Click Next.
- On the Configure stack options page, read and accept the two acknowledgements, and then click Next.
- On the Review and create page, click Submit.
- To ensure that the Provisioner shuts down automatically, monitor the /qumulo/<my-unique-deployment-name>/last-run-status parameter for the Provisioner. To monitor the Provisioner’s status, you can watch the Terraform operations in your terminal or monitor the Provisioner in AWS Systems Manager.
- To check that the cluster is healthy and has the needed number of nodes, log in to the Qumulo Core Web UI.
Step 2: Remove the Previous Deployment
- To delete the previous CloudFormation stack, on the Stacks page, select the stack name for your previous deployment and then, in the upper right, click Stack actions > Edit termination protection.
- In the Edit termination protection for <stack-name>? dialog box, under Termination protection, click Deactivated and then click Save.
- On the Stacks page, select the stack name for your previous deployment and then, in the upper right, click Delete.
- In the Delete stack? dialog box, click Delete.
- To ensure that the stack is deleted correctly, watch the deletion process.
The previous deployment is deleted.
The persistent storage deployment remains in its original CloudFormation stack. You can perform the next cluster replacement procedure in the original CloudFormation stack.
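The removal steps above (deactivate termination protection, delete the stack, watch the deletion) correspond to three boto3 CloudFormation client calls. The following is a minimal sketch that takes the client as an argument; the function name is hypothetical, and it assumes that you pass the name of the previous compute and cache stack, never the persistent storage stack.

```python
def remove_previous_deployment(cfn_client, stack_name):
    """Deactivate termination protection, delete the stack, and wait for completion."""
    # Deleting a stack fails while termination protection is still enabled.
    cfn_client.update_termination_protection(
        EnableTerminationProtection=False,
        StackName=stack_name,
    )
    cfn_client.delete_stack(StackName=stack_name)
    # The waiter polls until the stack reaches DELETE_COMPLETE or the deletion fails.
    cfn_client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```

With boto3 installed and credentials configured, remove_previous_deployment(boto3.client("cloudformation"), "my-compute-cache-stack") would run the three steps in order and raise an error if the deletion doesn't complete.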
Step 3: Clean Up S3 Bucket Policies
- On the Stacks page, select the newly created stack and then, in the upper right, click Update.
- On the Update stack page, click Use existing template and then click Next.
- On the Specify stack details page, for Replacement Cluster, click No.
- On the Configure stack options page, in the Stack failure options section, click Roll back all stack resources, read and accept the two acknowledgements, and then click Next.
- On the Review <my-unique-deployment-name> page, click Submit.
CloudFormation updates resources for the stack and displays the CREATE_COMPLETE status for each resource.
Deleting an Existing Cluster
Deleting a cluster is a two-step process:
- Delete your cluster’s compute and cache resources.
- Delete your persistent storage.
- When you no longer need your cluster, you must back up all important data on the cluster safely before deleting the cluster.
- After you delete your cluster's compute and cache resources, you can no longer access your persistent storage.
- Back up your data safely.
- Disable termination protection for your CloudFormation stack.
- To update your stack, do the following:
  - On the Stacks page, select the existing stack and then, in the upper right, click Update.
  - On the Update stack page, click Use existing template and then click Next.
  - On the Specify stack details page, click Next.
  - On the Configure stack options page, read and accept the two acknowledgements, and then click Next.
  - On the Review <my-unique-deployment-name> page, click Rollback on failure and then click Submit.
- Delete your CloudFormation stack.