This section explains how to deploy Cloud Native Qumulo (CNQ) on AWS by creating the persistent storage and the cluster compute and cache resources with CloudFormation. It also provides information about post-deployment actions and optimization.
For an overview of CNQ on AWS, its prerequisites, and limits, see How Cloud Native Qumulo Works.
The aws-cloudformation-cnq-<x.y>.zip file (the version in the file name corresponds to the provisioning scripts, not the version of Qumulo Core) contains comprehensive CloudFormation templates that let you deploy S3 buckets and then create a CNQ cluster with 4 to 24 instances that adheres to the AWS Well-Architected Framework and has fully elastic compute and capacity.
Prerequisites
This section explains the prerequisites to deploying CNQ on AWS.
- To allow your Qumulo instance to report metrics to Qumulo, your AWS VPC must have outbound Internet connectivity through a NAT gateway or a firewall. Your instance shares no file data during this process.

  Important: Connectivity to the following endpoints is required for a successful deployment of a Qumulo instance and quorum formation:

  - api.missionq.qumulo.com
  - api.nexus.qumulo.com
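As a quick pre-flight check, you can probe these endpoints from a host inside the target VPC. This is a sketch, not part of the official deployment procedure; it assumes curl is available on the host:

```shell
# Probe outbound HTTPS reachability to the Qumulo monitoring endpoints.
# Assumes curl is installed; run from an instance inside the target VPC.
endpoints="api.missionq.qumulo.com api.nexus.qumulo.com"
status=""
for host in $endpoints; do
  if curl -s --connect-timeout 5 -o /dev/null "https://${host}" 2>/dev/null; then
    status="${status}${host}=ok "
  else
    status="${status}${host}=unreachable "
  fi
done
echo "$status"
```

If either endpoint reports unreachable, review your NAT gateway or firewall rules before starting the deployment.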
- The following features require specific versions of Qumulo Core:

  | Feature | Minimum Qumulo Core Version |
  | --- | --- |
  | Adding S3 buckets to increase persistent storage capacity | 7.2.1.1 |
  | Increasing the soft capacity limit for an existing CNQ cluster | 7.2.0.2 |
  | Creating persistent storage | 7.1.3 with version 4.0 of the deployment scripts |

  Important: You must create persistent storage by using a separate CloudFormation stack before you deploy the compute and cache resources for your cluster.

- Before you configure your CloudFormation template, you must sign in to the AWS Management Console.
A custom IAM role or user must include permissions for the following AWS services:

- cloudformation:*
- ec2:*
- elasticloadbalancing:*
- iam:*
- kms:*
- lambda:*
- logs:*
- resource-groups:*
- route53:*
- s3:*
- secretsmanager:*
- sns:*
- ssm:*
- sts:*
Note: Although the AdministratorAccess managed IAM policy provides sufficient permissions, your organization might use a custom policy with more restrictions.
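For reference, the service permissions listed above could be collected into a single IAM policy document. The following is a minimal sketch: the policy name and file name are placeholders, and your organization will likely scope the actions and Resource element more tightly.

```shell
# Write a broad deployment policy covering the services listed above.
# Placeholder names; restrict actions and resources to suit your organization.
cat > cnq-deploy-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudformation:*", "ec2:*", "elasticloadbalancing:*", "iam:*",
        "kms:*", "lambda:*", "logs:*", "resource-groups:*", "route53:*",
        "s3:*", "secretsmanager:*", "sns:*", "ssm:*", "sts:*"
      ],
      "Resource": "*"
    }
  ]
}
EOF
# To create the policy (requires AWS credentials):
# aws iam create-policy --policy-name cnq-deploy --policy-document file://cnq-deploy-policy.json
```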
How the CNQ Provisioner Works
The CNQ Provisioner is an m5.large EC2 instance that configures your Qumulo cluster and any additional AWS environment requirements.
Don’t delete the CNQ Provisioner’s EC2 instance. It is necessary for EC2 updates.
The Provisioner stores all necessary state information in AWS Systems Manager (Application Management > Parameter Store) and shuts down automatically when it completes its tasks.
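Because the Provisioner records its state in Parameter Store, you can watch its progress from the CLI. A sketch, assuming a stack named my-stack-name and the /qumulo/<stack-name>/last-run-status parameter path used later in this guide:

```shell
# Build the Parameter Store path for the Provisioner's status value.
# "my-stack-name" is a placeholder; substitute your CloudFormation stack name.
STACK_NAME=my-stack-name
PARAM="/qumulo/${STACK_NAME}/last-run-status"
echo "Polling ${PARAM}"
# Requires AWS credentials; uncomment to query your account:
# aws ssm get-parameter --name "$PARAM" --query 'Parameter.Value' --output text
```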
Step 1: Deploying Cluster Persistent Storage
This section explains how to deploy the S3 buckets that act as persistent storage for your Qumulo cluster.
- Log in to Nexus and click Downloads > Cloud Native Qumulo Downloads.

- Click the AWS tab and, in the Download the required files section, select the Qumulo Core version that you want to deploy and then download the corresponding CloudFormation template, Debian package, and host configuration file.

- In your S3 bucket, create the qumulo-core-install directory. Within this directory, create another directory with the Qumulo Core version as its name. The following is an example path:

  my-s3-bucket-name/my-s3-bucket-prefix/qumulo-core-install/7.2.3.1

  Tip: Make a new subdirectory for every new release of Qumulo Core.

- Copy qumulo-core.deb and host_configuration.tar.gz into the directory named after the Qumulo Core version (in this example, 7.2.3.1).

- Copy aws-cloudformation-cnq-<x.y>.zip (the version in the file name corresponds to the provisioning scripts, not the version of Qumulo Core) to the my-s3-bucket-name/my-s3-bucket-prefix/aws-cloudformation-cnq directory and decompress it.

- Note the URL to templates/persistent-storage.template.yaml in the decompressed directory. For example:

  https://my-bucket.s3.us-west-2.amazonaws.com/aws-cloudformation-cnq/templates/persistent-storage.template.yaml

- Log in to the AWS CloudFormation console.
- On the Stacks page, in the upper right, click Create stack > With new resources (standard).

- On the Create stack page, in the Specify template section, click Amazon S3 URL, enter the URL to persistent-storage.template.yaml, and then click Next.

- On the Specify stack details page, enter the Stack name and review the information in the Parameters section:

  - Enter the S3 bucket Region.
  - Select the Soft Capacity Limit for the subsequent CNQ deployment.
  - Click Next.

- On the Configure stack options page, click Next.

- On the Review and create page, click Submit.

  CloudFormation creates the S3 buckets and their stack.
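If you prefer the AWS CLI to the console, the same stack can be created with aws cloudformation create-stack. This is a sketch: the stack name is a placeholder, and any parameter keys you pass must be checked against the template's Parameters section.

```shell
# Create the persistent storage stack from the uploaded template.
# Placeholder names throughout; verify parameter keys against the template.
TEMPLATE_URL="https://my-bucket.s3.us-west-2.amazonaws.com/aws-cloudformation-cnq/templates/persistent-storage.template.yaml"
STACK_NAME=my-persistent-storage
echo "Creating ${STACK_NAME} from ${TEMPLATE_URL}"
# Requires AWS credentials; uncomment to run:
# aws cloudformation create-stack \
#   --stack-name "$STACK_NAME" \
#   --template-url "$TEMPLATE_URL" \
#   --capabilities CAPABILITY_NAMED_IAM
# aws cloudformation wait stack-create-complete --stack-name "$STACK_NAME"
```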
Step 2: Deploying Cluster Compute and Cache Resources
This section explains how to deploy compute and cache resources for a Qumulo cluster by using an Ubuntu AMI and the Qumulo Core .deb installer.
- Begin monitoring the last-run-status value for the provisioning instance in AWS Systems Manager (Application Management > Parameter Store) only after the CloudFormation stack finishes running. Until the provisioning instance shuts down automatically, the provisioning process isn't complete and the Qumulo cluster isn't yet functional.

- If you plan to deploy multiple Qumulo clusters, give the q_cluster_name variable a unique name for each cluster.

- (Optional) If you use Amazon Route 53 private hosted zones, give the q_fqdn_name variable a unique name for each cluster.
- Configure your VPC to use the gateway VPC endpoint for S3.

  Important: It isn't possible to deploy your cluster without a gateway.

- Log in to the AWS CloudFormation console.
- On the Stacks page, in the upper right, click Create stack > With new resources (standard).

- On the Create stack page, in the Specify template section, click Amazon S3 URL, enter the URL to cnq-standard-template.yaml, and then click Next.

- On the Specify stack details page, enter the Stack name, review the information in the Parameters section, and then click Next.

- On the Configure stack options page, click Next.

- On the Review and create page, click Submit.

  CloudFormation creates the compute and cache resources and their stack.
- To log in to your cluster's Web UI, use the endpoint from the top-level stack output and the username and password that you configured during deployment.

  Important: If you change the administrative password for your cluster by using the Qumulo Core Web UI, qq CLI, or REST API after deployment, you must add your new password to AWS Secrets Manager.

  You can use the Web UI to create and manage NFS exports, SMB shares, snapshots, and continuous replication relationships. You can also join your cluster to Active Directory, configure LDAP, and perform many other operations.
- Mount your Qumulo file system by using NFS or SMB and your cluster's DNS name or IP address.
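For example, a Linux client could mount an NFS export as follows. The DNS name, export path, and mount options below are assumptions; use an export you created in the Web UI and your cluster's actual DNS name or IP address.

```shell
# Assemble an NFS mount command for the cluster (placeholder names throughout).
CLUSTER=my-cluster.example.com   # your cluster's DNS name or IP address
EXPORT=/                         # an NFS export created in the Web UI
MOUNTPOINT=/mnt/qumulo
echo "sudo mkdir -p ${MOUNTPOINT}"
echo "sudo mount -t nfs -o vers=3 ${CLUSTER}:${EXPORT} ${MOUNTPOINT}"
```

Run the two printed commands on the client to perform the mount.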
Step 3: Performing Post-Deployment Actions
This section describes the common actions you can perform on a CNQ cluster after deploying it.
Adding a Node to an Existing Cluster
To add a node to an existing cluster, the new total node count must be greater than that of the current deployment.
- Log in to the AWS CloudFormation console.
- On the Stacks page, select your compute and cache deployment stack and then, in the upper right, click Update.
- On the Update stack page, click Use existing template and then click Next.
- On the Specify stack details page, enter a new value for Node Count and then click Next.
- On the Configure stack options page, click Next.
- On the Review <my-stack-name> page, click Rollback on failure and then click Submit.
- To ensure that the Provisioner shuts down automatically, review the /qumulo/my-stack-name/last-run-status parameter in AWS Systems Manager (Application Management > Parameter Store).
- To check that the cluster is healthy, log in to the Web UI.
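The console update above can also be sketched with the AWS CLI. The stack name and the QNodeCount parameter key are assumptions based on the parameter names this guide mentions; confirm them in your template before running anything.

```shell
# Request a larger node count on the existing compute and cache stack.
# Placeholder names; verify the parameter key in your template.
STACK_NAME=my-compute-and-cache-stack
NODE_COUNT=5
echo "Updating ${STACK_NAME} to ${NODE_COUNT} nodes"
# Requires AWS credentials; uncomment to run:
# aws cloudformation update-stack \
#   --stack-name "$STACK_NAME" \
#   --use-previous-template \
#   --parameters ParameterKey=QNodeCount,ParameterValue="$NODE_COUNT" \
#   --capabilities CAPABILITY_NAMED_IAM
```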
Removing a Node from an Existing Cluster
Removing a node from an existing cluster is a two-step process:
- Remove the node from your cluster’s quorum.
- Tidy up your AWS resources.
Step 1: Remove the Node from the Cluster’s Quorum
You must perform this step while the cluster is running.
After you remove nodes from your cluster, you must clean up these nodes’ cloud infrastructure by using CloudFormation or Terraform.
- Copy the remove-nodes.sh script from the aws-cloudformation-cnq-<x.y>/utilities directory to a machine running in your VPC that has the AWS CLI tools installed (for example, an Amazon Linux 2 AMI).

  Tip:
  - To make the script executable, run the chmod +x remove-nodes.sh command.
  - To see a list of required parameters, run remove-nodes.sh.
- Run the remove-nodes.sh script and specify the AWS region, the unique deployment name, the current node count, and the final node count. In the following example, we reduce a cluster from 6 to 4 nodes.

      ./remove-nodes.sh \
        --region us-west-2 \
        --qstackname my-unique-deployment-name \
        --currentnodecount 6 \
        --finalnodecount 4
- Review the nodes to be removed and then enter y.

- Enter the administrator password for your cluster. The script removes the nodes and displays:

  - Confirmation that your cluster formed a new quorum
  - Confirmation that the new quorum is active
  - The new total number of nodes in the quorum
  - The EC2 identifiers for the removed nodes
  - The endpoint for your cluster's Web UI

        {"monitor_uri": "/v1/node/state"}
        --Waiting for new quorum
        --New quorum formed
        --Quorum is ACTIVE
        --Validating quorum
        --4 Nodes in Quorum
        --REMOVED: EC2 ID=i-0ab12345678c9012d >> Qumulo node_id=5
        --REMOVED: EC2 ID=i-9dc87654321b0987a >> Qumulo node_id=6
        **Verify the cluster is healthy in the Qumulo UI at https://203.0.113.10 ...

- To check that the cluster is healthy, log in to the Web UI.
Step 2: Tidy Up Your AWS Resources
- On the Stacks page, select your compute and cache deployment stack and then, in the upper right, click Update.
- On the Update stack page, click Use existing template and then click Next.
- On the Specify stack details page, enter a lower value for Node Count (for example, 4) and then click Next.
- On the Configure stack options page, click Next.
- On the Review <my-stack-name> page, click Rollback on failure and then click Submit.

  The node and the infrastructure associated with the node are removed.
- To check that the cluster is healthy, log in to the Web UI.
Increasing the Soft Capacity Limit for an Existing Cluster
Increasing the soft capacity limit for an existing cluster is a two-step process:
- Configure new persistent storage parameters.
- Configure new compute and cache deployment parameters.
Step 1: Set New Persistent Storage Parameters
- On the Stacks page, select your persistent storage stack and then, in the upper right, click Update.
- On the Update stack page, click Use existing template and then click Next.
- On the Specify stack details page, enter a higher value for QSoftCapacityLimit and then click Next.
- On the Configure stack options page, click Next.
- On the Review <my-stack-name> page, click Rollback on failure and then click Submit.

  CloudFormation creates new S3 buckets as necessary.
Step 2: Update Existing Compute and Cache Resource Deployment
- On the Stacks page, select your compute and cache deployment stack and then, in the upper right, click Update.
- On the Update stack page, click Use existing template and then click Next.
- On the Specify stack details page, click Next.
- On the Configure stack options page, click Next.
- On the Review <my-stack-name> page, click Rollback on failure and then click Submit.

  CloudFormation updates the necessary IAM roles and S3 bucket policies, adds S3 buckets to the persistent storage list for the cluster, and increases the soft capacity limit. When the Provisioner shuts down automatically, this process is complete.
Scaling Your Existing CNQ on AWS Cluster
To minimize potential availability interruptions, you must perform this cluster replacement procedure as a two-quorum event. (If you instead stop the existing EC2 instances by using the AWS Management Console and change the EC2 instance types, two quorum events occur for each node and the read and write cache isn't optimized for the new EC2 instance type.)
You can scale an existing CNQ on AWS cluster by changing the EC2 instance type. This is a three-step process:
- Create a new deployment in a new CloudFormation stack and join the new EC2 instances to a quorum.
- Remove the existing EC2 instances.
- Clean up your S3 bucket policies.
Step 1: Create a New Deployment in a New CloudFormation Stack
- Log in to the AWS CloudFormation console.
- On the Stacks page, in the upper right, click Create stack > With new resources (standard).
- On the Create stack page, in the Specify template section, click Amazon S3 URL, enter the URL to your CloudFormation template, and then click Next.
- On the Specify stack details page, to use the same S3 buckets as before, enter the same Stack name as the one you used for the persistent storage stack and then review the information in the Parameters section:

  - For QReplacementCluster, click Yes.
  - For QExistingDeploymentUniqueName, enter the current stack name.
  - For QInstanceType, enter the EC2 instance type.
  - (Optional) To change the number of nodes, enter a new value for QNodeCount.
  - Click Next.
-
On the Configure stack options page, click Next.
-
On the Review and create page, click Submit.
- To ensure that the Provisioner shuts down automatically, review the /qumulo/my-stack-name/last-run-status parameter in AWS Systems Manager (Application Management > Parameter Store).
- To check that the cluster is healthy, log in to the Web UI.
Step 2: Remove the Existing EC2 Instances
- To delete the previous CloudFormation stack, on the Stacks page, select the stack name for your previous deployment and then, in the upper right, click Delete.
- To ensure that the stack is deleted correctly, watch the deletion process. The previous EC2 instances are deleted.

  The persistent storage deployment remains in its original CloudFormation stack. You can perform the next cluster replacement procedure in the original CloudFormation stack.
Step 3: Clean Up S3 Bucket Policies
- On the Stacks page, select the newly created stack and then, in the upper right, click Update.
- On the Update stack page, click Use existing template and then click Next.
- On the Specify stack details page, for QReplacementCluster, click No.
- On the Configure stack options page, click Next.
- On the Review <my-stack-name> page, click Rollback on failure and then click Submit.
Deleting an Existing Cluster
Deleting a cluster is a two-step process:
- Delete your Cloud Native Qumulo resources.
- Delete your persistent storage.
- When you no longer need your cluster, back up all important data on the cluster safely before deleting the cluster.
- When you delete your cluster's cache and compute resources, it isn't possible to access your persistent storage anymore.
- Back up your data safely.
- Disable termination protection for your CloudFormation stack.
- To update your stack, do the following:
- On the Stacks page, select the existing stack and then, in the upper right, click Update.
- On the Update stack page, click Use existing template and then click Next.
- On the Specify stack details page, click Next.
- On the Configure stack options page, click Next.
- On the Review <my-stack-name> page, click Rollback on failure and then click Submit.
- Delete your CloudFormation stack.
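The deletion steps above map to the following CLI calls. This is a sketch with a placeholder stack name; delete the compute and cache stack before the persistent storage stack, as described above.

```shell
# Disable termination protection, then delete the stack (placeholder name).
STACK_NAME=my-compute-and-cache-stack
echo "Deleting ${STACK_NAME}"
# Requires AWS credentials; uncomment to run:
# aws cloudformation update-termination-protection \
#   --no-enable-termination-protection --stack-name "$STACK_NAME"
# aws cloudformation delete-stack --stack-name "$STACK_NAME"
# aws cloudformation wait stack-delete-complete --stack-name "$STACK_NAME"
```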