This section explains how to use Shift-To to copy objects from a directory in a Qumulo cluster to a folder in an Amazon Simple Storage Service (Amazon S3) bucket and how to manage Shift relationships.
For more information about copying objects from S3 to Qumulo, see Using Qumulo Shift-From for Amazon S3 to Copy Objects.
Prerequisites
-
A Qumulo cluster with:
-
Qumulo Core 3.2.1 (and higher) for the CLI and 3.2.5 (and higher) for the Qumulo Core Web UI
-
HTTPS connectivity to s3.<region>.amazonaws.com through one of the following means:
-
Public Internet
For more information, see AWS IP address ranges in the AWS General Reference.
-
-
-
Membership in a Qumulo role with the following privileges:
-
PRIVILEGE_REPLICATION_OBJECT_WRITE: This privilege is required to create a Shift relationship.
-
PRIVILEGE_REPLICATION_OBJECT_READ: This privilege is required to view the status of a Shift relationship.
Note
- For any changes to take effect, user accounts with newly assigned roles must log out and log back in (or their sessions must time out).
- Use special care when granting privileges to roles and users because certain privileges (such as replication-write privileges) can use system privileges to overwrite or move data to a location where a user has greater permissions. This can give a user access to all directories and files in a cluster regardless of any specific file and directory settings.
-
-
An existing bucket with contents in Amazon S3
-
AWS credentials (access key ID and secret access key) with the following permissions:
-
s3:AbortMultipartUpload
-
s3:GetObject
-
s3:PutObject
-
s3:PutObjectTagging
-
s3:ListBucket
For more information, see Understanding and getting your AWS credentials in the AWS General Reference.
-
Example IAM Policy
In the following example, the IAM policy gives permission to read from and write to the my-folder folder in the my-bucket bucket. This policy can give users the permissions required to run Shift-To jobs.
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "s3:ListBucket",
"Effect": "Allow",
"Resource": "arn:aws:s3:::my-bucket"
},
{
"Action": [
"s3:AbortMultipartUpload",
"s3:GetObject",
"s3:PutObject",
"s3:PutObjectTagging"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::my-bucket/my-folder/*"
}
]
}
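The same policy can be generated for other bucket and folder names. The following is a minimal Python sketch rather than an official Qumulo or AWS tool: the function name shift_to_policy is hypothetical, and the actions and resource ARNs simply mirror the example policy above.

```python
import json


def shift_to_policy(bucket: str, folder: str) -> dict:
    """Build an IAM policy document shaped like the example above.

    `bucket` and `folder` are placeholders supplied by the caller.
    """
    folder = folder.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Action": "s3:ListBucket",
                "Effect": "Allow",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions apply to keys under the folder.
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:PutObjectTagging",
                ],
                "Effect": "Allow",
                "Resource": f"arn:aws:s3:::{bucket}/{folder}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(shift_to_policy("my-bucket", "my-folder"), indent=2))
```

Serializing the dictionary with json.dumps guarantees valid JSON, which avoids punctuation mistakes that are easy to make when editing a policy by hand.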
How Shift-To Relationships Work
Qumulo Core performs the following steps when it creates a Shift-To relationship.
-
Verifies that the directory exists on the Qumulo cluster and that the specified S3 bucket exists, is accessible by using the specified credentials, and contains downloadable objects.
-
Creates the Shift-To relationship.
-
Starts a job by using one of the nodes in the Qumulo cluster.
Note
If you perform multiple Shift operations, Qumulo Core uses multiple nodes.
-
To ensure that the copy is point-in-time consistent, takes a temporary snapshot of the directory (for example, named replication_to_bucket_my_bucket).
-
Recursively traverses the directories and files in the snapshot and copies each file to a corresponding object in S3.
-
Preserves the file paths in the local directory in the keys of replicated objects.
For example, the file /my-dir/my-project/file.txt, where my-dir is the directory on your Qumulo cluster, is uploaded to S3 as the following object, where my-folder is the specified S3 folder.
https://my-bucket.s3.us-west-2.amazonaws.com/my-folder/my-project/file.txt
Note
This process doesn’t encode or transform your data in any way. Shift-To replicates only the data in a regular file’s primary stream, excluding alternate data streams and file system metadata such as access control lists (ACLs). To avoid transferring data across the public Internet, a server-side S3 copy operation also copies any hard links to files in the replication local directory to S3 as full copies of objects, with identical contents and metadata.
-
Checks whether a file is already replicated. If the object exists in the remote S3 bucket, and neither the file nor the object has been modified since the last successful replication, its data isn’t retransferred to S3.
Note
Shift never deletes files in the remote S3 folder, even if the files are removed from the local directory since the last replication.
-
Deletes the temporary snapshot.
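The path-to-key mapping and the skip-if-unchanged check in the steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Qumulo Core's implementation: the function names are hypothetical, and the content hash stands in for whatever change-tracking information Qumulo Core actually keeps.

```python
import posixpath
from typing import Dict


def object_key(local_dir: str, file_path: str, object_folder: str) -> str:
    """Map a file under local_dir to an S3 key under object_folder."""
    relative = posixpath.relpath(file_path, start=local_dir)
    if relative == ".." or relative.startswith("../"):
        raise ValueError(f"{file_path} is not inside {local_dir}")
    # The local directory name is dropped; the rest of the path is preserved.
    return posixpath.join(object_folder.strip("/"), relative)


def needs_upload(key: str, file_hash: str, remote_hashes: Dict[str, str]) -> bool:
    """Return True unless an object with the same key and hash already exists.

    remote_hashes (hypothetical) maps existing object keys to a content hash.
    """
    return remote_hashes.get(key) != file_hash


if __name__ == "__main__":
    key = object_key("/my-dir/", "/my-dir/my-project/file.txt", "/my-folder/")
    print(key)  # my-folder/my-project/file.txt
    print(needs_upload(key, "abc", {key: "abc"}))  # False
```

Note that nothing in the sketch removes remote keys, which matches the behavior described above: files deleted locally stay in the S3 folder.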
Storing and Reusing Relationships
The Shift-To relationship remains on the Qumulo cluster. You can monitor the completion status of a job, start new jobs for a relationship after the initial job finishes, and delete the relationship (when you no longer need the S3-folder-Qumulo-directory pair). To avoid reuploading objects that a previous copy job uploaded, each relationship tracks the objects that it has copied, which takes up approximately 100 bytes for each object. To free this storage, you can delete relationships that you no longer need.
If you repeatedly copy from the same Qumulo directory, you can speed up the upload process (and skip already uploaded files) by using the same relationship.
A new relationship for subsequent uploads doesn’t share any tracking information with previous relationships associated with a directory and might recopy data that is already uploaded.
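To get a feel for the tracking overhead, the per-object figure above can be multiplied out. The following Python sketch is only a rough estimate: it uses the approximate 100-byte figure from this section, and the function name is hypothetical.

```python
BYTES_PER_TRACKED_OBJECT = 100  # approximate figure quoted in this section


def estimated_tracking_bytes(object_count: int) -> int:
    """Estimate the cluster storage a relationship uses to track objects."""
    if object_count < 0:
        raise ValueError("object_count must not be negative")
    return object_count * BYTES_PER_TRACKED_OBJECT


if __name__ == "__main__":
    # Ten million objects come to roughly 1 GB of tracking data.
    print(estimated_tracking_bytes(10_000_000))  # 1000000000
```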
How Entities in the Qumulo File System are Represented in an S3 Bucket
This section explains which entity types Qumulo Core doesn’t copy to an S3 bucket and how an S3 bucket represents the entities that Qumulo Core copies to an S3 bucket.
Entity Types that Qumulo Core Doesn’t Copy
-
Access control list (ACL)
-
Alternate data stream
-
Directory
Note
For objects created for files, the system preserves the directory structure in the object key.
-
Hard link to a non-regular file
-
SMB extended file attribute
-
Symbolic link
-
Timestamp (mtime, ctime, atime, btime)
-
UNIX device file
Entity Types that Qumulo Core Copies
Entity in the Qumulo File System | Representation in an Amazon S3 Bucket |
---|---|
Hard link to a regular file | Copy of the S3 object |
Generic user metadata | S3 tags |
Hole in sparse files | Zero (Note: The system expands any holes.) |
Regular file | S3 object (Note: The object key is the file system path and the object value is the metadata.) |
S3 Metadata | Object metadata |
Using the Qumulo Core Web UI to Copy Files and Manage Relationships
This section describes how to use the Qumulo Core Web UI 3.2.5 (and higher) to copy files from a Qumulo cluster to Amazon S3, review Shift relationship details, stop a running copy job, repeat a completed copy job, and delete a relationship.
To Copy Files to Amazon S3
-
Log in to the Qumulo Core Web UI.
-
Click Cluster > Copy to/from S3.
-
On the Copy to/from S3 page, click Create Copy.
-
On the Create Copy to/from S3 page, click Local ⇨ Remote and then enter the following:
-
The Directory Path on your cluster (/ by default)
-
The S3 Bucket Name
-
The Folder in your S3 bucket
-
The Region for your S3 bucket
-
Your AWS Region (/ by default)
-
Your AWS Access Key ID and Secret Access Key.
-
-
(Optional) For additional configuration, click Advanced S3 Server Settings.
-
Click Create Copy.
-
In the Create Copy to S3? dialog box, review the Shift relationship and then click Yes, Create.
The copy job begins.
To View Configuration Details and Status of Shift Relationships
- Log in to the Qumulo Core Web UI.
-
Click Cluster > Copy to/from S3.
The Copy to/from S3 page lists all existing Shift relationships.
-
To get more information about a specific Shift relationship, click ⋮ > View Details.
The Copy to/from S3 Details page displays the following information:
- Throughput: average
- Run Time
- Data: total, transferred, and unchanged
- Files: total, transferred, and unchanged
To Stop a Copy Job in Progress
- Log in to the Qumulo Core Web UI.
- Click Cluster > Copy to/from S3.
- To stop a copy job for a specific relationship, click ⋮ > Abort.
-
In the Abort copy from? dialog box, review the Shift relationship and then click Yes, Abort.
The copy job stops.
To Repeat a Completed Copy Job
- Log in to the Qumulo Core Web UI.
- Click Cluster > Copy to/from S3.
- To repeat a copy job for a specific relationship, click ⋮ > Copy Again.
-
In the Copy again? dialog box, review the Shift relationship and then click Yes, Copy Again.
The copy job repeats.
To Delete a Shift Relationship
- Log in to the Qumulo Core Web UI.
- Click Cluster > Copy to/from S3.
- To delete a specific relationship, click ⋮ > Delete.
-
In the Delete copy from? dialog box, review the Shift relationship and then click Yes, Delete.
The Shift relationship is deleted.
Using the Qumulo CLI to Copy Files and Manage Relationships
This section describes how to use the Qumulo CLI 3.2.5 (and higher) to copy files from a Qumulo cluster to Amazon S3, review Shift relationship details, stop a running copy job, repeat a completed copy job, and delete a relationship.
Copying Files to Amazon S3
To copy files, run the qq replication_create_object_relationship command and specify the following:
- Local directory path on Qumulo cluster
- Copy direction (copy-to)
- S3 object folder
- S3 bucket
- AWS region
- AWS access key ID
- AWS secret access key
The following example shows how to create a relationship between the directory /my-dir/ on a Qumulo cluster and the S3 bucket my-bucket and folder /my-folder/ in the us-west-2 AWS region. The secret access key is associated with the access key ID.
qq replication_create_object_relationship \
--source-directory-path /my-dir/ \
--direction COPY_TO_OBJECT \
--object-folder /my-folder/ \
--bucket my-bucket \
--region us-west-2 \
--access-key-id AKIAIOSFODNN7EXAMPLE \
--secret-access-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
The CLI returns the details of the relationship in JSON format, for example:
{
  "access_key_id": "AKIAIOSFODNN7EXAMPLE",
  "bucket": "my-bucket",
  "object_store_address": "s3.us-west-2.amazonaws.com",
  "id": "1c23b4ed-5c67-8f90-1e23-a4f5f6ceff78",
  "object_folder": "my-folder/",
  "port": 443,
  "ca_certificate": null,
  "region": "us-west-2",
  "source_directory_id": "3",
  "direction": "COPY_TO_OBJECT"
}
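For scripted use, the same invocation can be assembled as an argument list. The Python sketch below is a convenience wrapper under stated assumptions: the function name is hypothetical, and it uses only the flags shown in the example command above. It builds and prints the command without running it.

```python
import shlex
from typing import List


def create_relationship_argv(
    source_dir: str,
    object_folder: str,
    bucket: str,
    region: str,
    access_key_id: str,
    secret_access_key: str,
) -> List[str]:
    """Assemble the qq command line for a Shift-To relationship."""
    return [
        "qq", "replication_create_object_relationship",
        "--source-directory-path", source_dir,
        "--direction", "COPY_TO_OBJECT",  # Shift-To copies cluster data to S3
        "--object-folder", object_folder,
        "--bucket", bucket,
        "--region", region,
        "--access-key-id", access_key_id,
        "--secret-access-key", secret_access_key,
    ]


if __name__ == "__main__":
    argv = create_relationship_argv(
        "/my-dir/", "/my-folder/", "my-bucket", "us-west-2",
        "AKIAIOSFODNN7EXAMPLE", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    )
    # Print a shell-safe rendering. On a host where qq is installed and
    # authenticated, the list could instead be passed to subprocess.run.
    print(shlex.join(argv))
```

Passing the arguments as a list avoids shell-quoting problems with characters such as the slashes in the secret access key.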
Viewing Configuration Details and Status of Shift Relationships
-
To view configuration details for all Shift relationships, run the qq replication_list_object_relationships command.
-
To view configuration details for a specific relationship, run the qq replication_get_object_relationship command followed by the --id flag and the Shift relationship ID (GUID), for example:
qq replication_get_object_relationship --id 1c23b4ed-5c67-8f90-1e23-a4f5f6ceff78
-
To view the status of a specific relationship, run the qq replication_get_object_relationship_status command followed by the --id flag and the Shift relationship ID.
-
To view the status of all relationships, run the qq replication_list_object_relationship_statuses command.
The CLI returns the details of all relationships in JSON format, for example:
[
  {
    "direction": "COPY_TO_OBJECT",
    "access_key_id": "AKIAIOSFODNN7EXAMPLE",
    "bucket": "my-bucket",
    "object_store_address": "s3.us-west-2.amazonaws.com",
    "id": "1c23b4ed-5c67-8f90-1e23-a4f5f6ceff78",
    "object_folder": "my-folder/",
    "port": 443,
    "ca_certificate": null,
    "region": "us-west-2",
    "source_directory_id": "3",
    "source_directory_path": "/my-dir/",
    "state": "REPLICATION_RUNNING",
    "current_job": {
      "start_time": "2020-04-06T17:56:29.659309904Z",
      "estimated_end_time": "2020-04-06T21:54:33.244095593Z",
      "job_progress": {
        "bytes_transferred": "178388608",
        "bytes_unchanged": "0",
        "bytes_remaining": "21660032",
        "bytes_total": "200048640",
        "files_transferred": "17",
        "files_unchanged": "0",
        "files_remaining": "4",
        "files_total": "21",
        "percent_complete": 89.0368314738253,
        "throughput_current": "12330689",
        "throughput_overall": "12330689"
      }
    },
    "last_job": null
  }
]
The state field shows the REPLICATION_RUNNING status and the current_job field shows the job’s progress. When Qumulo Core finishes copying files to S3, details for the most recently completed job become available in the last_job field, the state field changes to REPLICATION_NOT_RUNNING, and the current_job field reverts to null.
Note
If you already ran a job for a relationship, it is possible for both the current_job and last_job fields to be non-null while you run a new job.
The bytes_total and files_total fields represent the total amount of data and number of files to be transferred by a Shift job. The bytes_remaining and files_remaining fields show the amount of data and number of files not yet transferred. The values of these four fields don’t stabilize until the work estimation for the job is complete.
The percent_complete field displays the overall job progress and the estimated_end_time field displays the time at which the job is estimated to be complete. The values of these two fields are populated when the work estimation for the job is complete.
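A script that polls relationship status can condense each entry to the few values it needs. The Python sketch below assumes the JSON shape shown in the example output above; the function name and the summary keys are hypothetical.

```python
import json


def summarize_status(status: dict) -> dict:
    """Condense one entry of the status list into a few Python values."""
    summary = {
        "id": status["id"],
        "running": status["state"] == "REPLICATION_RUNNING",
        "bytes_remaining": None,
        "files_remaining": None,
        "percent_complete": None,
    }
    job = status.get("current_job")
    if job is not None:
        progress = job["job_progress"]
        # Byte and file counters arrive as JSON strings, so convert them.
        summary["bytes_remaining"] = int(progress["bytes_remaining"])
        summary["files_remaining"] = int(progress["files_remaining"])
        # May be missing or null until work estimation is complete.
        summary["percent_complete"] = progress.get("percent_complete")
    return summary


if __name__ == "__main__":
    sample = json.loads("""
    [{"id": "1c23b4ed-5c67-8f90-1e23-a4f5f6ceff78",
      "state": "REPLICATION_RUNNING",
      "current_job": {"job_progress": {
          "bytes_remaining": "21660032", "files_remaining": "4",
          "percent_complete": 89.0368314738253}},
      "last_job": null}]
    """)
    for entry in sample:
        print(summarize_status(entry))
```

Because the totals and remaining counts are not stable until work estimation finishes, a caller might treat a missing percent_complete as "still estimating" rather than as zero progress.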
Stopping a Copy Job in Progress
To stop a copy job already in progress, run the qq replication_abort_object_relationship command and use the --id flag to specify the Shift relationship ID.
Repeating a Completed Copy Job
To repeat a completed copy job, run the qq replication_start_object_relationship command and use the --id flag to specify the Shift relationship ID.
This command begins a new job for the existing relationship and copies to S3 any content that changed on the Qumulo cluster or in the S3 bucket since the time the previous job ran.
Deleting a Shift Relationship
After your copy job is complete, you can delete your Shift relationship. To do this, run the qq replication_delete_object_relationship command and use the --id flag to specify the Shift relationship ID.
You can run this command only against a relationship that doesn’t have any active jobs running.
This command removes the copy job’s record, leaving the files on your cluster and the objects already copied to S3 unchanged. Any storage that the relationship used to track copied objects becomes available when you delete the relationship.
Troubleshooting Copy Job Issues
Any fatal errors that occur during a copy job cause the job to fail, leaving a partially copied set of files in the directory in your S3 bucket. However, to let you review the Shift relationship status and any failure messages, the Shift relationship continues to exist. You can start a new job to complete the copying of objects to the S3 bucket. Any successfully transferred files from the previous job aren’t retransferred from your Qumulo cluster.
Whenever Qumulo Core doesn’t complete an operation successfully and returns an error from the API or CLI, the error field within the last_job field (that the qq replication_list_object_relationship_statuses command returns) contains a detailed failure message. For more troubleshooting details, see qumulo-replication.log on your Qumulo cluster.
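To find failed jobs across many relationships, the status list can be filtered on that error field. The following Python sketch assumes the parsed output of the status list command. The function name is hypothetical, and the exact shape of the error value (string or object) is an assumption, so the sketch returns it untouched.

```python
from typing import Any, List, Tuple


def failed_jobs(statuses: List[dict]) -> List[Tuple[str, Any]]:
    """Return (relationship ID, error) pairs for last jobs that failed."""
    failures = []
    for status in statuses:
        # last_job is null until a job has completed for the relationship.
        last_job = status.get("last_job") or {}
        error = last_job.get("error")
        if error:
            failures.append((status["id"], error))
    return failures


if __name__ == "__main__":
    sample = [
        {"id": "relationship-1", "last_job": None},
        {"id": "relationship-2", "last_job": {"error": "example failure message"}},
    ]
    for relationship_id, error in failed_jobs(sample):
        print(relationship_id, error)
```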
Best Practices for Shift-to-S3
We recommend the following best practices for working with Qumulo Shift-To for Amazon S3.
- Bucket Lifecycle Policy: Configure a bucket lifecycle policy that aborts any incomplete uploads older than several days. This ensures the automatic clean-up of any storage used by incomplete parts of large objects that failed or interrupted replication operations leave behind. For more information, see Uploading and copying objects using multipart upload in the Amazon Simple Storage Service User Guide.
- VPC Endpoints: For best performance when using a Qumulo cluster in AWS, configure a VPC endpoint to S3. For on-premises Qumulo clusters, we recommend AWS Direct Connect or another high-bandwidth, low-latency connection to S3.
- Unique Artifacts: To avoid collisions between different data sets, specify a unique object folder or unique bucket for each replication relationship from a Qumulo cluster to S3.
- Object Versioning: To protect against unintended overwrites, enable object versioning. For more information, see Using versioning in S3 buckets in the Amazon Simple Storage Service User Guide.
- Completed Jobs: If you don’t plan to use a Shift relationship to copy further updates to S3, delete the relationship to free up any storage associated with it.
- Concurrent Replication Relationships: To increase parallelism, especially across distinct datasets, use concurrent replication relationships to S3. To avoid having a large number of concurrent operations impact client I/O to the Qumulo cluster, limit the number of concurrent replication relationships. While there is no hard limit, we don’t recommend creating more than 100 concurrent replication relationships on a cluster (including both Shift and Qumulo local replication relationships).
- User Metadata Limits: Amazon S3’s limits on object metadata (up to 2 kB across key bytes and value bytes) and tagging (10 entries with a key size of 128 bytes and a value size of 256 bytes) are more restrictive than those of Qumulo Core. When a metadata entry exceeds one of these limits, Qumulo Core omits the entry from a replication job. For more information, see User Defined Object Metadata and Categorizing your storage using tags in the Amazon Simple Storage Service User Guide.
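Before running a job, it can help to check which user metadata entries fit within the tagging limits listed above. The Python sketch below is an illustration only: the function name is hypothetical, sizes are measured in UTF-8 bytes as the text above states, and keeping the first ten entries in insertion order is an assumption, because this section doesn't say which entries Qumulo Core omits when there are too many.

```python
from typing import Dict, List, Tuple

MAX_TAGS = 10              # entries per object
MAX_TAG_KEY_BYTES = 128    # key size limit quoted above
MAX_TAG_VALUE_BYTES = 256  # value size limit quoted above


def split_by_tag_limits(metadata: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Split entries into those that fit the limits and the keys that don't."""
    kept: Dict[str, str] = {}
    omitted: List[str] = []
    for key, value in metadata.items():
        fits = (
            len(key.encode("utf-8")) <= MAX_TAG_KEY_BYTES
            and len(value.encode("utf-8")) <= MAX_TAG_VALUE_BYTES
            and len(kept) < MAX_TAGS  # assumption: first ten entries win
        )
        if fits:
            kept[key] = value
        else:
            omitted.append(key)
    return kept, omitted


if __name__ == "__main__":
    kept, omitted = split_by_tag_limits({"project": "alpha", "notes": "x" * 300})
    print(kept)     # {'project': 'alpha'}
    print(omitted)  # ['notes']
```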
Shift-to-S3 Restrictions
- Object-Locked Buckets: You can’t use buckets configured with S3 Object Lock and a default retention period for Shift-To. If possible, either remove the default retention period and set retention periods explicitly on objects uploaded outside of Shift or use a different S3 bucket without S3 Object Lock enabled. For more information, see How S3 Object Lock works in the Amazon Simple Storage Service User Guide.
- File Size Limit: The size of an individual file can’t exceed 5 TiB (this is the maximum object size that S3 supports). There is no limit on the total size of all your files.
- File Path Limit: The length of a file path must be shorter than 1,024 characters, including the configured object folder prefix, excluding the local directory path.
- Hard Links: Qumulo Core 3.2.3 (and higher) supports hard links, up to the maximum object size that S3 supports.
- Objects Under the Same Key: Unless an object contains Qumulo-specific hash metadata that matches a file, any object that exists under the same key that a new relationship replicates is overwritten. To retain older versions of overwritten objects, enable versioning for your S3 bucket. For more information, see Using versioning in S3 buckets in the Amazon Simple Storage Service User Guide.
- Object Checksums: All files replicated by using S3 server-side integrity verification (during upload) use a SHA256 checksum stored in the replicated object’s metadata.
- S3-Compatible Object Stores: S3-compatible object stores aren’t supported. Currently, Qumulo Shift-To supports replication only to Amazon S3.
- HTTP: HTTP isn’t supported. All Qumulo connections are encrypted by using HTTPS and verify the S3 server’s SSL certificate.
- Anonymous Access: Anonymous access isn’t supported. You must use valid AWS credentials.
- Replication without Throttling: Replication provides no throttling and might use all available bandwidth. If necessary, use Quality of Service rules on your network.
- Amazon S3 Standard Storage Class: Qumulo Shift-To supports uploading only objects stored in the Amazon S3 Standard storage class. You can’t download objects stored in the Amazon S3 Glacier or Deep Archive storage classes and any buckets that contain such objects cause a copy job to fail.
- Content-Type Metadata: Because all objects are stored in S3 using the default binary/octet-stream content type, they might be interpreted as binary data if you download them by using a browser. To attach content-type metadata to your objects, use the AWS Console.
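The file size and file path restrictions above lend themselves to a simple pre-flight check. The Python sketch below is a hedged illustration: the function name is hypothetical, and it applies only the two numeric limits stated in this section (5 TiB per file, and a path shorter than 1,024 characters counted from the object folder prefix).

```python
from typing import List

MAX_OBJECT_BYTES = 5 * 1024**4  # 5 TiB, the maximum object size S3 supports
MAX_PATH_CHARS = 1024           # the path must be shorter than this


def shift_to_problems(object_folder: str, relative_path: str, size_bytes: int) -> List[str]:
    """List the restrictions that a file would violate, if any.

    relative_path is the file's path below the local directory, which is
    excluded from the length limit.
    """
    problems = []
    key = f"{object_folder.strip('/')}/{relative_path.lstrip('/')}".lstrip("/")
    if len(key) >= MAX_PATH_CHARS:
        problems.append(f"path is {len(key)} characters; must be under {MAX_PATH_CHARS}")
    if size_bytes > MAX_OBJECT_BYTES:
        problems.append(f"file is {size_bytes} bytes; the limit is {MAX_OBJECT_BYTES}")
    return problems


if __name__ == "__main__":
    print(shift_to_problems("my-folder/", "my-project/file.txt", 4096))  # []
```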