This section explains how to use Shift-To to copy objects from a directory in a Qumulo cluster to a folder in an Amazon Simple Storage Service (Amazon S3) bucket and how to manage Shift relationships.

For more information about copying objects from S3 to Qumulo, see Using Qumulo Shift-From for Amazon S3 to Copy Objects.

Prerequisites

  • A Qumulo cluster with:

    • Qumulo Core 3.2.1 (and higher) for the CLI and 3.2.5 (and higher) for the Qumulo Core Web UI

    • HTTPS connectivity to s3.<region>.amazonaws.com through one of the following means:

      For more information, see AWS IP address ranges in the AWS General Reference.

  • Membership in a Qumulo role with the following privileges:

    • PRIVILEGE_REPLICATION_OBJECT_WRITE: This privilege is required to create a Shift relationship.

    • PRIVILEGE_REPLICATION_OBJECT_READ: This privilege is required to view the status of a Shift relationship.

  • An existing bucket with contents in Amazon S3

  • AWS credentials (access key ID and secret access key) with the following permissions:

    • s3:AbortMultipartUpload

    • s3:GetObject

    • s3:PutObject

    • s3:ListBucket

    For more information, see Understanding and getting your AWS credentials in the AWS General Reference.

Example IAM Policy

In the following example, the IAM policy gives permission to read from and write to the my-folder folder in the my-bucket bucket. This policy can give users the permissions required to run Shift-To jobs.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "s3:ListBucket",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::my-bucket"
    },
    {
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::my-bucket/my-folder/*"
    }
  ]
}
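
As a minimal sketch, the same policy document can be generated for any bucket and folder. The helper name build_shift_to_policy is hypothetical and isn't part of any Qumulo or AWS tool; it only assembles the JSON structure shown above.

```python
import json


def build_shift_to_policy(bucket: str, folder: str) -> dict:
    """Return an IAM policy document shaped like the example above.

    Hypothetical helper for illustration only.
    """
    prefix = folder.strip("/")
    # An empty prefix (folder "/") means the whole bucket.
    objects = f"{bucket}/{prefix}/*" if prefix else f"{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": "s3:ListBucket",
                "Effect": "Allow",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:GetObject",
                    "s3:PutObject",
                ],
                "Effect": "Allow",
                "Resource": f"arn:aws:s3:::{objects}",
            },
        ],
    }


policy = build_shift_to_policy("my-bucket", "/my-folder/")
print(json.dumps(policy, indent=2))
```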

How Shift-To Relationships Work

Qumulo Core performs the following steps when it creates a Shift-To relationship.

  1. Verifies that the directory exists on the Qumulo cluster and that the specified S3 bucket exists, is accessible by using the specified credentials, and contains downloadable objects.

  2. Creates the Shift-To relationship.

  3. Starts a job by using one of the nodes in the Qumulo cluster.

  4. To ensure that the copy is point-in-time consistent, takes a temporary snapshot of the directory (for example, named replication_to_bucket_my_bucket).

  5. Recursively traverses the directories and files in the snapshot and copies each object to a corresponding object in S3.

  6. Preserves the file paths in the local directory in the keys of replicated objects.

    For example, the file /my-dir/my-project/file.txt, where my-dir is the directory on your Qumulo cluster, is uploaded to S3 as the following object, where my-folder is the specified S3 folder.

    https://my-bucket.s3.us-west-2.amazonaws.com/my-folder/my-project/file.txt
    

    The following table explains how entities in the Qumulo file system map to entities in an S3 bucket.

    Entity in the Qumulo File System         Entity in an Amazon S3 Bucket
    ---------------------------------------  -----------------------------
    Access control list (ACL)                Not copied
    Alternate data streams                   Not copied
    Directory                                Not copied (directory structure is preserved in the object key for objects created for files)
    Hard link to a non-regular file          Not copied
    Hard link to a regular file              Copy of the S3 object
    Holes in sparse files                    Zeroes (holes are expanded)
    Regular file                             S3 object (the object key is the file system path and the metadata is the field data)
    SMB extended file attributes             Not copied
    Symbolic link                            Not copied
    Timestamps (mtime, ctime, atime, btime)  Not copied
    UNIX device file                         Not copied

  7. Checks whether a file is already replicated. If the object exists in the remote S3 bucket, and neither the file nor the object has been modified since the last successful replication, its data isn’t retransferred to S3.

  8. Deletes the temporary snapshot.
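
The path-to-key mapping in step 6 can be sketched as follows. This is an illustration only, not Qumulo Core's implementation: the function names object_key and object_url are hypothetical, and the URL format simply follows the example in step 6.

```python
import posixpath
from urllib.parse import quote


def object_key(local_dir: str, object_folder: str, file_path: str) -> str:
    """Map a file under local_dir to its object key under object_folder."""
    rel = posixpath.relpath(file_path, start=local_dir)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{file_path} is not inside {local_dir}")
    # An object folder of "/" means the bucket root, so the prefix is empty.
    return posixpath.join(object_folder.strip("/"), rel)


def object_url(bucket: str, region: str, key: str) -> str:
    """Build a URL in the same style as the example in step 6."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


key = object_key("/my-dir/", "/my-folder/", "/my-dir/my-project/file.txt")
print(key)  # my-folder/my-project/file.txt
print(object_url("my-bucket", "us-west-2", key))
```

The local directory path (/my-dir/) doesn't appear in the key; only the path relative to that directory is appended to the object folder.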

Storing and Reusing Relationships

The Shift-To relationship remains on the Qumulo cluster. You can monitor the completion status of a job, start new jobs for a relationship after the initial job finishes, and delete the relationship (when you no longer need the S3-folder-Qumulo-directory pair). To avoid reuploading objects that a previous copy job uploaded, a relationship stores tracking information that takes up approximately 100 bytes for each object. To free this storage, you can delete relationships that you no longer need.

If you repeatedly copy from the same Qumulo directory, you can speed up the upload process (and skip already uploaded files) by using the same relationship.

A new relationship for subsequent uploads doesn’t share any tracking information with previous relationships associated with a directory and might recopy data that is already uploaded.
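
To get a rough sense of the tracking overhead, the approximate figure of 100 bytes for each object can be multiplied by the number of objects in a relationship. The following sketch shows the arithmetic; the function name is hypothetical and the result is an estimate, not a value that Qumulo Core reports.

```python
BYTES_PER_TRACKED_OBJECT = 100  # approximate figure from the text above


def tracking_overhead_bytes(object_count: int) -> int:
    """Estimate the storage that one relationship uses for tracking."""
    return object_count * BYTES_PER_TRACKED_OBJECT


# A relationship that tracks 10 million objects uses roughly 1 GB.
print(tracking_overhead_bytes(10_000_000) / 10**9, "GB")
```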

Using the Qumulo Core Web UI to Copy Files and Manage Relationships

This section describes how to use the Qumulo Core Web UI 3.2.5 (and higher) to copy files from a Qumulo cluster to Amazon S3, review Shift relationship details, stop a running copy job, repeat a completed copy job, and delete a relationship.

To Copy Files to Amazon S3

  1. Log in to the Qumulo Core Web UI.

  2. Click Cluster > Copy to/from S3.

  3. On the Copy to/from S3 page, click Create Copy.

  4. On the Create Copy to/from S3 page, click Local ⇨ Remote and then enter the following:

    1. The Directory Path on your cluster (/ by default)

    2. The S3 Bucket Name

    3. The Folder in your S3 bucket

    4. The Region for your S3 bucket

    5. Your AWS Region (/ by default)

    6. Your AWS Access Key ID and Secret Access Key

  5. (Optional) For additional configuration, click Advanced S3 Server Settings.

  6. Click Create Copy.

  7. In the Create Copy to S3? dialog box, review the Shift relationship and then click Yes, Create.

    The copy job begins.

To View Configuration Details and Status of Shift Relationships

  1. Log in to the Qumulo Core Web UI.
  2. Click Cluster > Copy to/from S3.

    The Copy to/from S3 page lists all existing Shift relationships.

  3. To get more information about a specific Shift relationship, click ⋮ > View Details.

    The Copy to/from S3 Details page displays the following information:

    • Throughput: average
    • Run Time
    • Data: total, transferred, and unchanged
    • Files: total, transferred, and unchanged

To Stop a Copy Job in Progress

  1. Log in to the Qumulo Core Web UI.
  2. Click Cluster > Copy to/from S3.
  3. To stop a copy job for a specific relationship, click ⋮ > Abort.
  4. In the Abort copy from? dialog box, review the Shift relationship and then click Yes, Abort.

    The copy job stops.

To Repeat a Completed Copy Job

  1. Log in to the Qumulo Core Web UI.
  2. Click Cluster > Copy to/from S3.
  3. To repeat a copy job for a specific relationship, click ⋮ > Copy Again.
  4. In the Copy again? dialog box, review the Shift relationship and then click Yes, Copy Again.

    The copy job repeats.

To Delete a Shift Relationship

  1. Log in to the Qumulo Core Web UI.
  2. Click Cluster > Copy to/from S3.
  3. To delete a specific relationship, click ⋮ > Delete.
  4. In the Delete copy from? dialog box, review the Shift relationship and then click Yes, Delete.

    The relationship is deleted.

Using the Qumulo CLI to Copy Files and Manage Relationships

This section describes how to use the Qumulo CLI 3.2.1 (and higher) to copy files from a Qumulo cluster to Amazon S3, review Shift relationship details, stop a running copy job, repeat a completed copy job, and delete a relationship.

Copying Files to Amazon S3

To copy files, run the qq replication_create_object_relationship command and specify the following:

  • Local directory path on Qumulo cluster
  • Copy direction (copy-to)
  • S3 object folder
  • S3 bucket
  • AWS region
  • AWS access key ID
  • AWS secret access key

The following example shows how to create a relationship between the directory /my-dir/ on a Qumulo cluster and the S3 bucket my-bucket and folder /my-folder/ in the us-west-2 AWS region. The secret access key is associated with the access key ID.

qq replication_create_object_relationship \
  --source-directory-path /my-dir/ \
  --direction COPY_TO_OBJECT \
  --object-folder /my-folder/ \
  --bucket my-bucket \
  --region us-west-2 \
  --access-key-id AKIAIOSFODNN7EXAMPLE \
  --secret-access-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

The CLI returns the details of the relationship in JSON format, for example:

{
  "access_key_id": "AKIAIOSFODNN7EXAMPLE",
  "bucket": "my-bucket",
  "object_store_address": "s3.us-west-2.amazonaws.com",
  "id": "1c23b4ed-5c67-8f90-1e23-a4f5f6ceff78",
  "object_folder": "my-folder/",
  "port": 443,
  "ca_certificate": null,
  "region": "us-west-2",
  "source_directory_id": "3",
  "direction": "COPY_TO_OBJECT"
}
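
Because the later commands in this section take the relationship ID, it can be convenient to capture the ID when you create the relationship. The following sketch wraps the command shown above. It assumes that qq is on your PATH and already authenticated, and that the CLI prints the JSON shown above to standard output; the function names are hypothetical.

```python
import json
import subprocess
from typing import List


def create_relationship_args(directory: str, folder: str, bucket: str,
                             region: str, key_id: str, secret: str) -> List[str]:
    """Assemble the same command line as the example above."""
    return [
        "qq", "replication_create_object_relationship",
        "--source-directory-path", directory,
        "--direction", "COPY_TO_OBJECT",
        "--object-folder", folder,
        "--bucket", bucket,
        "--region", region,
        "--access-key-id", key_id,
        "--secret-access-key", secret,
    ]


def relationship_id(cli_output: str) -> str:
    """Extract the relationship ID (GUID) from the JSON that the CLI prints."""
    return json.loads(cli_output)["id"]


def create_relationship(*args: str) -> str:
    """Run the command and return the new relationship ID.

    Assumption: qq is installed and logged in to the cluster.
    """
    result = subprocess.run(create_relationship_args(*args),
                            check=True, capture_output=True, text=True)
    return relationship_id(result.stdout)
```

For example, create_relationship("/my-dir/", "/my-folder/", "my-bucket", "us-west-2", key_id, secret) would return the GUID to pass to the --id flag of the other commands.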

Viewing Configuration Details and Status of Shift Relationships

  • To view configuration details for all Shift relationships, run the qq replication_list_object_relationships command.

  • To view configuration details for a specific relationship, run the qq replication_get_object_relationship command followed by the --id flag and the Shift relationship ID (GUID), for example:

    qq replication_get_object_relationship --id 1c23b4ed-5c67-8f90-1e23-a4f5f6ceff78

  • To view the status of a specific relationship, run the qq replication_get_object_relationship_status command followed by the --id flag and the Shift relationship ID.

  • To view the status of all relationships, run the qq replication_list_object_relationship_statuses command.

    The CLI returns the details of all relationships in JSON format, for example:

    [
      {
        "direction": "COPY_TO_OBJECT",
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "bucket": "my-bucket",
        "object_store_address": "s3.us-west-2.amazonaws.com",
        "id": "1c23b4ed-5c67-8f90-1e23-a4f5f6ceff78",
        "object_folder": "my-folder/",
        "port": 443,
        "ca_certificate": null,
        "region": "us-west-2",
        "source_directory_id": "3",
        "source_directory_path": "/my-dir/",
        "state": "REPLICATION_RUNNING",
        "current_job": {
          "start_time": "2020-04-06T17:56:29.659309904Z",
          "estimated_end_time": "2020-04-06T21:54:33.244095593Z",
          "job_progress": {
            "bytes_transferred": "178388608",
            "bytes_unchanged": "0",
            "bytes_remaining": "21660032",
            "bytes_total": "200048640",
            "files_transferred": "17",
            "files_unchanged": "0",
            "files_remaining": "4",
            "files_total": "21",
            "percent_complete": 89.0368314738253,
            "throughput_current": "12330689",
            "throughput_overall": "12330689"
          }
        },
        "last_job": null
      }
    ]
    

    The state field shows the REPLICATION_RUNNING status and the current_job field shows the job’s progress. When Qumulo Core finishes copying files to S3, details for the most recently completed job become available in the last_job field, the state field changes to REPLICATION_NOT_RUNNING, and the current_job field reverts to null.

    The bytes_total and files_total fields represent the total amount of data and number of files to be transferred by a Shift job. The bytes_remaining and files_remaining fields show the amount of data and number of files not yet transferred. The values of these four fields don’t stabilize until the work estimation for the job is complete.

    The percent_complete field displays the overall job progress and the estimated_end_time field displays the time at which the job is estimated to be complete. The values of these two fields are populated when the work estimation for the job is complete.
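
The status fields described above can be condensed into a one-line progress summary. The following sketch reads one entry of the status list. Note that the byte and file counters arrive as JSON strings, while percent_complete is a number. Treating percent_complete as possibly missing or null before work estimation completes is an assumption, and the function name is hypothetical.

```python
def summarize(status: dict) -> str:
    """Return a one-line progress summary for one relationship status."""
    job = status.get("current_job")
    if job is None:
        return f"{status['id']}: {status['state']}, no job running"
    progress = job["job_progress"]
    # Counters are strings in the JSON output, so convert them first.
    files_done = int(progress["files_transferred"]) + int(progress["files_unchanged"])
    bytes_done = int(progress["bytes_transferred"]) + int(progress["bytes_unchanged"])
    percent = progress.get("percent_complete")
    pct = f"{percent:.1f}%" if percent is not None else "estimating"
    return (f"{status['id']}: {status['state']}, {pct}, "
            f"{files_done}/{progress['files_total']} files, "
            f"{bytes_done}/{progress['bytes_total']} bytes")


# Abbreviated copy of the example output above.
sample = {
    "id": "1c23b4ed-5c67-8f90-1e23-a4f5f6ceff78",
    "state": "REPLICATION_RUNNING",
    "current_job": {
        "job_progress": {
            "bytes_transferred": "178388608",
            "bytes_unchanged": "0",
            "bytes_total": "200048640",
            "files_transferred": "17",
            "files_unchanged": "0",
            "files_total": "21",
            "percent_complete": 89.0368314738253,
        }
    },
    "last_job": None,
}
print(summarize(sample))
```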

Stopping a Copy Job in Progress

To stop a copy job already in progress, run the qq replication_abort_object_relationship command and use the --id flag to specify the Shift relationship ID.

Repeating a Completed Copy Job

To repeat a completed copy job, run the qq replication_start_object_relationship command and use the --id flag to specify the Shift relationship ID.

This command begins a new job for the existing relationship and uploads any content that changed in the S3 bucket or on the Qumulo cluster since the time the previous job ran.
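
If you script repeated copy jobs, you might start a job and then poll the relationship status until the state is no longer REPLICATION_RUNNING. The following sketch shows one way to do this. It assumes that qq is on your PATH and authenticated, and that qq replication_get_object_relationship_status prints a single JSON object with the same state and last_job fields as the entries shown earlier. The function names and the 30-second polling interval are hypothetical.

```python
import json
import subprocess
import time
from typing import Callable, Optional


def start_job(relationship_id: str) -> None:
    """Start a new copy job for an existing relationship."""
    subprocess.run(["qq", "replication_start_object_relationship",
                    "--id", relationship_id], check=True)


def fetch_status(relationship_id: str) -> dict:
    """Read the status of one relationship (assumed to be a JSON object)."""
    out = subprocess.run(["qq", "replication_get_object_relationship_status",
                          "--id", relationship_id],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def wait_for_job(relationship_id: str,
                 fetch: Callable[[str], dict] = fetch_status,
                 interval_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Poll until no job is running, then return the last_job details."""
    while True:
        status = fetch(relationship_id)
        if status["state"] != "REPLICATION_RUNNING":
            return status.get("last_job")
        sleep(interval_s)
```

The fetch and sleep parameters exist only so that the loop can be exercised without a cluster.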

Deleting a Shift Relationship

After your copy job is complete, you can delete your Shift relationship. To do this, run the qq replication_delete_object_relationship command and use the --id flag to specify the Shift relationship ID.

This command removes the copy job’s record, leaving the objects stored in S3 unchanged. Any storage that the relationship used to track uploaded objects becomes available when you delete the relationship.

Troubleshooting Copy Job Issues

Any fatal errors that occur during a copy job cause the job to fail, leaving a partially copied set of files in the directory in your S3 bucket. However, to let you review the Shift relationship status and any failure messages, the Shift relationship continues to exist. You can start a new job to complete the copying of objects to the S3 bucket. Any files that the previous job transferred successfully aren’t retransferred from your Qumulo cluster.

Whenever Qumulo Core doesn’t complete an operation successfully and returns an error from the API or CLI, the error field within the last_job field (that the qq replication_list_object_relationship_statuses command returns) contains a detailed failure message. For more troubleshooting details, see qumulo-replication.log on your Qumulo cluster.
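
To find failed jobs across many relationships, you can scan the list of statuses for a populated error field. The following sketch assumes that the error field is missing or null when the last job succeeded. The function name and the failure message in the sample data are hypothetical.

```python
from typing import Dict, List


def failed_relationships(statuses: List[dict]) -> Dict[str, str]:
    """Map each relationship ID to the failure message of its last job."""
    failures = {}
    for status in statuses:
        last_job = status.get("last_job") or {}
        error = last_job.get("error")
        if error:
            failures[status["id"]] = error
    return failures


# Hypothetical sample data; real messages come from the CLI output.
statuses = [
    {"id": "aaaa", "last_job": None},
    {"id": "bbbb", "last_job": {"error": None}},
    {"id": "cccc", "last_job": {"error": "example failure message"}},
]
print(failed_relationships(statuses))
```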

Best Practices

We recommend the following best practices for working with Qumulo Shift-To for Amazon S3.

  • Bucket Lifecycle Policy: Failed or interrupted replication operations can leave incomplete parts of large objects in your bucket. To abort any incomplete multipart uploads that are older than several days and to clean up the storage that these parts use automatically, configure a bucket lifecycle policy. For more information, see Uploading and copying objects using multipart upload in the Amazon Simple Storage Service User Guide.
  • VPC Endpoints: For best performance when using a Qumulo cluster in AWS, configure a VPC endpoint to S3. For on-premises Qumulo clusters, we recommend AWS Direct Connect or another high-bandwidth, low-latency connection to S3.
  • Unique Artifacts: To avoid collisions between different data sets, specify a unique object folder or unique bucket for each replication relationship from a Qumulo cluster to S3.
  • Object Versioning: To protect against unintended overwrites, enable object versioning. For more information, see Using versioning in S3 buckets in the Amazon Simple Storage Service User Guide.
  • Completed Jobs: If you don’t plan to use a Shift relationship to upload updates to S3, delete the relationship to free up any storage associated with it.
  • Concurrent Replication Relationships: To increase parallelism, especially across distinct datasets, use concurrent replication relationships to S3. To avoid having a large number of concurrent operations impact client I/O to the Qumulo cluster, limit the number of concurrent replication relationships. While there is no hard limit, we don’t recommend creating more than 100 concurrent replication relationships on a cluster (including both Shift and Qumulo local replication relationships).
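
As an illustration of the bucket lifecycle policy recommendation, the following sketch generates an S3 lifecycle configuration that aborts incomplete multipart uploads. The seven-day threshold, the rule ID, and the function name are assumptions chosen for this example; pick a threshold that suits your environment.

```python
import json


def abort_incomplete_uploads_config(days: int = 7, prefix: str = "") -> dict:
    """Build a lifecycle configuration that aborts stale multipart uploads.

    An empty prefix applies the rule to the whole bucket.
    """
    return {
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


config = abort_incomplete_uploads_config(days=7, prefix="my-folder/")
print(json.dumps(config, indent=2))
```

You can save the output to a file such as lifecycle.json and apply it with the AWS CLI, for example: aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json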

Restrictions

  • Object-Locked Buckets: You can’t use buckets configured with S3 Object Lock and a default retention period for Shift-To. If possible, either remove the default retention period and set retention periods explicitly on objects uploaded outside of Shift or use a different S3 bucket without S3 Object Lock enabled. For more information, see How S3 Object Lock works in the Amazon Simple Storage Service User Guide.
  • File Size Limit: The size of an individual file can’t exceed 5 TiB (this is the maximum object size that S3 supports). There is no limit on the total size of all your files.
  • File Path Limit: The length of a file path must be shorter than 1,024 characters, including the configured object folder prefix, excluding the local directory path.
  • Hard Links: Qumulo Core 3.2.3 (and higher) supports hard links, up to the maximum object size that S3 supports.
  • Objects Under the Same Key: Unless an object contains Qumulo-specific hash metadata that matches a file, any object that exists under the same key that a new relationship replicates is overwritten. To retain older versions of overwritten objects, enable versioning for your S3 bucket. For more information, see Using versioning in S3 buckets in the Amazon Simple Storage Service User Guide.
  • Object Checksums: Qumulo Core replicates all files by using S3 server-side integrity verification (during upload) and stores a SHA256 checksum in the metadata of each replicated object.
  • S3-Compatible Object Stores: S3-compatible object stores aren’t supported. Currently, Qumulo Shift-To supports replication only to Amazon S3.
  • HTTP: HTTP isn’t supported. All Qumulo connections are encrypted by using HTTPS and verify the S3 server’s SSL certificate.
  • Anonymous Access: Anonymous access isn’t supported. You must use valid AWS credentials.
  • Replication without Throttling: Replication provides no throttling and might use all available bandwidth. If necessary, use Quality of Service rules on your network.
  • Amazon S3 Standard Storage Class: Qumulo Shift-To supports uploading only objects stored in the Amazon S3 Standard storage class. You can’t download objects stored in the Amazon S3 Glacier or Deep Archive storage classes and any buckets that contain such objects cause a copy job to fail.
  • Content-Type Metadata: Because all objects are stored in S3 using the default binary/octet-stream content type, they might be interpreted as binary data if you download them by using a browser. To attach content-type metadata to your objects, use the AWS Console.