This section explains how to use Shift-From to copy objects from a folder in an Amazon Simple Storage Service (Amazon S3) bucket (cloud object store) to a directory in a Qumulo cluster and how to manage Shift relationships.

For more information about copying objects from Qumulo to S3, see Using Qumulo Shift-To for Amazon S3 to Copy Objects on Qumulo Care.

Prerequisites

  • A Qumulo cluster with:

  • Membership in a Qumulo role with the following privileges:

    • PRIVILEGE_REPLICATION_OBJECT_WRITE: This privilege is required to create a Shift relationship.

    • PRIVILEGE_REPLICATION_OBJECT_READ: This privilege is required to view the status of a Shift relationship.

  • An existing bucket with contents in Amazon S3

  • AWS credentials (access key ID and secret access key) with the following permissions:

    • s3:GetObject

    • s3:ListBucket

    For more information, see Understanding and getting your AWS credentials in the AWS General Reference.

Example IAM Policy

In the following example, the IAM policy gives permission to list the contents of the my-bucket bucket and to read objects from the my-folder folder in that bucket. This policy gives users the minimal set of permissions required to run Shift-From jobs. (Shift-To jobs require a less-restrictive policy. For more information and an example, see Using Qumulo Shift-To for Amazon S3 to Copy Objects.)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "s3:ListBucket",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::my-bucket"
    },
    {
      "Action": [
        "s3:GetObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::my-bucket/my-folder/*"
    }
  ]
}
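
The same policy can be produced for other bucket and folder names. The following Python sketch is only an illustration: the function name shift_from_policy is hypothetical and isn't part of any Qumulo or AWS tool, and it assumes that an empty folder means the whole bucket.

```python
import json


def shift_from_policy(bucket: str, folder: str) -> dict:
    """Build a read-only policy like the example above (illustrative helper)."""
    prefix = folder.strip("/")
    # Assumption: an empty folder means objects anywhere in the bucket.
    if prefix:
        object_arn = f"arn:aws:s3:::{bucket}/{prefix}/*"
    else:
        object_arn = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": "s3:ListBucket",
                "Effect": "Allow",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Action": ["s3:GetObject"],
                "Effect": "Allow",
                "Resource": object_arn,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(shift_from_policy("my-bucket", "my-folder"), indent=2))
```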

How Shift-From Relationships Work

Qumulo Core performs the following steps when it creates a Shift-From relationship.

  1. Verifies that the directory exists on the Qumulo cluster and that the specified S3 bucket exists, is accessible by using the specified credentials, and contains downloadable objects.

  2. Creates the Shift-From relationship.

  3. Starts a job by using one of the nodes in the Qumulo cluster.

  4. Lists the contents of the S3 folder and downloads the objects to the specified directory on your Qumulo cluster.

  5. Forms the full path of the file on the Qumulo cluster by appending the path of the object (relative to the S3 folder) to the directory path on the Qumulo cluster.

    For example, the following object is downloaded to /my-dir/my-project/file.txt, where my-folder is the specified S3 folder and my-dir is the directory on your Qumulo cluster.

    https://my-bucket.s3.us-west-2.amazonaws.com/my-folder/my-project/file.txt
    
  6. Avoids redownloading an unchanged object in a subsequent job by tracking the information about an object and its replicated object.
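
The path mapping in step 5 can be sketched in Python as follows. This is an illustration of the rule as described, not Qumulo Core's implementation, and the function name local_path_for_object is hypothetical.

```python
from posixpath import join


def local_path_for_object(key: str, s3_folder: str, local_dir: str) -> str:
    """Map an S3 object key to the file path on the Qumulo cluster."""
    folder = s3_folder.strip("/")
    prefix = folder + "/" if folder else ""
    if not key.startswith(prefix):
        raise ValueError(f"{key!r} is not under folder {s3_folder!r}")
    # Path of the object relative to the S3 folder.
    relative = key[len(prefix):]
    # Append the relative path to the directory path on the cluster.
    return join("/", local_dir.strip("/"), relative)


if __name__ == "__main__":
    print(local_path_for_object(
        "my-folder/my-project/file.txt", "/my-folder/", "/my-dir/"))
```

For the object in the example above, the sketch yields /my-dir/my-project/file.txt.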

Storing and Reusing Relationships

The Shift-From relationship remains on the Qumulo cluster. You can monitor the completion status of a job, start new jobs for a relationship after the initial job finishes, and delete the relationship (when you no longer need the S3-folder-Qumulo-directory pair). To avoid redownloading objects that a previous copy job downloaded, a relationship stores tracking information that takes up approximately 100 bytes for each object. To free this storage, you can delete relationships that you no longer need.

If you repeatedly download from the same S3 folder, you can speed up the download process (and skip already downloaded files) by using the same relationship.

A new relationship for subsequent downloads doesn’t share any tracking information with previous relationships associated with a directory and might recopy data that is already downloaded.
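
As a rough sizing aid, the tracking overhead of a relationship can be estimated from the approximate per-object figure given above. The following Python sketch assumes exactly 100 bytes for each object, which is only an approximation; the names are hypothetical.

```python
# Approximate figure from the text above; the real overhead may differ.
BYTES_PER_TRACKED_OBJECT = 100


def estimated_tracking_bytes(object_count: int) -> int:
    """Estimate the storage a relationship uses to track downloaded objects."""
    if object_count < 0:
        raise ValueError("object_count must not be negative")
    return object_count * BYTES_PER_TRACKED_OBJECT


if __name__ == "__main__":
    # Ten million objects need roughly 1 GB of tracking storage.
    print(estimated_tracking_bytes(10_000_000))
```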

Using the Qumulo Core Web UI to Copy Files and Manage Relationships

This section describes how to use the Qumulo Core Web UI 4.2.5 (and higher) to copy files from Amazon S3 to a Qumulo cluster, review Shift relationship details, stop a running copy job, repeat a completed copy job, and delete a relationship.

To Copy Files from Amazon S3

  1. Log in to the Qumulo Core Web UI.

  2. Click Cluster > Copy to/from S3.

  3. On the Copy to/from S3 page, click Create Copy.

  4. On the Create Copy to/from S3 page, click Local ⇦ Remote and then enter the following:

    1. The Directory Path on your cluster (/ by default)

    2. The S3 Bucket Name

    3. The Folder in your S3 bucket (/ by default)

    4. The Region for your S3 bucket

    5. Your AWS Access Key ID and Secret Access Key.

  5. (Optional) For additional configuration, click Advanced S3 Server Settings.

  6. Click Create Copy.

  7. In the Create Copy from S3? dialog box, review the Shift relationship and then click Yes, Create.

    The copy job begins and Qumulo Core estimates the work to be performed. When the estimation is complete, the Qumulo Core Web UI displays a progress bar with a percentage for a relationship on the Replication Relationships page. The page also displays the estimated total work, the remaining bytes and files, and the estimated time to completion for a running copy job.

To View Configuration Details and Status of Shift Relationships

  1. Log in to the Qumulo Core Web UI.
  2. Click Cluster > Copy to/from S3.

    The Copy to/from S3 page lists all existing Shift relationships.

  3. To get more information about a specific Shift relationship, click ⋮ > View Details.

    The Copy to/from S3 Details page displays the following information:

    • Throughput: average
    • Run Time
    • Data: total, transferred, and unchanged
    • Files: total, transferred, and unchanged

To Stop a Copy Job in Progress

  1. Log in to the Qumulo Core Web UI.
  2. Click Cluster > Copy to/from S3.
  3. To stop a copy job for a specific relationship, click ⋮ > Abort.
  4. In the Abort copy from? dialog box, review the Shift relationship and then click Yes, Abort.

    The copy job stops.

To Repeat a Completed Copy Job

  1. Log in to the Qumulo Core Web UI.
  2. Click Cluster > Copy to/from S3.
  3. To repeat a completed copy job for a specific relationship, click ⋮ > Copy Again.
  4. In the Copy again? dialog box, review the Shift relationship and then click Yes, Copy Again.

    The copy job repeats.

To Delete a Shift Relationship

  1. Log in to the Qumulo Core Web UI.
  2. Click Cluster > Copy to/from S3.
  3. To delete a specific relationship, click ⋮ > Delete.
  4. In the Delete copy from? dialog box, review the Shift relationship and then click Yes, Delete.

    The Shift relationship is deleted.

Using the Qumulo CLI to Copy Files and Manage Relationships

This section describes how to use the Qumulo CLI to copy files from Amazon S3 to a Qumulo cluster, review Shift relationship details, stop a running copy job, repeat a completed copy job, and delete a relationship.

Copying Files from Amazon S3

To copy files, run the qq replication_create_object_relationship command and specify the following:

  • Local directory path on Qumulo cluster
  • Copy direction (copy-from)
  • S3 object folder
  • S3 bucket
  • AWS region
  • AWS access key ID
  • AWS secret access key

The following example shows how to create a relationship between the directory /my-dir/ on a Qumulo cluster and the S3 bucket my-bucket and folder /my-folder/ in the us-west-2 AWS region. The secret access key is associated with the access key ID.

qq replication_create_object_relationship \
  --local-directory-path /my-dir/ \
  --direction COPY_FROM_OBJECT \
  --object-folder /my-folder/ \
  --bucket my-bucket \
  --region us-west-2 \
  --access-key-id AKIAIOSFODNN7EXAMPLE \
  --secret-access-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

The CLI returns the details of the relationship in JSON format, for example:

{
  "access_key_id": "AKIAIOSFODNN7EXAMPLE",
  "bucket": "my-bucket",
  "object_store_address": "s3.us-west-2.amazonaws.com",
  "id": "1c23b4ed-5c67-8f90-1e23-a4f5f6ceff78",
  "object_folder": "my-folder/",
  "port": 443,
  "ca_certificate": null,
  "region": "us-west-2",
  "local_directory_id": "3",
  "direction": "COPY_FROM_OBJECT"
}

Viewing Configuration Details and Status of Shift Relationships

  • To view configuration details for all Shift relationships, run the qq replication_list_object_relationships command.

  • To view configuration details for a specific relationship, run the qq replication_get_object_relationship command followed by the --id flag and the Shift relationship ID (GUID), for example:

    qq replication_get_object_relationship --id 1c23b4ed-5c67-8f90-1e23-a4f5f6ceff78
    
  • To view the status of a specific relationship, run the qq replication_get_object_relationship_status command followed by the --id flag and the Shift relationship ID.

  • To view the status of all relationships, run the qq replication_list_object_relationship_statuses command.

    The CLI returns the details of all relationships in JSON format, for example:

    [
      {
        "direction": "COPY_FROM_OBJECT",
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "bucket": "my-bucket",
        "object_store_address": "s3.us-west-2.amazonaws.com",
        "id": "1c23b4ed-5c67-8f90-1e23-a4f5f6ceff78",
        "object_folder": "my-folder/",
        "port": 443,
        "ca_certificate": null,
        "region": "us-west-2",
        "local_directory_id": "3",
        "local_directory_path": "/my-dir/",
        "state": "REPLICATION_RUNNING",
        "current_job": {
          "start_time": "2020-04-06T17:56:29.659309904Z",
          "estimated_end_time": "2020-04-06T21:54:33.244095593Z",
          "job_progress": {
            "bytes_transferred": "178388608",
            "bytes_unchanged": "0",
            "bytes_remaining": "21660032",
            "bytes_total": "200048640",
            "files_transferred": "17",
            "files_unchanged": "0",
            "files_remaining": "4",
            "files_total": "21",
            "percent_complete": 89.0368314738253,
            "throughput_current": "12330689",
            "throughput_overall": "12330689"
          }
        },
        "last_job": null
      }
    ]
    

    The state field shows the REPLICATION_RUNNING status and the current_job field shows the job’s progress. After Qumulo Core finishes copying files from S3, details for the most recently completed job become available in the last_job field, the state field changes to REPLICATION_NOT_RUNNING, and the current_job field reverts to null.

    The bytes_total and files_total fields represent the total amount of data and number of files to be transferred by a Shift job. The bytes_remaining and files_remaining fields show the amount of data and number of files not yet transferred. The values of these four fields don’t stabilize until the work estimation for the job is complete.

    The percent_complete field displays the overall job progress and the estimated_end_time field displays the time at which the job is estimated to be complete. The values of these two fields are populated when the work estimation for the job is complete.

    Shift-From performs a single task that estimates the amount of content to copy by listing all files and summing up their sizes. Until this task is complete, the percent_complete field is set to "None" and the estimated_end_time field is set to "". To list the bucket prefix content in sets of 1,000 objects, this task uses the ListObjectsV2 S3 action.
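
The status output can be condensed into one line per relationship. The following Python sketch assumes that the JSON printed by the qq replication_list_object_relationship_statuses command has the shape of the sample above and is passed in as a string; the function name summarize_statuses is hypothetical.

```python
import json


def summarize_statuses(statuses_json: str) -> list:
    """Return one progress line per relationship in the status listing."""
    lines = []
    for rel in json.loads(statuses_json):
        job = rel.get("current_job")
        if job is None:
            lines.append(f"{rel['id']}: {rel['state']} (no job running)")
            continue
        progress = job["job_progress"]
        # Counters are JSON strings in the sample output, so convert them.
        remaining = int(progress["files_remaining"])
        total = int(progress["files_total"])
        # percent_complete isn't numeric until work estimation is complete.
        percent = progress.get("percent_complete")
        if isinstance(percent, (int, float)):
            shown = f"{percent:.1f}%"
        else:
            shown = "estimating"
        lines.append(
            f"{rel['id']}: {rel['state']}, {shown}, "
            f"{remaining} of {total} files remaining"
        )
    return lines
```

Because the byte and file counters don't stabilize until estimation is complete, treat the remaining-file count as provisional while the summary shows "estimating".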

Stopping a Copy Job in Progress

To stop a copy job already in progress, run the qq replication_abort_object_replication command and use the --id flag to specify the Shift relationship ID.

Repeating a Completed Copy Job

To repeat a completed copy job, run the qq replication_start_object_relationship command and use the --id flag to specify the Shift relationship ID.

This command begins a new job for the existing relationship and downloads any content that changed in the S3 bucket or on the Qumulo cluster since the time the previous job ran.

Deleting a Shift Relationship

After your copy job is complete, you can delete your Shift relationship. To do this, run the qq replication_delete_object_relationship command and use the --id flag to specify the Shift relationship ID.

This command removes the copy job’s record, leaving locally stored objects unchanged. Any storage that the relationship used to track downloaded objects becomes available when you delete the relationship.
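
When you script these operations, it can help to build the command line in one place. The following Python sketch only assembles the argument list for the commands named in the sections above; the helper name qq_command and the action keys are hypothetical, and running the result requires a host where the qq CLI is installed and authenticated.

```python
import shlex

# Commands named in the sections above; each takes the --id flag.
RELATIONSHIP_COMMANDS = {
    "status": "replication_get_object_relationship_status",
    "abort": "replication_abort_object_replication",
    "start": "replication_start_object_relationship",
    "delete": "replication_delete_object_relationship",
}


def qq_command(action: str, relationship_id: str) -> list:
    """Build the qq argument list for an action on one Shift relationship."""
    try:
        subcommand = RELATIONSHIP_COMMANDS[action]
    except KeyError:
        raise ValueError(f"unknown action: {action}") from None
    return ["qq", subcommand, "--id", relationship_id]


if __name__ == "__main__":
    argv = qq_command("start", "1c23b4ed-5c67-8f90-1e23-a4f5f6ceff78")
    # Print the command; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```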

Troubleshooting Copy Job Issues

Any fatal errors that occur during a copy job cause the job to fail, leaving a partially copied set of files in the directory on your Qumulo cluster. However, to let you review the Shift relationship status and any failure messages, the Shift relationship continues to exist. You can start a new job to complete the copying of objects from the S3 bucket. Any files that the previous job transferred successfully aren’t retransferred to your Qumulo cluster.

Whenever Qumulo Core doesn’t complete an operation successfully and returns an error from the API or CLI, the error field within the last_job field (that the qq replication_list_object_relationship_statuses command returns) contains a detailed failure message. For more troubleshooting details, see qumulo-replication.log on your Qumulo cluster.
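
To find failed jobs across many relationships, you can scan the status listing for populated error fields. The following Python sketch assumes that the error field is absent or null when the last job succeeded and that the listing is passed in as a JSON string. The exact shape of the error value isn't documented here, so the sketch converts it to text. The function name failed_jobs is hypothetical.

```python
import json


def failed_jobs(statuses_json: str) -> dict:
    """Map each relationship ID to the failure message of its last job."""
    failures = {}
    for rel in json.loads(statuses_json):
        # last_job is null until a job has completed for the relationship.
        last_job = rel.get("last_job") or {}
        # Assumption: error is absent or null when the job succeeded.
        error = last_job.get("error")
        if error:
            failures[rel["id"]] = str(error)
    return failures
```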

Best Practices

We recommend the following best practices for working with Qumulo Shift-From for Amazon S3.

  • Inheritable Permissions: Because the system user creates the files that Shift-From for S3 copies, the system owns these files. By default, everyone is granted read permissions and administrators always have full access to the files.

  • VPC Endpoints: For best performance when using a Qumulo cluster in AWS, configure a VPC endpoint to S3. For on-premises Qumulo clusters, we recommend AWS Direct Connect or another high-bandwidth, low-latency connection to S3.
  • Repeated Synchronization: If you need to repeatedly synchronize an S3 folder with a Qumulo directory, we recommend reusing the same relationship. This lets you avoid repeated downloading of unchanged objects that already exist locally.
  • Completed Jobs: If you don’t plan to use a Shift relationship to download updates from S3, delete the relationship to free up any storage associated with it.
  • Concurrent Replication Relationships: To increase parallelism, especially across distinct datasets, use concurrent replication relationships from S3. To avoid having a large number of concurrent operations impact client I/O to the Qumulo cluster, limit the number of concurrent replication relationships. While there is no hard limit, we don’t recommend creating more than 100 concurrent replication relationships on a cluster (including both Shift and Qumulo local replication relationships).

Restrictions

  • S3-Compatible Object Stores: S3-compatible object stores aren’t supported. Currently, Qumulo Shift-From supports replication only from Amazon S3.
  • HTTP: HTTP isn’t supported. Qumulo Core encrypts all connections by using HTTPS and verifies the S3 server’s SSL certificate.
  • Anonymous Access: Anonymous access isn’t supported. You must use valid AWS credentials.
  • Replication without Throttling: Replication provides no throttling and might use all available bandwidth. If necessary, use Quality of Service rules on your network.
  • Amazon S3 Standard Storage Class: Qumulo Shift-From supports downloading only objects stored in the Amazon S3 Standard storage class. You can’t download objects stored in the Amazon S3 Glacier or Deep Archive storage classes, and any bucket that contains such objects causes a copy job to fail.
  • Disallowed Amazon S3 Paths in Qumulo Clusters: Certain allowed Amazon S3 paths can’t be copied to Qumulo clusters and cause a copy job to fail. Disallowed paths contain:
    • A trailing slash (/) character (with non-zero object content length)
    • Consecutive slash (/) characters
    • Single and double period (., ..) characters
    • The path component .snapshot
  • Disallowed Conflicting Types: When content in an S3 bucket or Qumulo directory changes over time, a conflict related to type mismatches might arise. In this case, the Shift-From job fails and an error message gives details about the conflict. For example, a conflict might occur when a remote object maps to a local file system directory entry that:
    • Is a regular file with two or more links
    • Isn’t a regular file (for example, a directory or a special file)
  • Disallowed Amazon S3 Path Configurations: Because of conflicting type requirements, Qumulo Core can’t recreate certain allowed Amazon S3 path configurations on Qumulo clusters. For example, if an S3 bucket contains objects a/b/c and a/b, then path a/b must be both a file and directory on a Qumulo cluster. Because this isn’t possible, this configuration causes a copy job to fail.
  • Directories in Multiple Relationships: A directory on a Qumulo cluster for one Shift relationship can’t overlap with a directory used for another Shift relationship, or with a remote directory for a Qumulo-to-Qumulo replication relationship. Any such overlap causes relationship creation to fail.
  • Changes to S3 Folder During Copy Job: Currently, Shift-From assumes that the S3 folder remains unchanged throughout the copy job. Any changes (deleting, archiving, or modifying an object) during the copy job might cause a copy job to fail.
  • Read-Only Local Directory: When the Shift-From copy job begins, the local directory on the Qumulo cluster becomes read-only. While no external clients can modify anything in the directory or its subdirectories, all content remains readable. When the copy job is complete, the directory reverts to its previous permissions.
  • Partially Downloaded Files: If a copy job is interrupted or encounters a fatal error (that can’t be resolved by retrying the operation), Qumulo Core attempts to delete partially downloaded files. Because this is a best-effort process, certain interruptions can prevent the cleanup of partially downloaded files.
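
Before you start a copy job, you can check a list of object keys against the path restrictions above. The following Python sketch is a rough pre-check, not Qumulo Core's validation logic. It interprets the period restriction as a path component equal to . or .., and the function names are hypothetical.

```python
from typing import Iterable, Optional, Set


def disallowed_reason(key: str, size: int = 0) -> Optional[str]:
    """Return why an object key can't be copied, or None if no rule matches."""
    if key.endswith("/") and size > 0:
        return "trailing slash with non-zero content length"
    if "//" in key:
        return "consecutive slashes"
    components = [c for c in key.split("/") if c]
    # Assumption: the restriction applies to whole path components.
    if any(c in (".", "..") for c in components):
        return "single or double period path component"
    if ".snapshot" in components:
        return ".snapshot path component"
    return None


def file_directory_conflicts(keys: Iterable[str]) -> Set[str]:
    """Find keys that would have to be both a file and a directory."""
    files = {k for k in keys if not k.endswith("/")}
    conflicts = set()
    for key in files:
        parts = key.split("/")
        # Every proper prefix of a key must be a directory on the cluster.
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i])
            if prefix in files:
                conflicts.add(prefix)
    return conflicts
```

For the objects a/b/c and a/b from the example above, file_directory_conflicts reports a/b, because that path would need to be both a file and a directory.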