This section explains how multipart S3 uploads affect usable capacity on a Qumulo cluster and how to abort and clean up multipart uploads manually or automatically.
Qumulo Core supports the multipart upload functionality of the S3 API, which lets you upload an object to a bucket in parts and then, at a later time, combine these parts into a single object. For objects above a certain size (typically, larger than 100 MiB), applications often use multipart S3 uploads rather than the PutObject S3 API action, which is limited to objects of 5 GiB or smaller.
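As an illustration, the following minimal Python sketch performs a multipart upload with the boto3 library. The endpoint URL, credentials, bucket, key, and part size are placeholder assumptions, not values that this guide prescribes.

import boto3

# Placeholder endpoint and credentials for a Qumulo cluster's S3 service.
s3 = boto3.client(
    "s3",
    endpoint_url="https://qumulo.example.com:9000",
    aws_access_key_id="ACCESS_KEY_ID",
    aws_secret_access_key="SECRET_ACCESS_KEY",
)

# 1. Initiate the multipart upload and remember its upload ID.
upload = s3.create_multipart_upload(Bucket="my-bucket", Key="deployment/data1.dat")
upload_id = upload["UploadId"]

# 2. Upload the object in parts and collect each part's ETag.
parts = []
part_number = 1
with open("data1.dat", "rb") as f:
    while chunk := f.read(100 * 1024 * 1024):  # 100 MiB parts
        response = s3.upload_part(
            Bucket="my-bucket",
            Key="deployment/data1.dat",
            PartNumber=part_number,
            UploadId=upload_id,
            Body=chunk,
        )
        parts.append({"PartNumber": part_number, "ETag": response["ETag"]})
        part_number += 1

# 3. Combine the parts into a single object.
s3.complete_multipart_upload(
    Bucket="my-bucket",
    Key="deployment/data1.dat",
    UploadId=upload_id,
    MultipartUpload={"Parts": parts},
)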
For more information about how Qumulo Core handles these operations, see How System-Initiated Multipart S3 Uploads Work.

Prerequisites
To manage multipart S3 uploads by using the qq CLI, you need the following role-based access control (RBAC) privileges:
- PRIVILEGE_S3_SETTINGS_WRITE: Configure frequency of multipart upload cleanup
- PRIVILEGE_S3_UPLOADS_READ: List multipart uploads
- PRIVILEGE_S3_UPLOADS_WRITE: Abort multipart uploads
How Multipart S3 Uploads Affect Usable Capacity on a Qumulo Cluster
The following conditions are true for multipart S3 uploads in Qumulo Core.
- To let you resume large uploads in the event of an outage, Qumulo Core stores multipart upload data durably on the cluster.
- Multipart upload data isn't visible in the Qumulo file system, and isn't included in file system snapshots, until you complete the upload successfully by calling the CompleteMultipartUpload S3 API action.
  Note: When you view the breakdown of a Qumulo cluster's capacity by using the Qumulo Core Web UI, REST API, or qq CLI, Qumulo Core doesn't distinguish between capacity that the file system uses and capacity that incomplete multipart uploads use.
- Qumulo Core doesn't delete multipart upload data unless it aborts and cleans up the multipart upload automatically or you abort and clean up the upload manually.
To check how much space incomplete multipart uploads use on your cluster, you can list the uploads by using the Qumulo REST API or qq CLI. For more information, see Listing Incomplete Multipart S3 Uploads.
How System-Initiated Multipart S3 Uploads Work
Occasionally, when you list your multipart uploads, you might see uploads that you didn't initiate. These are system-initiated uploads, which Qumulo Core uses for PutObject and CopyObject S3 API actions for objects that exceed a certain size.
If Qumulo Core encounters an error while performing a system-initiated upload, it attempts to abort the upload and clean up the partial upload data immediately.
However, if Qumulo Core can't clean up the incomplete upload data immediately, it does this in the background, according to the expiry interval.
The background clean-up process is the same for system-initiated and user-initiated uploads. For more information, see Aborting and Cleaning Up Multipart S3 Uploads Automatically.
Listing Incomplete Multipart S3 Uploads
You can list the incomplete multipart uploads for a single S3 bucket by using the Qumulo REST API or qq CLI.
- If you use the ListMultipartUploads S3 API action, the system doesn't show system-initiated uploads or how much space the uploads use on your cluster.
- If you use the Qumulo REST API or qq CLI, Qumulo Core shows system-initiated uploads and how much space each upload uses on your cluster.
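For comparison, the following boto3 sketch calls the ListMultipartUploads action. The endpoint and credentials are placeholder assumptions.

import boto3

# Placeholder endpoint and credentials for a Qumulo cluster's S3 service.
s3 = boto3.client(
    "s3",
    endpoint_url="https://qumulo.example.com:9000",
    aws_access_key_id="ACCESS_KEY_ID",
    aws_secret_access_key="SECRET_ACCESS_KEY",
)

# ListMultipartUploads shows neither system-initiated uploads
# nor how much space each upload uses on the cluster.
response = s3.list_multipart_uploads(Bucket="my-bucket")
for upload in response.get("Uploads", []):
    print(upload["Key"], upload["UploadId"], upload["Initiated"])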
- To list incomplete uploads by using the qq CLI, run the qq s3_list_uploads command and specify the bucket name. For example:

$ qq s3_list_uploads \
  --bucket my-bucket
- To list incomplete uploads by using the Qumulo REST API, send a GET request to the /v1/s3/buckets/<bucket-name>/uploads/ endpoint and specify the bucket name.
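The following Python sketch sends the same GET request with the requests library. The cluster address, port, and bearer token are placeholder assumptions; obtain a valid access token from your cluster before trying this.

import requests

CLUSTER = "https://qumulo.example.com:8000"  # placeholder cluster address and port
TOKEN = "access-token-here"                  # placeholder bearer token

# List incomplete multipart uploads for the bucket my-bucket.
response = requests.get(
    f"{CLUSTER}/v1/s3/buckets/my-bucket/uploads/",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
response.raise_for_status()
for upload in response.json()["uploads"]:
    print(upload["id"], upload["key"], upload["total_blocks"])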
The output from the qq CLI and the REST API is the same. The following example output is a single JSON object that contains the list of uploads for the specified bucket. The list shows information for each multipart S3 upload, including:
- When each upload was initiated
- Which identity initiated the upload
- When the upload last received data
- How much space the upload uses on the cluster, broken down into data, metadata, and total, in units of blocks (4,096 bytes per block)
{
"uploads": [
{
"bucket": "my-bucket",
"completing": false,
"datablocks": "16384",
"id": "00000000example1",
"initiated": "2023-03-02T19:01:00.446468848Z",
"initiator": {
"auth_id": "500",
"domain": null,
"gid": null,
"name": null,
"sid": null,
"uid": null
},
"key": "deployment/data1.dat",
"last_modified": "2023-03-02T19:03:37.209271702Z",
"metablocks": "3",
"system_initiated": false,
"total_blocks": "16387"
},
{
"bucket": "my-bucket",
"completing": false,
"datablocks": "24576",
"id": "00000000example2",
"initiated": "2023-03-02T19:09:04.530619255Z",
"initiator": {
"auth_id": "500",
"domain": null,
"gid": null,
"name": null,
"sid": null,
"uid": null
},
"key": "release.dat",
"last_modified": "2023-03-02T19:09:06.436699236Z",
"metablocks": "4",
"system_initiated": true,
"total_blocks": "24580"
}
]
}
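Because the listing reports space in 4,096-byte blocks, a short script can convert these counts to bytes and distinguish system-initiated from user-initiated uploads. The following sketch assumes the JSON output above is saved in a file named uploads.json.

import json

BLOCK_SIZE = 4096  # bytes per block, as reported by Qumulo Core

with open("uploads.json") as f:
    uploads = json.load(f)["uploads"]

total_bytes = 0
for upload in uploads:
    used = int(upload["total_blocks"]) * BLOCK_SIZE
    total_bytes += used
    origin = "system" if upload["system_initiated"] else "user"
    print(f"{upload['key']}: {used} bytes ({origin}-initiated)")

print(f"Total space that incomplete uploads use: {total_bytes} bytes")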
Aborting and Cleaning Up Multipart S3 Uploads Automatically
Qumulo Core automatically aborts and cleans up an incomplete multipart S3 upload if the upload doesn't receive any data for longer than the configured expiry interval (1 day by default).
When Qumulo Core removes a multipart upload, it frees the space that the upload uses on the cluster. You can configure the expiry interval by using the Qumulo REST API or qq CLI.
To configure the expiry interval for all current and future multipart uploads by using the qq CLI, run the qq s3_modify_settings command with the --multipart-upload-expiry-interval flag and specify one of the following values:
- The string never
- A string in the format <quantity><units> (without a space), where <quantity> is a positive integer less than 100 and <units> is one of the following strings: days, hours, minutes, months, weeks
In the following example, we instruct Qumulo Core to abort and clean up uploads that haven’t received data in more than 30 days.
$ qq s3_modify_settings \
--multipart-upload-expiry-interval 30days
In the following example, we disable automatic cleanup.
$ qq s3_modify_settings \
--multipart-upload-expiry-interval never
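You can make the same change by using the Qumulo REST API. The following Python sketch assumes that the /v1/s3/settings endpoint accepts a PATCH request with a multipart_upload_expiry_interval field that mirrors the qq flag above; the cluster address and token are placeholders.

import requests

CLUSTER = "https://qumulo.example.com:8000"  # placeholder cluster address and port
TOKEN = "access-token-here"                  # placeholder bearer token

# Assumed endpoint and field name; check your cluster's REST API reference.
response = requests.patch(
    f"{CLUSTER}/v1/s3/settings",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"multipart_upload_expiry_interval": "30days"},
)
response.raise_for_status()
print(response.json())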
Aborting and Cleaning Up Multipart S3 Uploads Manually
To abort and clean up an upload, use the Qumulo REST API or qq CLI. You need the bucket name and the upload ID. For more information about looking up this information, see Listing Incomplete Multipart S3 Uploads.
If you are an administrative user or the user who initiated the upload, you can use the AbortMultipartUpload S3 API action. In addition to the bucket name and upload ID, you also need the object key for the upload.
- To abort an upload by using the qq CLI, run the qq s3_abort_upload command and specify the bucket name and upload ID. For example:

$ qq s3_abort_upload \
  --bucket my-bucket \
  --upload-id 000000000example
- To abort an upload by using the Qumulo REST API, send a DELETE request to the /v1/s3/buckets/<bucket-name>/uploads/<upload-ID> endpoint and specify the bucket name and upload ID. For example:

DELETE /v1/s3/buckets/my-bucket/uploads/000000000example
Neither the qq CLI nor the REST API returns a response body. Qumulo Core returns a 204 No Content status code when the upload is aborted or the cleanup is complete.
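For the S3 API path, the following boto3 sketch calls the AbortMultipartUpload action described above. The endpoint, credentials, and object key are placeholder assumptions.

import boto3

# Placeholder endpoint and credentials for a Qumulo cluster's S3 service.
s3 = boto3.client(
    "s3",
    endpoint_url="https://qumulo.example.com:9000",
    aws_access_key_id="ACCESS_KEY_ID",
    aws_secret_access_key="SECRET_ACCESS_KEY",
)

# Unlike the qq CLI and REST API, AbortMultipartUpload also requires the object key.
s3.abort_multipart_upload(
    Bucket="my-bucket",
    Key="deployment/data1.dat",
    UploadId="000000000example",
)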