This section explains how creating portals on Qumulo clusters, and establishing relationships between spoke and hub portals, enables Cloud Data Fabric functionality in Qumulo Core.

Cloud Data Fabric functionality lets Qumulo clusters in different geographic or infrastructural locations (on-premises and in the cloud) access the same data while each cluster maintains an independent namespace structure (for example, by setting only a portion of the cluster’s file system as the portal root directory).

To enable Cloud Data Fabric functionality, you must define a spoke portal on one cluster, a hub portal on another cluster, and then propose a portal relationship between the two.

Key Terms

The following key terms help define the components of Cloud Data Fabric functionality in Qumulo Core.

Clusters and Root Directories

  • Cluster: Any Qumulo cluster that shares a portion of its file system for a hub portal or a spoke portal. A directory on a cluster defines the root directory for a spoke portal or a hub portal.

  • Spoke Portal Root Directory, Hub Portal Root Directory: A directory on a cluster that uses a portion of its file system for the hub portal or spoke portal.

    According to the file system permissions that a hub portal might impose, you can access a spoke portal root directory by using NFSv3, SMB, or the Qumulo REST API. Qumulo Core 7.4.3 (and higher) supports NFSv4.1.

Portals

  • Spoke Portal: An interface point on a Qumulo cluster that accesses a portion of the file system on another cluster (which has a hub portal). A directory on a cluster defines the root directory for the spoke portal. The spoke portal initiates the creation of a hub portal.

    • Read-Write Portal: A spoke portal that can access, modify, and create any files or directories within the hub portal root directory according to the file system permissions.

    • Read-Only Portal: A spoke portal that can access any files or directories within the hub portal root directory according to the file system permissions, but can’t modify or create any files or directories regardless of file system permissions.

  • Hub Portal: An interface point on a Qumulo cluster that shares a portion of its file system with another cluster (which has a spoke portal). A directory on a cluster defines the root directory for the hub portal. The spoke portal initiates the creation of a hub portal. You can configure multiple portal relationships that use the same hub portal root directory, nested directories, or independent directories.

  • Portal Relationship: A proposal that a spoke portal on one Qumulo cluster issues to another Qumulo cluster (with a hub portal), which the Qumulo cluster with the hub portal authorizes.

Portal States

A portal state indicates the stages of the spoke portal creation process, and the proposal or deletion of a portal relationship.

State Description

Unlinked

Qumulo Core created the spoke portal, but couldn't establish a relationship for it or clean up the spoke portal automatically.

Before trying to re-establish the portal relationship, use the qq portal_delete_spoke command to clean up the spoke portal manually.

Pending

Qumulo Core established a relationship between the spoke portal and a hub portal, but the hub portal has not yet given its authorization.

Use the qq portal_authorize_hub command to give the authorization.

Authorized

The portal relationship is approved by both clusters and the spoke portal root directory is accessible, if full connectivity is established.

Deleting

Qumulo Core is in the process of synchronizing any outstanding changes from the spoke portal to the hub portal. When synchronization is complete, Qumulo Core removes the portal relationship from each cluster.
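
The states above can be read as a small state machine. The following Python sketch is only an illustrative model of the transitions this section describes, not Qumulo Core's implementation. The event names are hypothetical labels; only the qq portal_authorize_hub and qq portal_delete_spoke commands come from the descriptions above.

```python
from enum import Enum


class PortalState(Enum):
    UNLINKED = "Unlinked"
    PENDING = "Pending"
    AUTHORIZED = "Authorized"
    DELETING = "Deleting"


# Hypothetical event names. A value of None means the portal
# relationship no longer exists after the event.
TRANSITIONS = {
    # Corresponds to running `qq portal_authorize_hub`.
    (PortalState.PENDING, "authorize_hub"): PortalState.AUTHORIZED,
    # A removal request moves an authorized relationship to Deleting.
    (PortalState.AUTHORIZED, "request_removal"): PortalState.DELETING,
    # Synchronization finished; the relationship is removed from each cluster.
    (PortalState.DELETING, "sync_complete"): None,
    # Corresponds to manual cleanup with `qq portal_delete_spoke`.
    (PortalState.UNLINKED, "delete_spoke"): None,
}


def next_state(state, event):
    """Return the state that follows `event`, or None if the portal is gone."""
    key = (state, event)
    if key not in TRANSITIONS:
        raise ValueError(f"{event!r} is not modeled for state {state.value}")
    return TRANSITIONS[key]
```

The model covers only the transitions that the table describes. For example, an Unlinked spoke portal cannot be authorized in this model; it must be cleaned up first.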

Portal Statuses

A portal status indicates the accessibility of a spoke portal or hub portal.

Status Description

Inactive

The portal relationship is in the process of being configured.

  • Full connectivity isn't required at this time.
  • The portal is inaccessible.

Active

All required connections between the spoke portal and hub portal are established.

  • The portal requires full connectivity.
  • The portal is fully accessible.

Degraded

Some or all required connections between the spoke portal and hub portal are missing.

  • Qumulo Core is attempting to restore connectivity.
  • The portal might be inaccessible.

How Cloud Data Fabric Functionality Works

This section explains the creation of portal relationships, data caching and synchronization, permissions in portal root directories, and the deletion of portal relationships.

Creation of Portal Relationships

When the hub portal authorizes the portal relationship, the contents of the hub portal root directory become available to the spoke portal immediately.

Data Synchronization

For read-write portals, data synchronization is bidirectional, asynchronous, and strictly consistent upon access. For example, when a client creates or modifies files or directories in the spoke portal root directory, the spoke portal synchronizes these changes to the hub portal in the background. Clients that access the hub portal can see these changes immediately.

To ensure that any changes on one portal become available immediately to any client that reads data from the portal’s peers, Qumulo Core uses a proprietary locking synchronization mechanism.

Data Caching

The first time a client accesses a spoke portal root directory, the spoke portal begins to read and cache data from the hub portal. Subsequent access to the same data accesses the cache of the spoke portal host cluster, with performance characteristics equivalent to access to non-portal data on the spoke portal host cluster.

Caching takes place on demand, when a client with access to the spoke portal retrieves new portions of the namespace that the hub portal provides. For more information, see Configuring Cache Management for Spoke Portals in Qumulo Core.
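
The on-demand behavior described above resembles a read-through cache. The following Python sketch is a simplified illustration under that assumption. The HubPortal and SpokePortal classes are hypothetical stand-ins, and the sketch deliberately ignores cache eviction and the recaching of changed data.

```python
class HubPortal:
    """Hypothetical stand-in for the hub portal host cluster."""

    def __init__(self, files):
        self.files = dict(files)
        self.reads = 0  # Counts round trips made to the hub.

    def read(self, path):
        self.reads += 1
        return self.files[path]


class SpokePortal:
    """Hypothetical stand-in for the spoke portal host cluster."""

    def __init__(self, hub):
        self.hub = hub
        self.cache = {}

    def read(self, path):
        # First access: fetch from the hub and keep a local copy.
        if path not in self.cache:
            self.cache[path] = self.hub.read(path)
        # Subsequent access: serve from the local cache.
        return self.cache[path]


hub = HubPortal({"/projects/a.txt": b"hello"})
spoke = SpokePortal(hub)
spoke.read("/projects/a.txt")  # Goes to the hub.
spoke.read("/projects/a.txt")  # Served from the spoke's cache.
```

In this sketch, only the first read incurs a round trip to the hub, which mirrors the difference between first-time and subsequent access described above.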

Permissions in Portal Root Directories

Qumulo Core enforces permissions in the same way for files and directories in the spoke portal root directory and the hub portal root directory.

Deletion of Portal Relationships

This section explains the sequence of events when you request the removal of the portal relationship from the spoke portal or the hub portal.

  1. When you request the removal of the spoke portal, the relationship becomes read-only and enters the Deleting state, and Qumulo Core begins to synchronize any outstanding changes from the spoke portal to the hub portal.

  2. During deletion, the relationship requires connectivity to make progress, indicated by the Active status.

  3. After deletion completes, Qumulo Core:

    1. Removes the spoke portal and hub portal configuration entries automatically.

    2. Deletes the spoke portal root directory and reclaims the capacity previously consumed by cached data.
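
The sequence above can be sketched as a toy simulation. The following Python code is an assumption-laden illustration, not Qumulo Core's implementation. The PortalRelationship class and its attributes are hypothetical; the states and statuses are passed as plain strings.

```python
class PortalRelationship:
    """Hypothetical model of a portal relationship being deleted."""

    def __init__(self, outstanding=()):
        self.state = "Authorized"
        self.read_only = False
        self.outstanding = list(outstanding)  # Spoke changes not yet on the hub.
        self.hub_data = []
        self.removed = False

    def request_removal(self):
        # Step 1: the relationship becomes read-only and enters Deleting.
        self.read_only = True
        self.state = "Deleting"

    def make_progress(self, status):
        """Synchronize one outstanding change; return False if no progress."""
        # Step 2: deletion makes progress only with Active status.
        if self.state != "Deleting" or status != "Active":
            return False
        if self.outstanding:
            self.hub_data.append(self.outstanding.pop(0))
        if not self.outstanding:
            self._finish()
        return True

    def _finish(self):
        # Step 3: stands in for removing the configuration entries,
        # deleting the spoke portal root directory, and reclaiming capacity.
        self.removed = True
        self.state = None
```

For example, a relationship with two outstanding changes needs two successful make_progress("Active") calls before it is removed, and a call with "Degraded" status makes no progress.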

Example Cloud Data Fabric Scenarios

The following are examples of some of the most common scenarios for workloads that use Cloud Data Fabric functionality.

Edge Clusters

In this scenario, you deploy a single, large central cluster at your organization’s data center and multiple, small edge clusters at your organization’s branch offices or in remote locations.

A diagram for an example scenario that uses the Cloud Data Fabric functionality for an edge cluster

The Cloud Data Fabric functionality lets you make the data on the central cluster available to the remote clusters without the need to replicate data to each location. The data remains available to the edge clusters even if their capacity is lower than that of the central cluster. While a read-write portal lets the edge clusters create or modify data on the central cluster, a read-only portal lets the edge clusters only read data from the central cluster.

Active Workload with Archive

In this scenario, several clusters serve active workloads but require access to a large data archive after the initial workflow completes.

The Cloud Data Fabric functionality lets you:

  • Move your cold (infrequently accessed) data to a central archive cluster and then provide access to this data by using a portal on the original cluster.

    The active workload clusters can reclaim most of the data set capacity that was tiered to the data archive cluster. This makes it possible to access all of the data as before, while using only the capacity on the active workload clusters for the data that your system reads through the portal.

  • Serve specific archive capacity and performance needs by scaling the archive cluster independently of any active workflow clusters.

Known Limitations of the Cloud Data Fabric Functionality in Qumulo Core

General

  • Currently, it is possible to configure and manage Cloud Data Fabric functionality only by using the qq CLI.

File System

  • A Qumulo cluster can be a portal host for any number of hub portals or for a single spoke portal. It isn't possible for a Qumulo cluster to be a host for spoke and hub portals simultaneously.
  • While Qumulo Core doesn’t support hard links between the files local to the spoke portal host cluster and files within the spoke portal root directory, it does support hard links entirely outside or inside the spoke portal root directory.
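
The hosting rule in the first limitation above can be expressed as a small check. The following Python function is a hypothetical sketch of that rule only; it is not part of the qq CLI or the Qumulo REST API, and it doesn't model any other limitation.

```python
def can_create_portal(existing, new_type):
    """Check the hosting rule for one cluster.

    existing: portal types ("hub" or "spoke") that the cluster already hosts.
    new_type: the portal type to create, "hub" or "spoke".
    """
    if new_type not in ("hub", "spoke"):
        raise ValueError(f"unknown portal type: {new_type!r}")
    if new_type == "hub":
        # Any number of hub portals, as long as the cluster hosts no spoke portal.
        return "spoke" not in existing
    # A single spoke portal, and only on a cluster that hosts no other portals.
    return len(existing) == 0
```

For example, can_create_portal(["hub", "hub"], "hub") returns True, while can_create_portal(["hub"], "spoke") and can_create_portal(["spoke"], "spoke") both return False.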

Data Caching

  • Although first-time access to data in a portal root directory is subject to round-trip latency between the spoke portal host cluster and the hub portal host cluster, subsequent access to the data is faster. Making changes to data under a portal root directory is also subject to latency when the system recaches these changes upon access.
  • The cache of a spoke portal is inherently ephemeral. You must not use it in place of data replication or backup.

Portal Connectivity

  • For a spoke portal to be accessible, there must be full connectivity between the two clusters in a portal relationship, without which files or directories with outstanding modifications on one portal are inaccessible on other portals.
  • A spoke portal is inaccessible if the hub portal host cluster and the spoke portal host cluster run different versions of Qumulo Core.

Protocols

S3

  • It isn't possible to create a spoke portal on a cluster with the S3 protocol enabled or to enable this protocol on an existing spoke portal host cluster.

NFS

  • While NFSv3 is a stateless protocol, NFSv4.1 is a stateful protocol that permits open file handles to remain open after a file is unlinked. However, Qumulo Core doesn’t always maintain access to files deleted from a portal in a relationship. For example, if you open a file on the spoke portal host cluster and then delete the same file on the hub portal host cluster, an application that uses the file on the spoke portal host cluster will lose access to the file unexpectedly.
  • When you authenticate over NFSv4.1 by using Kerberos, you can use Kerberos principals only from the Active Directory domain associated with the Qumulo cluster to which you are connected. It isn’t possible to use principals from a remote Qumulo cluster.
  • When you edit ACLs over NFSv4.1 by using editfacl or similar tools, you can use only Kerberos principals from the Active Directory domain associated with the Qumulo cluster to which you are connected. It isn’t possible to use principals from a remote Qumulo cluster.
  • Protocol locks don't synchronize between the hub portal host cluster and the spoke portal host cluster. Specifically, NFSv3 or NLM byte-range locks, NFSv4.1 locking operations, SMB share-mode locks, SMB byte-range locks, and SMB leases function independently on the two clusters. For example, while two exclusive locks on the same spoke portal host cluster contend with each other, an exclusive lock on a spoke portal host cluster doesn’t contend with an exclusive lock on the hub portal host cluster.
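
The lock behavior in the last limitation can be pictured as each cluster keeping its own, separate lock table. The following Python sketch is a simplified illustration under that assumption. The ClusterLockTable class is hypothetical and models only whole-file exclusive locks, not byte ranges, share modes, or leases.

```python
class ClusterLockTable:
    """Hypothetical per-cluster table of exclusive locks."""

    def __init__(self, name):
        self.name = name
        self.exclusive = {}  # Maps a path to the client that holds the lock.

    def try_lock(self, path, owner):
        holder = self.exclusive.get(path)
        if holder is not None and holder != owner:
            return False  # Contention with a lock on the same cluster.
        self.exclusive[path] = owner
        return True


spoke_host = ClusterLockTable("spoke-host")
hub_host = ClusterLockTable("hub-host")

first = spoke_host.try_lock("/data/report.txt", "client-1")   # Granted.
second = spoke_host.try_lock("/data/report.txt", "client-2")  # Refused: same cluster.
third = hub_host.try_lock("/data/report.txt", "client-3")     # Granted: other cluster.
```

Because the two tables share no state, the third request succeeds even though client-1 still holds an exclusive lock on the same file through the spoke portal host cluster. Applications that depend on locks for coordination should therefore take their locks on a single cluster.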