How Portal Relationships between Qumulo Clusters Enable Cloud Data Fabric Functionality in Qumulo Core

This section explains how creating portals on Qumulo clusters, and establishing relationships between spoke and hub portals, enables Cloud Data Fabric functionality in Qumulo Core.

Tip

Global Namespace is now a core component of Qumulo Cloud Data Fabric.
For a general conceptual introduction, see What is Hub and Spoke Topology?
For specific implementations of the Cloud Data Fabric functionality in Qumulo Core, see Example Cloud Data Fabric Scenarios.

Cloud Data Fabric functionality lets Qumulo clusters in disparate geographic or infrastructural locations (on-premises and in the cloud) access the same data, while each cluster maintains an independent namespace structure (for example, by setting only a portion of the cluster’s file system as the portal root directory).

To enable Cloud Data Fabric functionality, you must define a spoke portal on one cluster, a hub portal on another cluster, and then propose a portal relationship between the two.

Important

Before you begin to implement Cloud Data Fabric in your organization, we strongly recommend reviewing this page, especially the Known Limits section.
For any questions, contact the Qumulo Care team.

Key Terms

The following key terms help define the components of Cloud Data Fabric functionality in Qumulo Core.

Clusters and Root Directories

Cluster: Any Qumulo cluster that shares a portion of its file system for a hub portal or a spoke portal. A directory on a cluster defines the root directory for a spoke portal or a hub portal.
Tip

Because a portion of a Qumulo cluster's file system can hold the hub portal root directory or spoke portal root directory, using the correct terminology can help avoid confusion:
- ❌ hub cluster
- ✅ hub portal host cluster
- ❌ spoke cluster
- ✅ spoke portal host cluster

Spoke Portal Root Directory, Hub Portal Root Directory: A directory on a cluster that uses a portion of its file system for the hub portal or spoke portal.

Subject to any file system permissions that the hub portal imposes, you can access a spoke portal root directory by using NFSv3, SMB, or the Qumulo REST API. Qumulo Core 7.4.3 (and higher) also supports NFSv4.1.

Hub Portal Data: Accessible to other Qumulo clusters through a portal relationship or through replication, and to clients that connect to the hub portal host cluster
Spoke Portal Data: Accessible only to clients that connect to the spoke portal host cluster
Cluster-Local Data: Data on a hub portal host cluster or spoke portal host cluster which is located outside of the corresponding portal root directory, accessible to clients that connect to the cluster or to other Qumulo clusters through replication

Note
Qumulo Core allows the S3 protocol to access only hub portal data and cluster-local data.

The following table illustrates the various content types and ways in which this data can be accessed.

Data Type	Accessible to other Qumulo clusters through portal relationships	Accessible to other Qumulo clusters through replication	Accessible to clients of the spoke portal host cluster	Accessible to clients of the hub portal host cluster
Hub Portal Data	✅	✅	❌	✅
Spoke Portal Data	❌	❌	✅	❌
Cluster-Local Data	❌	✅	✅	✅
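
As a quick reference, the table above can be encoded as a lookup. The following Python sketch is illustrative only: the dictionary keys and the `can_access` helper are hypothetical names and aren't part of Qumulo Core or the qq CLI.

```python
# Hypothetical encoding of the access table above (not a Qumulo API).
# Outer keys are data types; inner keys are the access paths from the table columns.
ACCESS = {
    "hub_portal_data": {
        "portal_relationship": True,
        "replication": True,
        "spoke_host_clients": False,
        "hub_host_clients": True,
    },
    "spoke_portal_data": {
        "portal_relationship": False,
        "replication": False,
        "spoke_host_clients": True,
        "hub_host_clients": False,
    },
    "cluster_local_data": {
        "portal_relationship": False,
        "replication": True,
        "spoke_host_clients": True,
        "hub_host_clients": True,
    },
}


def can_access(data_type: str, access_path: str) -> bool:
    """Return True if the table marks this data type as reachable through this path."""
    return ACCESS[data_type][access_path]
```

For example, `can_access("spoke_portal_data", "replication")` returns `False`, which matches the caution elsewhere on this page against treating the spoke portal cache as a replica or backup.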

Portals

Spoke Portal: An interface point on a Qumulo cluster that accesses a portion of the file system on another cluster (which has a hub portal). A directory on a cluster defines the root directory for a spoke portal. The spoke portal initiates the creation of a hub portal. You can configure multiple spoke portals on the same Qumulo cluster, as long as the spoke portal root directories don’t overlap and the host cluster for each portal relationship is unique.
- Read-Write Portal: A spoke portal that can access, modify, and create any files or directories within the hub portal root directory according to the file system permissions.
- Read-Only Portal: A spoke portal that can access any files or directories within the hub portal root directory according to the file system permissions, but can’t modify or create any files or directories regardless of file system permissions.
Hub Portal: An interface point on a Qumulo cluster that shares a portion of its file system with another cluster (which has a spoke portal). A directory on a cluster defines the root directory for a hub portal. The spoke portal initiates the creation of a hub portal. You can configure multiple portal relationships that use the same hub portal root directory, nested directories, or independent directories.
Note
- It isn't possible to create a hub portal without a spoke portal. For example, a spoke portal on Cluster A can propose a portal relationship to Cluster B. This action initiates the creation of a hub portal in a Pending state on Cluster B.
- You must authorize the portal relationship before you can use it.
- While a spoke portal can be either read-only or read-write, a hub portal is always read-write.
Portal Relationship: A proposal that a spoke portal on one Qumulo cluster issues to another Qumulo cluster (with a hub portal), which the Qumulo cluster with the hub portal authorizes.

Portal States

A portal state indicates the stage of the spoke portal creation process, or of the proposal or deletion of a portal relationship.

State	Description
`Unlinked`	Qumulo Core created the spoke portal, but couldn't establish a relationship for it or clean up the spoke portal automatically. Before trying to re-establish the portal relationship, use the `qq portal_delete_spoke` command to clean up the spoke portal manually.
`Pending`	Qumulo Core established a relationship between the spoke portal and a hub portal, but the hub portal has not yet given its authorization. Use the `qq portal_authorize_hub` command to give the authorization.
`Authorized`	The portal relationship is approved by both clusters and the spoke portal root directory is accessible, if full connectivity is established.
`Deleting`	Qumulo Core is in the process of synchronizing any outstanding changes from the spoke portal to the hub portal. When synchronization is complete, Qumulo Core removes the portal relationship from each cluster.
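
The state table above lends itself to a small lookup that maps each state to the next step an administrator might take. The following Python sketch is a hypothetical helper, not part of the qq CLI. The `PortalState` enum and `next_step` function are names invented for illustration. The two command names come from the table, and their arguments are deliberately omitted.

```python
from enum import Enum


class PortalState(Enum):
    """Portal states, named as in the table above."""
    UNLINKED = "Unlinked"
    PENDING = "Pending"
    AUTHORIZED = "Authorized"
    DELETING = "Deleting"


# Suggested next step for each state, paraphrased from the table.
# Command arguments are intentionally left out.
NEXT_STEP = {
    PortalState.UNLINKED: (
        "Run `qq portal_delete_spoke` to clean up the spoke portal, "
        "then propose the portal relationship again."
    ),
    PortalState.PENDING: (
        "Run `qq portal_authorize_hub` to authorize the portal relationship."
    ),
    PortalState.AUTHORIZED: (
        "No action needed if full connectivity is established."
    ),
    PortalState.DELETING: (
        "Wait for Qumulo Core to finish synchronizing outstanding changes."
    ),
}


def next_step(state_name: str) -> str:
    """Look up the suggested next step for a state name such as 'Pending'."""
    return NEXT_STEP[PortalState(state_name)]
```

Passing a name that isn't one of the four states raises `ValueError`, which keeps an unexpected state from being silently ignored.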

Portal Statuses

A portal status indicates the accessibility of a spoke portal or hub portal.

Status	Description
`Inactive`	The portal relationship is in the process of being configured. Full connectivity isn't required at this time. The portal is inaccessible.
`Active`	All required connections between the spoke portal and hub portal are established. The portal requires full connectivity. The portal is fully accessible.
`Degraded`	Some or all required connections between the spoke portal and hub portal are missing. Qumulo Core is attempting to restore connectivity. The portal might be inaccessible.
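
A monitoring script might translate a status into a yes, no, or unknown answer about accessibility. The following Python sketch assumes that the status arrives as one of the three strings in the table above. The `portal_accessible` function is a hypothetical name and not something Qumulo Core provides.

```python
from typing import Optional


def portal_accessible(status: str) -> Optional[bool]:
    """Map a portal status to accessibility, following the table above.

    Returns True for Active, False for Inactive, and None for Degraded,
    because the table says a degraded portal *might* be inaccessible.
    """
    if status == "Active":
        return True
    if status == "Inactive":
        return False
    if status == "Degraded":
        return None
    raise ValueError(f"unknown portal status: {status!r}")
```

Returning `None` for `Degraded` forces the caller to handle the uncertain case explicitly rather than treating it as either healthy or failed.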

How Cloud Data Fabric Functionality Works

This section explains the creation of portal relationships, data caching and synchronization, permissions in portal root directories, and the deletion of portal relationships.

Portal Relationship Creation

When the hub portal authorizes the portal relationship, the contents of the hub portal root directory become available to the spoke portal immediately.

Data Synchronization

Caution
The cache of a spoke portal is inherently ephemeral. You must not use it in place of data replication or backup.

For read-write portals, data synchronization is bidirectional, asynchronous, and strictly consistent upon access. For example, when a client creates or modifies files or directories in the spoke portal root directory, the spoke portal synchronizes these changes to the hub portal in the background. Clients that access the hub portal can see these changes immediately.

To ensure that any changes on one portal become available immediately to any client that reads data from the portal’s peers, Qumulo Core uses a proprietary locking synchronization mechanism.

Data Caching

The first time a client accesses a spoke portal root directory, the spoke portal begins to read and cache data from the hub portal. Subsequent access to the same data accesses the cache of the spoke portal host cluster, with performance characteristics equivalent to access to non-portal data on the spoke portal host cluster.

Caching takes place on demand, when a client with access to the spoke portal retrieves new portions of the namespace that the hub portal provides. For more information, see Configuring Cache Management for Spoke Portals in Qumulo Core.
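
The on-demand behavior described above resembles a read-through cache. The following Python sketch is a simplified conceptual model, not Qumulo Core's implementation: the `HubPortal` and `SpokePortal` classes are hypothetical, and the model ignores synchronization, invalidation, and cache eviction.

```python
class HubPortal:
    """Toy stand-in for the hub portal: holds the data and counts remote reads."""

    def __init__(self, files):
        self.files = dict(files)
        self.remote_reads = 0

    def read(self, path):
        self.remote_reads += 1  # each call represents a round trip to the hub
        return self.files[path]


class SpokePortal:
    """Toy stand-in for the spoke portal: caches data the first time it is read."""

    def __init__(self, hub):
        self.hub = hub
        self.cache = {}

    def read(self, path):
        if path not in self.cache:
            # First access: fetch from the hub and keep a local copy.
            self.cache[path] = self.hub.read(path)
        # Subsequent access: served from the spoke's local cache.
        return self.cache[path]


hub = HubPortal({"/projects/report.txt": b"quarterly numbers"})
spoke = SpokePortal(hub)
first = spoke.read("/projects/report.txt")   # goes to the hub
second = spoke.read("/projects/report.txt")  # served from the cache
```

After both reads, `hub.remote_reads` is 1, which reflects that only the first access pays the round trip to the hub portal host cluster.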

Portal Root Directory Permissions

Qumulo Core enforces permissions in the same way for files and directories in the spoke portal root directory and the hub portal root directory.

Important

Deleting the portal relationship never affects the data on the hub portal.
For a spoke portal to be accessible, there must be full connectivity between the two clusters in a portal relationship, without which files or directories with outstanding modifications on one portal are inaccessible on other portals. Specifically, every node in the spoke portal host cluster must be able to connect to the configured hub portal host cluster address, and the other way around.

Portal Relationship Deletion

This section explains the sequence of events when you request the removal of the portal relationship from the spoke portal or the hub portal.

When you request the removal of the spoke portal, the relationship becomes read-only and enters the Deleting state, and Qumulo Core begins to synchronize any outstanding changes from the spoke portal to the hub portal.
During deletion, the relationship requires connectivity to make progress, indicated by the Active status.
After deletion completes, Qumulo Core:
1. Removes the spoke portal and hub portal configuration entries automatically.
2. Deletes the spoke portal root directory and reclaims the capacity previously consumed by cached data.

Note
When you remove a portal relationship, any files or directories on the hub portal that were inaccessible, due to both connectivity loss and outstanding spoke portal modifications, become accessible.

Portal Operation Audit Logging

For clients accessing spoke portal data, audit logging is determined by the configuration on the spoke portal host cluster.
For clients accessing hub portal data, audit logging is determined by the configuration on the hub portal host cluster.

Example Cloud Data Fabric Scenarios

The following are examples of some of the most common scenarios for workloads that use Cloud Data Fabric functionality.

Edge Clusters

In this scenario, you deploy a single, large central cluster at your organization’s data center and multiple, small edge clusters at your organization’s branch offices or in remote locations.

A diagram for an example scenario that uses the Cloud Data Fabric functionality for an edge cluster

The Cloud Data Fabric functionality lets you make the data on the central cluster available to the remote clusters without the need to replicate data to each location. The data remains available to the edge clusters even if their capacity is lower than that of the central cluster. While a read-write portal lets the edge clusters create or modify data on the central cluster, a read-only portal lets the edge clusters only read data from the central cluster.

Active Workload with Archive

In this scenario, several clusters serve active workloads but require access to a large data archive after the initial workflow completes.

The Cloud Data Fabric functionality lets you:

- Move your cold (infrequently accessed) data to a central archive cluster and then provide access to this data by using a portal on the original cluster.

  The active workload clusters can reclaim most of the data set capacity that was tiered to the data archive cluster. This makes it possible to access all of the data as before, while using only the capacity on the active workload clusters for the data that your system reads through the portal.
- Serve specific archive capacity and performance needs by scaling the archive cluster independently of any active workflow clusters.

Known Limits

General

Currently, it is possible to configure and manage Cloud Data Fabric functionality only by using the qq CLI.

File System

While Qumulo Core doesn’t support hard links between the files local to the spoke portal host cluster and files within the spoke portal root directory, it does support hard links entirely outside or inside the spoke portal root directory.

Spoke Portals

It is possible to create up to 32 hub portals, or up to 32 spoke portals (Qumulo Core 7.5.0.3 and higher), on a single Qumulo cluster.
It isn’t possible to nest spoke portal root directories within other spoke portal root directories.

Data Caching

Although first-time access to data in a portal root directory is subject to round-trip latency between the spoke portal host cluster and the hub portal host cluster, subsequent access to the data is faster. Making changes to data under a portal root directory is also subject to latency when the system recaches these changes upon access.
The cache of a spoke portal is inherently ephemeral. You must not use it in place of data replication or backup.

Portal Relationships

In Qumulo Core 7.5.2 (and higher), it is possible for a Qumulo cluster to host both up to 32 spoke portals and up to 32 hub portals at the same time.
- Currently, Qumulo Core doesn’t support a single cluster establishing two portal relationships with the same remote cluster.
In Qumulo Core 7.5.0.1 to 7.5.1, it is possible for a Qumulo cluster to host only up to 32 hub portals or up to 32 spoke portals.
Your cluster’s Qumulo Core version determines whether the host cluster for each portal relationship must be unique. For example:
- A spoke portal on Cluster A can propose a relationship to a hub portal on Cluster B.
- Another spoke portal on Cluster A can propose a relationship to a hub portal on Cluster C.
- In Qumulo Core 7.5.2 (and higher), it is possible for a spoke portal on Cluster B to propose a relationship to a hub portal on Cluster A or Cluster C (despite Cluster B already having a hub portal).
- In Qumulo Core versions lower than 7.5.2, another spoke portal on Cluster A can’t propose a relationship to a hub portal on Cluster B, because a relationship of that type between portals on the host clusters already exists.
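
The per-cluster portal counts above can be checked before proposing a new relationship. The following Python sketch is one possible reading of those limits, not an official validator. It treats the "or" in the 7.5.0.1 to 7.5.1 rule as exclusive (hub portals or spoke portals on a cluster, not both), applies the limit of 32 to both roles in every version it accepts, and doesn't model the rules about remote host cluster uniqueness. The `can_add_portal` function and its parameters are hypothetical.

```python
MAX_PORTALS_PER_ROLE = 32  # limit stated above for hub portals and for spoke portals


def parse_version(version: str) -> tuple:
    """Turn a version string such as '7.5.0.1' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def can_add_portal(version: str, role: str, hub_count: int, spoke_count: int) -> bool:
    """Return True if one more portal of the given role fits within the limits.

    Assumption: before 7.5.2, a cluster hosts hub portals or spoke portals,
    but not both at the same time.
    """
    if role not in ("hub", "spoke"):
        raise ValueError("role must be 'hub' or 'spoke'")
    parsed = parse_version(version)
    if parsed < (7, 5, 0, 1):
        raise ValueError("no limits are documented here for versions below 7.5.0.1")

    same_role = hub_count if role == "hub" else spoke_count
    other_role = spoke_count if role == "hub" else hub_count

    if same_role >= MAX_PORTALS_PER_ROLE:
        return False
    if parsed < (7, 5, 2) and other_role > 0:
        return False  # mixed roles on one cluster require 7.5.2 or higher
    return True
```

For example, `can_add_portal("7.5.1", "spoke", hub_count=3, spoke_count=0)` returns `False` under this reading, while the same call with version `"7.5.2"` returns `True`.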

Portal Connectivity

For a spoke portal to be accessible, there must be full connectivity between the two clusters in a portal relationship, without which files or directories with outstanding modifications on one portal are inaccessible on other portals. Specifically, every node in the spoke portal host cluster must be able to connect to the configured hub portal host cluster address, and the other way around.
A spoke portal is inaccessible if the hub portal host cluster and the spoke portal host cluster run different versions of Qumulo Core.
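
The two conditions above (full connectivity in both directions and matching Qumulo Core versions) can be combined into a single preflight check. The following Python sketch assumes that you have already gathered per-node reachability results by some other means. The `spoke_portal_preflight` function and its inputs are hypothetical and don't correspond to any qq command.

```python
def spoke_portal_preflight(
    spoke_version: str,
    hub_version: str,
    spoke_nodes_reach_hub: dict,
    hub_nodes_reach_spoke: dict,
) -> list:
    """Return a list of problems; an empty list means both conditions hold.

    Each dict maps a node name to True if that node can connect to the
    configured address of the peer cluster.
    """
    problems = []
    if spoke_version != hub_version:
        problems.append(
            f"version mismatch: spoke host {spoke_version}, hub host {hub_version}"
        )
    if not spoke_nodes_reach_hub or not hub_nodes_reach_spoke:
        problems.append("no reachability results supplied for one or both clusters")
    for node, ok in sorted(spoke_nodes_reach_hub.items()):
        if not ok:
            problems.append(f"spoke host node {node} can't reach the hub host address")
    for node, ok in sorted(hub_nodes_reach_spoke.items()):
        if not ok:
            problems.append(f"hub host node {node} can't reach the spoke host address")
    return problems
```

Returning a list of problems rather than a single boolean makes it easier to see which node or which condition blocks accessibility.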

Protocols

S3

Currently, Qumulo Core allows only partial access to portal data through the S3 protocol, including:
- Full read and write access to cluster-local data and hub portal data
- Read-only access to spoke portal data
  
  Note
  Attempting to modify spoke portal data returns an error.
S3 buckets are always local to the Qumulo cluster on which they are created.
Important
- An S3 bucket created in a portal root directory cannot be viewed or accessed from the cluster with which the current cluster has a portal relationship.
- To access spoke portal data through the S3 protocol, it is necessary to create a new bucket on the spoke portal host cluster, even if the corresponding hub portal data is already present in an S3 bucket on the hub portal host cluster.

NFS

While NFSv3 is a stateless protocol, NFSv4.1 is a stateful protocol which permits open file handles to remain open after a file is unlinked. However, Qumulo Core doesn’t always maintain access to files deleted from a portal in a relationship. For example, if you open a file on the spoke portal host cluster and then delete the same file on the hub portal host cluster, an application that uses the file on the spoke portal host cluster will lose access to the file unexpectedly.
When you authenticate over NFSv4.1 by using Kerberos, you can use Kerberos principals only from the Active Directory domain associated with the Qumulo cluster to which you are connected. It isn’t possible to use principals from a remote Qumulo cluster.
When you edit ACLs over NFSv4.1 by using editfacl or similar tools, you can use only Kerberos principals from the Active Directory domain associated with the Qumulo cluster to which you are connected. It isn’t possible to use principals from a remote Qumulo cluster.
Protocol locks don't synchronize between the hub portal host cluster and the spoke portal host cluster. Specifically, NFSv3 or NLM byte-range locks, NFSv4.1 locking operations, SMB share-mode locks, SMB byte-range locks, and SMB leases function independently on the two clusters. For example, while two exclusive locks on the same spoke portal host cluster contend with each other, an exclusive lock on a spoke portal host cluster doesn’t contend with an exclusive lock on the hub portal host cluster.
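
The independence of protocol locks can be pictured as each cluster keeping its own lock table. The following Python sketch is a conceptual model only. The `ClusterLockTable` class is hypothetical and doesn't reflect how Qumulo Core implements NLM, NFSv4.1, or SMB locking.

```python
class ClusterLockTable:
    """Toy per-cluster table of exclusive locks, keyed by file path."""

    def __init__(self, cluster_name):
        self.cluster_name = cluster_name
        self._holders = {}

    def try_lock_exclusive(self, path, owner):
        """Grant the lock unless a different owner on this cluster already holds it."""
        holder = self._holders.get(path)
        if holder is not None and holder != owner:
            return False  # contention, but only within this cluster
        self._holders[path] = owner
        return True


# Each cluster has its own table; nothing is shared between them.
spoke_host = ClusterLockTable("spoke portal host cluster")
hub_host = ClusterLockTable("hub portal host cluster")

a = spoke_host.try_lock_exclusive("/data/file.bin", "client-1")  # granted
b = spoke_host.try_lock_exclusive("/data/file.bin", "client-2")  # denied: same cluster
c = hub_host.try_lock_exclusive("/data/file.bin", "client-3")    # granted: other cluster
```

Here `a` and `c` are `True` and `b` is `False`. Applications that rely on locks for coordination should therefore route all lock holders for a given file through the same cluster.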
