This section provides an overview of how Qumulo clusters ensure continued operation in the event of a drive or node failure.

How Qumulo Core Ensures Fault Tolerance

Qumulo Core protects your cluster with 6,4 erasure coding at minimum, which tolerates 2 concurrent drive failures or 1 node failure. When a drive fails, Qumulo Core begins to rebuild the data that was stored on the failed drive.
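To make the 6,4 notation concrete, the following minimal sketch works through the arithmetic. It assumes that "6,4" means 6 blocks per stripe, of which 4 hold data, and that the code can recover from the loss of as many blocks as it has parity blocks. The function name `stripe_summary` is hypothetical and isn't part of any Qumulo tool.

```python
def stripe_summary(total_blocks: int, data_blocks: int) -> dict:
    """Summarize an erasure-coded stripe.

    Assumption: the stripe has `total_blocks` blocks, `data_blocks` of which
    hold data, and any (total_blocks - data_blocks) lost blocks can be rebuilt.
    """
    if not 0 < data_blocks < total_blocks:
        raise ValueError("expected 0 < data_blocks < total_blocks")
    parity_blocks = total_blocks - data_blocks
    return {
        "parity_blocks": parity_blocks,
        "tolerated_block_losses": parity_blocks,
        "usable_fraction": data_blocks / total_blocks,
    }


summary = stripe_summary(6, 4)
print(summary["parity_blocks"])               # 2
print(round(summary["usable_fraction"], 3))   # 0.667
```

Under these assumptions, 2 of every 6 blocks are parity, which matches the 2 concurrent drive failures that the minimum configuration tolerates.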

As a cluster increases in size, Qumulo Core makes additional fault tolerance options available during the cluster creation process. After creating a cluster, you can use Adaptive Data Protection to increase the cluster’s fault tolerance during node-add procedures.

To view the fault tolerance of your Qumulo cluster:

  • Web UI: Navigate to the Cluster Overview page

  • qq CLI: Run the qq protection_status_get command

  • REST API: Call the /v1/cluster/protection/status endpoint
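As a rough illustration of the REST API option, the following sketch requests the endpoint listed above and prints a one-line summary. Only the endpoint path comes from this section. The port (8000), the bearer-token header, and the response field names `remaining_drive_failures` and `remaining_node_failures` are assumptions that you should check against the API documentation for your Qumulo Core version. Both function names are hypothetical.

```python
import json
import ssl
import urllib.request
from typing import Optional


def fetch_protection_status(
    host: str,
    token: str,
    port: int = 8000,  # assumption: the REST API port on your cluster
    context: Optional[ssl.SSLContext] = None,
) -> dict:
    """Request the protection status from the endpoint named in this section."""
    url = f"https://{host}:{port}/v1/cluster/protection/status"
    request = urllib.request.Request(
        url,
        # Assumption: the API accepts a bearer token from an earlier login.
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return json.load(response)


def summarize_protection(status: dict) -> str:
    """Build a one-line summary from a status response.

    Assumption: the field names below are illustrative; missing fields
    appear as None instead of raising an error.
    """
    drives = status.get("remaining_drive_failures")
    nodes = status.get("remaining_node_failures")
    return f"remaining tolerated failures: drives={drives}, nodes={nodes}"


# Example with a hand-written response, so that no cluster is required:
print(summarize_protection({"remaining_drive_failures": 2, "remaining_node_failures": 1}))
```

To query a real cluster, pass the cluster address and a valid token to `fetch_protection_status` and hand the result to `summarize_protection`.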

Read-Only Mode Scenario for Hybrid Nodes

When a hybrid node goes offline for a substantial period of time, there is a risk of the node entering read-only mode because Qumulo Core writes all inbound operations only to the node’s SSDs.

The length of time before this scenario takes place depends on the number of drives in a node and the rate of incoming writes, deletes, and changes. For more information, see Understanding Offline Nodes and Checking for Free Space. If you encounter this scenario, contact the Qumulo Care team.

The following sections describe various drive and node failure protection configurations and how they correspond to failure scenarios and data protection states.

2-Drive, 1-Node Protection (2,1)

This is the default system configuration. This configuration requires a minimum of 4 nodes.

Failure Scenario Severity Data Protection State
1 drive failure ⬇️ Low The data is protected. You can replace a failed drive at any time.
2 drive failures ⚠️ Medium The data is protected. You can replace a failed drive at any time.
1 node failure 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
3 (or more) drive failures or multiple node failures 🚩 High The data is unavailable but intact.

3-Drive, 1-Node Protection (3,1)

This configuration requires a minimum of 5 nodes.

Failure Scenario Severity Data Protection State
1 drive failure ⬇️ Low The data is protected. You can replace a failed drive at any time.
2 drive failures ⚠️ Medium The data is protected. You can replace a failed drive at any time.
3 drive failures ⚠️ Medium The data is protected. You can replace a failed drive at any time.
1 node failure 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
4 (or more) drive failures or multiple node failures 🚩 High The data is unavailable but intact.

3-Drive, 2-Node Protection (3,2)

This configuration requires a minimum of 11 nodes.

Failure Scenario Severity Data Protection State
1 drive failure ⬇️ Low The data is protected. You can replace a failed drive at any time.
2 drive failures ⬇️ Low The data is protected. You can replace a failed drive at any time.
3 drive failures ⚠️ Medium The data is protected. You can replace a failed drive at any time.
1 node failure 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
2 node failures 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
4 (or more) drive failures or more than 2 node failures 🚩 High The data is unavailable but intact.

3-Drive, 3-Node Protection (3,3)

This configuration requires a minimum of 11 nodes.

Failure Scenario Severity Data Protection State
1 drive failure ⬇️ Low The data is protected. You can replace a failed drive at any time.
2 drive failures ⬇️ Low The data is protected. You can replace a failed drive at any time.
3 drive failures ⚠️ Medium The data is protected. You can replace a failed drive at any time.
1 node failure 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
2 node failures 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
3 node failures 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
4 (or more) drive failures or more than 3 node failures 🚩 High The data is unavailable but intact.

4-Drive, 2-Node Protection (4,2)

This configuration requires a minimum of 12 nodes.

Failure Scenario Severity Data Protection State
1 drive failure ⬇️ Low The data is protected. You can replace a failed drive at any time.
2 drive failures ⬇️ Low The data is protected. You can replace a failed drive at any time.
3 drive failures ⚠️ Medium The data is protected. You can replace a failed drive at any time.
4 drive failures ⚠️ Medium The data is protected. You can replace a failed drive at any time.
1 node failure 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
2 node failures 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
5 (or more) drive failures or more than 2 node failures 🚩 High The data is unavailable but intact.

4-Drive, 3-Node Protection (4,3)

This configuration requires a minimum of 24 nodes.

Failure Scenario Severity Data Protection State
1 drive failure ⬇️ Low The data is protected. You can replace a failed drive at any time.
2 drive failures ⬇️ Low The data is protected. You can replace a failed drive at any time.
3 drive failures ⚠️ Medium The data is protected. You can replace a failed drive at any time.
4 drive failures ⚠️ Medium The data is protected. You can replace a failed drive at any time.
1 node failure 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
2 node failures 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
3 node failures 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
5 (or more) drive failures or more than 3 node failures 🚩 High The data is unavailable but intact.

4-Drive, 4-Node Protection (4,4)

This configuration requires a minimum of 24 nodes.

Failure Scenario Severity Data Protection State
1 drive failure ⬇️ Low The data is protected. You can replace a failed drive at any time.
2 drive failures ⬇️ Low The data is protected. You can replace a failed drive at any time.
3 drive failures ⚠️ Medium The data is protected. You can replace a failed drive at any time.
4 drive failures ⚠️ Medium The data is protected. You can replace a failed drive at any time.
1 node failure 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
2 node failures 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
3 node failures 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
4 node failures 🚩 High The data is protected. The cluster is at risk of going into read-only mode.
5 (or more) drive failures or more than 4 node failures 🚩 High The data is unavailable but intact.
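
The tables in this section follow a common pattern, which the following sketch condenses into a single lookup. The drive limits, node limits, and minimum node counts come from the tables above. The names `ProtectionConfig`, `CONFIGS`, and `protection_state` are hypothetical. The sketch treats drive failures and node failures independently because the tables don't describe mixed failures, so it is a simplification rather than a description of how Qumulo Core evaluates cluster state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionConfig:
    max_drive_failures: int
    max_node_failures: int
    min_nodes: int


# Values copied from the configuration tables in this section.
CONFIGS = {
    "2,1": ProtectionConfig(2, 1, 4),
    "3,1": ProtectionConfig(3, 1, 5),
    "3,2": ProtectionConfig(3, 2, 11),
    "3,3": ProtectionConfig(3, 3, 11),
    "4,2": ProtectionConfig(4, 2, 12),
    "4,3": ProtectionConfig(4, 3, 24),
    "4,4": ProtectionConfig(4, 4, 24),
}


def protection_state(config: ProtectionConfig, failed_drives: int, failed_nodes: int) -> str:
    """Map failure counts to the data protection state listed in the tables."""
    if failed_drives < 0 or failed_nodes < 0:
        raise ValueError("failure counts must not be negative")
    # Exceeding either limit corresponds to the last row of each table.
    if failed_drives > config.max_drive_failures or failed_nodes > config.max_node_failures:
        return "unavailable but intact"
    # Any tolerated node failure carries the read-only risk.
    if failed_nodes > 0:
        return "protected, at risk of read-only mode"
    if failed_drives > 0:
        return "protected, replace failed drives at any time"
    return "protected, no failures"


print(protection_state(CONFIGS["2,1"], failed_drives=2, failed_nodes=0))
print(protection_state(CONFIGS["3,2"], failed_drives=0, failed_nodes=2))
print(protection_state(CONFIGS["4,2"], failed_drives=5, failed_nodes=0))
```

The sketch omits the severity column because the same failure count maps to different severities in different configurations (for example, 2 drive failures are Medium for 2,1 but Low for 3,2).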