This section provides an overview of how Qumulo clusters ensure continued operation in the event of a drive or node failure.
How Qumulo Core Ensures Fault Tolerance
Qumulo Core protects your cluster with, at minimum, 6,4 erasure coding, which tolerates 2 concurrent drive failures or 1 node failure. When a drive fails, Qumulo Core begins to rebuild the data that was previously stored on the failed drive.
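To make the 6,4 notation concrete: in a standard m,n erasure coding scheme, each stripe is written as m blocks of which n hold data, so any m − n blocks can be lost without losing the stripe. The sketch below models that arithmetic for a generic scheme; it is an illustration only, not Qumulo Core's implementation, and the 6 and 4 block counts are the only values taken from this document.

```python
def erasure_coding_summary(total_blocks: int, data_blocks: int) -> dict:
    """Summarize a generic (total, data) erasure coding scheme.

    A scheme that writes `total_blocks` blocks per stripe, of which
    `data_blocks` hold data, can lose any (total - data) blocks and
    still reconstruct the stripe.
    """
    parity_blocks = total_blocks - data_blocks
    return {
        "tolerable_block_losses": parity_blocks,
        "storage_efficiency": data_blocks / total_blocks,
    }

# 6,4 erasure coding: 6 blocks per stripe, 4 of them data.
summary = erasure_coding_summary(total_blocks=6, data_blocks=4)
print(summary["tolerable_block_losses"])       # 2 (matches "2 concurrent drive failures")
print(f"{summary['storage_efficiency']:.0%}")  # 67% of raw capacity stores data
```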
When Qumulo Core finishes reprotecting the data, it restores the cluster's fault tolerance, regardless of whether you have replaced the failed drive.
As a cluster increases in size, Qumulo Core makes additional fault tolerance options available during the cluster creation process. After creating a cluster, you can use Adaptive Data Protection to change the cluster's fault tolerance during node-add procedures.
Depending on a cluster’s size constraints, certain configurations (such as 1 concurrent drive failure or 4 node failures) might not be possible.
To view the fault tolerance of your Qumulo cluster, use one of the following methods:

- Web UI: Navigate to the Cluster Overview page.
- CLI: Run the `qq protection_status_get` command.
- REST API: Call the `/v1/cluster/protection/status` endpoint.
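For scripted monitoring, here is a minimal sketch of the REST API call using Python's `requests` library. The cluster address, port, and access token are placeholders, and the bearer-token header reflects a common Qumulo setup; verify the authentication flow against your cluster's REST API documentation.

```python
import requests

# Placeholders; substitute your cluster's address and a valid access token.
CLUSTER = "https://cluster.example.com:8000"
TOKEN = "your-access-token"

# Query the protection status endpoint listed above.
response = requests.get(
    f"{CLUSTER}/v1/cluster/protection/status",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
response.raise_for_status()
print(response.json())
```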
Read-Only Mode Scenario for Hybrid Nodes
When a hybrid node goes offline for a substantial period of time, the cluster risks entering read-only mode because Qumulo Core writes all inbound operations only to SSDs.
The length of time before this scenario takes place depends on the number of drives in a node and the rate of incoming writes, deletes, and changes. For more information, see Understanding Offline Nodes and Checking for Free Space. If you encounter this scenario, contact the Qumulo Care team.
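As a rough way to reason about how long that window is, the sketch below divides free SSD capacity by the net inbound write rate. All numbers are hypothetical, and the model deliberately ignores factors such as deletes, metadata overhead, and tiering; it only illustrates why both free space and write rate matter.

```python
def hours_until_ssds_fill(free_ssd_bytes: float, net_write_bytes_per_sec: float) -> float:
    """Estimate hours until SSD capacity is exhausted, assuming a
    constant net inbound write rate and no tiering to HDD."""
    if net_write_bytes_per_sec <= 0:
        return float("inf")  # No net growth: the SSDs never fill.
    return free_ssd_bytes / net_write_bytes_per_sec / 3600

# Hypothetical example: 10 TB of free SSD space, 200 MB/s of net writes.
print(f"{hours_until_ssds_fill(10e12, 200e6):.1f} hours")  # ~13.9 hours
```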
The following sections describe various drive and node failure protection configurations and how they correspond to failure scenarios and data protection states.
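If you want these requirements in machine-readable form (for example, in a cluster-planning script), the sketch below consolidates the minimum node counts from the tables that follow. The values in `MIN_NODES` come from this document; the helper function is hypothetical.

```python
# Minimum node counts per protection configuration, consolidated from
# the tables below. Keys are (drive failures tolerated, node failures tolerated).
MIN_NODES = {
    (2, 1): 4,
    (3, 1): 5,
    (3, 2): 11,
    (3, 3): 11,
    (4, 2): 12,
    (4, 3): 24,
    (4, 4): 24,
}

def available_configurations(node_count: int) -> list[tuple[int, int]]:
    """Return the protection configurations whose minimum node count
    the given cluster size satisfies (hypothetical helper)."""
    return [config for config, minimum in MIN_NODES.items() if node_count >= minimum]

print(available_configurations(12))  # [(2, 1), (3, 1), (3, 2), (3, 3), (4, 2)]
```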
2-Drive, 1-Node Protection (2,1)
This is the default system configuration. It requires a minimum of 4 nodes.
| Failure Scenario | Severity | Data Protection State |
|---|---|---|
| 1 drive failure | ⬇️ Low | The data is protected. You can replace a failed drive at any time. |
| 2 drive failures | ⚠️ Medium | The data is protected. You can replace a failed drive at any time. |
| 1 node failure | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 3 (or more) drive failures or multiple node failures | 🚩 High | The data is unavailable but intact. |
3-Drive, 1-Node Protection (3,1)
This configuration requires a minimum of 5 nodes.
| Failure Scenario | Severity | Data Protection State |
|---|---|---|
| 1 drive failure | ⬇️ Low | The data is protected. You can replace a failed drive at any time. |
| 2 drive failures | ⚠️ Medium | The data is protected. You can replace a failed drive at any time. |
| 3 drive failures | ⚠️ Medium | The data is protected. You can replace a failed drive at any time. |
| 1 node failure | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 4 (or more) drive failures or multiple node failures | 🚩 High | The data is unavailable but intact. |
3-Drive, 2-Node Protection (3,2)
This configuration requires a minimum of 11 nodes.
| Failure Scenario | Severity | Data Protection State |
|---|---|---|
| 1 drive failure | ⬇️ Low | The data is protected. You can replace a failed drive at any time. |
| 2 drive failures | ⬇️ Low | The data is protected. You can replace a failed drive at any time. |
| 3 drive failures | ⚠️ Medium | The data is protected. You can replace a failed drive at any time. |
| 1 node failure | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 2 node failures | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 4 (or more) drive failures or more than 2 node failures | 🚩 High | The data is unavailable but intact. |
3-Drive, 3-Node Protection (3,3)
This configuration requires a minimum of 11 nodes.
| Failure Scenario | Severity | Data Protection State |
|---|---|---|
| 1 drive failure | ⬇️ Low | The data is protected. You can replace a failed drive at any time. |
| 2 drive failures | ⬇️ Low | The data is protected. You can replace a failed drive at any time. |
| 3 drive failures | ⚠️ Medium | The data is protected. You can replace a failed drive at any time. |
| 1 node failure | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 2 node failures | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 3 node failures | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 4 (or more) drive failures or more than 3 node failures | 🚩 High | The data is unavailable but intact. |
4-Drive, 2-Node Protection (4,2)
This configuration requires a minimum of 12 nodes.
| Failure Scenario | Severity | Data Protection State |
|---|---|---|
| 1 drive failure | ⬇️ Low | The data is protected. You can replace a failed drive at any time. |
| 2 drive failures | ⬇️ Low | The data is protected. You can replace a failed drive at any time. |
| 3 drive failures | ⚠️ Medium | The data is protected. You can replace a failed drive at any time. |
| 4 drive failures | ⚠️ Medium | The data is protected. You can replace a failed drive at any time. |
| 1 node failure | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 2 node failures | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 5 (or more) drive failures or more than 2 node failures | 🚩 High | The data is unavailable but intact. |
4-Drive, 3-Node Protection (4,3)
This configuration requires a minimum of 24 nodes.
| Failure Scenario | Severity | Data Protection State |
|---|---|---|
| 1 drive failure | ⬇️ Low | The data is protected. You can replace a failed drive at any time. |
| 2 drive failures | ⬇️ Low | The data is protected. You can replace a failed drive at any time. |
| 3 drive failures | ⚠️ Medium | The data is protected. You can replace a failed drive at any time. |
| 4 drive failures | ⚠️ Medium | The data is protected. You can replace a failed drive at any time. |
| 1 node failure | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 2 node failures | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 3 node failures | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 5 (or more) drive failures or more than 3 node failures | 🚩 High | The data is unavailable but intact. |
4-Drive, 4-Node Protection (4,4)
This configuration requires a minimum of 24 nodes.
| Failure Scenario | Severity | Data Protection State |
|---|---|---|
| 1 drive failure | ⬇️ Low | The data is protected. You can replace a failed drive at any time. |
| 2 drive failures | ⬇️ Low | The data is protected. You can replace a failed drive at any time. |
| 3 drive failures | ⚠️ Medium | The data is protected. You can replace a failed drive at any time. |
| 4 drive failures | ⚠️ Medium | The data is protected. You can replace a failed drive at any time. |
| 1 node failure | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 2 node failures | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 3 node failures | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 4 node failures | 🚩 High | The data is protected. The cluster is at risk of going into read-only mode. |
| 5 (or more) drive failures or more than 4 node failures | 🚩 High | The data is unavailable but intact. |