Automatic drive tests

StorPool automatically tries to return a drive that previously failed an operation to the cluster, provided the drive is still available and can recover from the failure.

About

On many occasions a disk write (or read) operation might time out after the drive’s internal failure handling mechanisms kick in. Examples include a bad sector on an HDD being replaced, or a reset of the controller to which the drive is connected. In some of these cases the operation times out and an I/O error is returned to StorPool, which, before the introduction of the automatic drive tests, triggered an eject of the failed drive. While this might be an indication of impending failure, in most cases the disk continues working without any issues for weeks, sometimes even months, before another failure occurs.

With automatic drive tests, such failures are handled by automatically re-testing the affected drive, provided it is still visible to the operating system. If the test results are within the expected thresholds, the disk is returned to the cluster.
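
The decision described above can be illustrated with a minimal sketch. It is not StorPool's implementation. The metric names (`max_latency_ms`, `io_errors`), the `Thresholds` values, and the `handle_failed_drive` function are all hypothetical and exist only to show the flow. The sketch also assumes that a drive which is not visible, or whose results fall outside the thresholds, simply stays ejected.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    RETURN_TO_CLUSTER = "return to cluster"
    KEEP_EJECTED = "keep ejected"


@dataclass
class DriveTestResult:
    # Hypothetical metrics; the actual measurements are not described here.
    max_latency_ms: float
    io_errors: int


@dataclass
class Thresholds:
    # Hypothetical limits standing in for the "expected thresholds".
    max_latency_ms: float
    max_io_errors: int


def handle_failed_drive(
    visible_to_os: bool,
    run_test: Callable[[], DriveTestResult],
    limits: Thresholds,
) -> Action:
    """Decide what to do with a drive that failed an I/O operation."""
    # A drive the operating system no longer sees cannot be re-tested.
    if not visible_to_os:
        return Action.KEEP_EJECTED

    # Re-test the drive and compare the results with the thresholds.
    result = run_test()
    within_limits = (
        result.io_errors <= limits.max_io_errors
        and result.max_latency_ms <= limits.max_latency_ms
    )
    return Action.RETURN_TO_CLUSTER if within_limits else Action.KEEP_EJECTED


if __name__ == "__main__":
    limits = Thresholds(max_latency_ms=500.0, max_io_errors=0)
    healthy = lambda: DriveTestResult(max_latency_ms=120.0, io_errors=0)
    print(handle_failed_drive(True, healthy, limits).value)
```

Passing the test as a callable keeps the visibility check first, so no test is attempted on a drive that has disappeared from the system.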

Manual tests

A disk test can also be triggered manually. If the test is successful, the drive is automatically returned to the cluster. The result of the last test can be queried through the CLI.

History

This feature is available starting with the 19.1 revision 19.01.1217.1635af7 release.