Abstract: In a distributed storage system that uses a three-replica strategy, a common response to a hard disk failure on a storage node is to wait for a preset timeout. If the faulty disk does not recover within that timeout, the system begins rebuilding the replicas that were stored on it. The problem with this approach is that while one replica in a three-replica group is faulty, a second disk failure in the same group leaves the system unable to continue providing service or to recover automatically. This study introduces an improved Raft consensus algorithm based on log replicas, namely log replica based Raft (LR-Raft). Log replicas do not maintain a complete state machine, so they can join the cluster quickly and participate in voting and consensus, which improves system availability while a disk is faulty. LR-Raft thereby addresses the unavailability and data loss that a cluster suffers when two replicas in a three-replica group fail within a short period. Experimental results indicate that, with log replicas introduced into the replica group, LR-Raft significantly reduces read and write latency and substantially improves throughput compared with the original Raft across various workloads.
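
The availability argument rests on Raft's majority rule, which a minimal sketch can illustrate. The function names are hypothetical and not part of LR-Raft. The sketch assumes that a log replica takes over the failed member's voting slot, so the group size stays at three; the abstract does not state how membership is changed.

```python
def majority(voters: int) -> int:
    """Smallest number of votes that forms a majority among `voters` members."""
    return voters // 2 + 1


def has_quorum(voters: int, healthy: int) -> bool:
    """True if the healthy voting members can still elect a leader and commit."""
    return healthy >= majority(voters)


def survives_second_failure(use_log_replica: bool) -> bool:
    """Simulate two disk failures in one three-replica group."""
    voters, healthy = 3, 3
    healthy -= 1  # first disk fails; the system is still inside its timeout
    if use_log_replica:
        # Assumption: a log replica joins quickly and votes in place of
        # the failed member, restoring three healthy voters.
        healthy += 1
    healthy -= 1  # second disk fails before the full replica is rebuilt
    return has_quorum(voters, healthy)
```

Under these assumptions, `survives_second_failure(False)` leaves one healthy voter out of three, which is below the majority of two, while `survives_second_failure(True)` leaves two healthy voters and keeps the quorum.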