A Primary Shift Protocol for Improving Availability in Replication Systems

Almetwally M Mostafa, Ahmed E Youssef

doi:10.5120/12485-8905

Abstract

Primary-Backup Replication (PBR) is the most common technique for achieving availability in distributed systems. However, primary failure remains a crucial problem that threatens availability. When the primary fails, the backup nodes have to elect a new primary in order to keep the system operating. During the election, the system suffers from transaction loss, communication overhead due to the message exchange needed to preserve data consistency, and a notable delay caused by the execution of a Leader Election Algorithm (LEA). Primary failures can be unpredictable (i.e., unplanned), such as primary node crashes and network outages, or predictable (i.e., planned), such as a scheduled shutdown of the primary for routine maintenance or a software upgrade. Traditionally, PBR employs an LEA to recover from both unplanned and planned outages. In this paper, we propose a novel protocol, called Primary Shift Replication (PSR), that avoids election during planned outages. PSR shifts the primary role from the current primary to another scheduled node (without election) when a planned outage is about to occur. The number of messages and the communication time required to shift the primary role to another node are much lower than those required to perform leader election; therefore, PSR improves the system's availability. Moreover, PSR guarantees that no transactions are lost during the shift mode and hence preserves data consistency.
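
The idea of handing over the primary role without an election can be illustrated with a small in-memory sketch. It is not the PSR protocol as specified in the paper: the class `Node`, the function `planned_shift`, the five-step ordering, and the way messages are counted are all assumptions made for illustration only. The sketch shows a primary that stops accepting transactions, drains its pending work to a pre-scheduled successor, and then gives up the role.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A replica. All names here are hypothetical, not taken from the paper."""
    name: str
    is_primary: bool = False
    accepting: bool = False                       # only the active primary accepts writes
    log: list = field(default_factory=list)       # transactions already replicated
    pending: list = field(default_factory=list)   # accepted but not yet replicated


def planned_shift(primary, successor, backups):
    """Move the primary role to a pre-scheduled successor without an election.

    Returns the number of messages exchanged under an assumed message pattern:
    one drain message, one acknowledgement, one announcement per other backup.
    """
    messages = 0

    # 1. Stop accepting new transactions so nothing arrives mid-shift.
    primary.accepting = False

    # 2. Drain the pending transactions to the successor (one message).
    drained = list(primary.pending)
    primary.log.extend(drained)
    successor.log.extend(drained)
    primary.pending.clear()
    messages += 1

    # 3. Successor acknowledges once its log matches the primary's (one message).
    if successor.log != primary.log:
        raise RuntimeError("successor log diverged; shift aborted")
    messages += 1

    # 4. The role changes hands.
    primary.is_primary = False
    successor.is_primary = True
    successor.accepting = True

    # 5. Successor announces itself to every other backup, carrying the
    #    drained transactions so those backups catch up (one message each).
    for backup in backups:
        if backup is not successor:
            backup.log.extend(drained)
            messages += 1

    return messages
```

In this sketch the message count grows linearly with the number of backups, and no transaction is dropped because the primary stops accepting work before it drains. How the real protocol schedules the successor and orders these steps is described in the full paper, not here.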
