Understanding and improvement of the selection of replica servers in key–value stores

Wanchun Jiang,Haiming Xie,Xiangqian Zhou,Liyuan Fang,Jianxin Wang

doi:10.1016/j.is.2019.04.007

Abstract

In current large-scale distributed key–value stores, the tail latency of the hundreds of key–value access operations generated by an end-user request determines the response time of this request. Therefore, this tail latency has great impacts on the user experience and revenue. Replica selection algorithms, which select the best replica server for the service of each key–value access operation as much as possible, is the key to cut the tail latency of these key–value access operations. This paper summarizes current replica selection algorithms, including both the algorithms employed by current key–value stores and the classic algorithms of other similar systems. These algorithms are classified into three categories: information-agnostic, client-independence and feedback, according to their demanded information. As a step further, simulation-based performance analysis of these algorithms is conducted. The result brings us the insights that the response time (RPT) is useful to measure the service rate, but will lead to the herd behaviors. Moreover, the number of outstanding key–value access operations (OSKs) is helpful to both the selection of the fastest replica server and the avoidance of herd behaviors. Based on these insights, we design the L2 algorithm by assembling the basic ideas of the Least OSK algorithm and the Least RPT algorithm. The L2 algorithm is much simpler than the recently proposed C3 algorithm and has a similar best performance with C3 as confirmed by the simulation and experimental results.

Full Text