Abstract

Using new types of devices, such as SSDs, to improve the I/O performance of hybrid storage has recently become a trend. Many efforts have been made to apply these devices to hybrid storage in distributed environments, but most of them are confined to specific file systems, such as HDFS. Moreover, the low performance of HDFS degrades the performance of the hybrid storage built on it. In this paper, we improve the performance of tiered storage systems (one kind of hybrid storage system) in distributed environments with a pluggable eviction framework, on the premise that the data on each node is accessed regularly. On top of the eviction framework, we provide several eviction policies, including LRU, LRFU, LIRS and ARC, which cover different access patterns and accelerate the big data applications running above the storage layer. Moreover, our design is general and applies to all tiered storage systems. We evaluate the eviction framework with three widely used big data applications and find that LIRS improves the hit ratio by 30% over most other policies when running KMeans and PageRank, that ARC improves the hit ratio by up to 30% over other policies when running complex SQL applications, and that LRFU consistently achieves relatively good performance when its configuration properties are set within a reasonable range. We have implemented our prototype on Alluxio, a widely used memory-centric distributed storage system. In addition, the eviction policies we contributed have been merged into Alluxio and are already in use.
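The idea of a pluggable eviction framework can be illustrated with a minimal sketch. It is not the Alluxio implementation: the names `Evictor`, `LRUEvictor`, `Tier`, `on_access`, `on_remove` and `pick_victim` are hypothetical, chosen only for illustration, and the sketch assumes a single tier that holds a fixed number of equally sized blocks (capacity of at least 1). The tier reports accesses to the policy and asks it for a victim when space runs out, so a policy such as LRFU, LIRS or ARC could be swapped in without changing the tier.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict


class Evictor(ABC):
    """Hypothetical plug-in interface: the tier reports events, the policy picks victims."""

    @abstractmethod
    def on_access(self, block_id):
        """Record that a block was read or newly cached."""

    @abstractmethod
    def on_remove(self, block_id):
        """Forget a block that left the tier."""

    @abstractmethod
    def pick_victim(self):
        """Return the id of the block to evict next."""


class LRUEvictor(Evictor):
    """Least-recently-used policy: evict the block whose last access is oldest."""

    def __init__(self):
        self._order = OrderedDict()  # oldest access first, newest last

    def on_access(self, block_id):
        self._order[block_id] = True
        self._order.move_to_end(block_id)

    def on_remove(self, block_id):
        self._order.pop(block_id, None)

    def pick_victim(self):
        return next(iter(self._order))


class Tier:
    """A single storage tier (illustrative) that delegates eviction to a policy."""

    def __init__(self, capacity, evictor):
        self.capacity = capacity  # number of blocks the tier can hold
        self.evictor = evictor
        self.blocks = set()
        self.hits = 0
        self.requests = 0

    def read(self, block_id):
        self.requests += 1
        if block_id in self.blocks:
            self.hits += 1
        else:
            # Miss: make room if the tier is full, then cache the block.
            if len(self.blocks) >= self.capacity:
                victim = self.evictor.pick_victim()
                self.blocks.discard(victim)
                self.evictor.on_remove(victim)
            self.blocks.add(block_id)
        self.evictor.on_access(block_id)

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0


if __name__ == "__main__":
    tier = Tier(capacity=2, evictor=LRUEvictor())
    for block in ["a", "b", "a", "c", "b"]:
        tier.read(block)
    print(tier.hit_ratio())
```

Because the tier only talks to the `Evictor` interface, comparing policies by hit ratio, as the evaluation in the abstract does, amounts to replaying the same access trace with a different `Evictor` subclass.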

