Abstract

Big data systems require multiple types of data organization to support different operations efficiently. The in-place update index, the unordered log-structured index, and the ordered log-structured index are three typical data organizations, each designed to meet different workload requirements. Because workload requirements differ across the phases of the data life cycle, data must be transformed from one organization to another. However, the typical sequential data organization transformation not only takes an extremely long time but also consumes significant energy. In this paper, we propose Chameleon, a novel data organization transformation scheme for replication-based big data systems. The goal of Chameleon is to significantly shorten the data organization transformation process and to improve write performance and subsequent read performance through the transformation, while eliminating additional hardware and energy costs by reusing the mirrored disks. For each put request, Chameleon keeps two copies of the key-value pair: one in its normal place, organized in an ordered log-structured index, and the other on a relatively high-performance log disk, organized in an unordered log-structured index. By spreading destaging I/O activities across short idle time slots, Chameleon transforms key-value pairs from the write-optimized index to the read-optimized index. Extensive experimental evaluation of our prototype shows that Chameleon shortens data organization transformation time and improves energy efficiency and performance.
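
The put and destage path described above can be pictured with a small in-memory sketch. This is not the Chameleon implementation: the class name `DualOrganizationStore`, its methods, and the `budget` parameter are hypothetical, disks are replaced by Python containers, and the sketch assumes that the ordered copy is filled in lazily by destaging rather than written at put time. It only illustrates how an unordered append-only log (write-optimized) can be drained into an ordered structure (read-optimized) in small, bounded steps.

```python
import bisect
from collections import deque


class DualOrganizationStore:
    """Toy model of two data organizations; all names are hypothetical."""

    def __init__(self):
        self.log = []            # unordered, append-only copy (write-optimized)
        self.log_index = {}      # key -> position of its latest log entry
        self.pending = deque()   # log positions not yet destaged
        self.sorted_keys = []    # ordered copy (read-optimized), kept sorted
        self.sorted_values = {}  # key -> value for the ordered copy

    def put(self, key, value):
        # Appending is the cheap path: no ordering work at write time.
        pos = len(self.log)
        self.log.append((key, value))
        self.log_index[key] = pos
        self.pending.append(pos)

    def destage(self, budget):
        """Move at most `budget` live entries to the ordered copy.

        Intended to be called during a short idle slot (assumption).
        Returns the number of entries actually moved.
        """
        moved = 0
        while self.pending and moved < budget:
            pos = self.pending.popleft()
            key, value = self.log[pos]
            if self.log_index[key] != pos:
                continue  # superseded by a newer put for the same key
            if key not in self.sorted_values:
                bisect.insort(self.sorted_keys, key)
            self.sorted_values[key] = value
            moved += 1
        return moved

    def get(self, key):
        # Point lookups use the log index, which always holds the latest value
        # because this toy never truncates the log.
        pos = self.log_index.get(key)
        return None if pos is None else self.log[pos][1]

    def scan(self, lo, hi):
        """Range scan over destaged keys in [lo, hi)."""
        i = bisect.bisect_left(self.sorted_keys, lo)
        j = bisect.bisect_left(self.sorted_keys, hi)
        return [(k, self.sorted_values[k]) for k in self.sorted_keys[i:j]]
```

In this sketch a caller would invoke `destage` with a small `budget` whenever the system is idle, so the transformation cost is spread over many short slots instead of being paid in one long sequential pass.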
