Abstract

As the cost-per-byte of SSDs dramatically decreases, the introduction of SSDs to Hadoop becomes an attractive choice for high performance data processing. In this paper the cost-perperformance of SSD-based Hadoop cluster (SSDHadoop) and HDD-based Hadoop cluster (HDDHadoop) are evaluated. For this, we propose a MapReduce performance model using queuing network to simulate the execution time of MapReduce job with varying cluster size. To achieve an accurate model, the execution time distribution of MapReduce job is carefully profiled. The developed model can precisely predict the execution time of MapReduce jobs with less than 7% difference for most cases. It is also found that SSD-Hadoop is 20% more cost efficient than HDD-Hadoop because SSD-Hadoop needs a smaller number of nodes than HDD-Hadoop to achieve a comparable performance, according to the results of simulation with varying the number of cluster nodes.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call