Moving cloud-based Hadoop frameworks from IaaS to PaaS, which are marketed as pay-as-you-go or pay-per-use services, often reduces the associated system costs. However, managed Hadoop systems hide the platform’s internal performance dynamics and appear as a black box to end users. This study investigated the resource utilization of current managed Hadoop platforms. We examined three prominent Hadoop-on-PaaS offerings in their out-of-the-box configurations and ran Hadoop-specific workloads from the HiBench Benchmark Suite. During the benchmark runs, system resource utilization data were collected from the worker nodes and analyzed. The results indicate that identical specifications across cloud services neither guarantee similar performance nor yield consistent results across different workloads within the same service. We expect that the managed systems’ architectures and pre-configurations play a crucial role in the performance outcomes.