As the volume and complexity of big data continue to grow, optimizing the performance, scalability, and energy efficiency of big data applications in cloud data centers has become increasingly important. This paper surveys current optimization techniques, focusing on data placement, job scheduling, and network configuration in cloud environments. We examine how data center topologies affect the performance of big data frameworks such as Hadoop, emphasizing the trade-offs between performance and energy efficiency. We review in depth dynamic data placement strategies, locality-aware scheduling, and reduce task placement techniques. We also highlight the importance of network power effectiveness (NPE) and examine the role of optical and electronic switching technologies in improving data center efficiency. Drawing on findings from recent studies, the paper offers recommendations for improving resource utilization and reducing job completion times while maintaining energy efficiency. These findings contribute to ongoing efforts to scale and adapt cloud data infrastructures to the rapidly growing demands of big data applications.