Abstract
In complex, data-intensive applications, data must be scheduled between data centers whenever a single computation processes multiple datasets stored in distributed data centers. To store massive datasets effectively and reduce inter-data-center scheduling during the execution of computations, a mathematical model of data scheduling between data centers in cloud computing is built, and the dynamic computation correlation (DCC) between datasets is defined. A data placement strategy for big data based on DCC is then proposed: datasets with high DCC are placed in the same data center, and new datasets are dynamically distributed to the most appropriate data center. Comprehensive experiments show that the proposed strategy effectively reduces the number of data scheduling operations between data centers and maintains a low, nearly constant computational complexity as the number of data centers grows and the datasets become massive. The proposed strategy is therefore expected to be applicable to practical large-scale distributed storage systems for big data management.
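The placement idea described above can be sketched roughly as follows. This is a simplified illustration, not the paper's actual formulation: here DCC is approximated by counting how often two datasets are used by the same computation, and a new dataset is placed greedily in the data center whose stored datasets have the highest total correlation with it.

```python
from collections import defaultdict
from itertools import combinations

def compute_dcc(computations):
    """Approximate DCC: for each pair of datasets, count how many
    computations use both of them (a simple co-usage proxy)."""
    dcc = defaultdict(int)
    for datasets in computations:
        for a, b in combinations(sorted(set(datasets)), 2):
            dcc[(a, b)] += 1
    return dcc

def place_new_dataset(new_ds, centers, dcc):
    """Greedy dynamic placement: put the new dataset into the data
    center whose existing datasets have the highest total DCC with it."""
    def affinity(center):
        return sum(dcc.get(tuple(sorted((new_ds, d))), 0) for d in center)
    best = max(range(len(centers)), key=lambda i: affinity(centers[i]))
    centers[best].append(new_ds)
    return best
```

For example, if computations repeatedly use `d1` and `d2` together, a new dataset `d3` that also co-occurs with them would be placed in their data center, avoiding later cross-center scheduling.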