Abstract
High-performance data analytics is a computing paradigm in which data, analytics, and other computational resources are placed optimally so that superior performance is achieved with lower resource consumption. Resource allocation and scheduling are the two major functions a Hadoop cluster must address to satisfy users' service-level agreements for high-performance data analytics applications. Although many solutions have been proposed for optimal resource allocation and scheduling, these schemes are designed for large Hadoop files. With the recent convergence of the Internet of Things (IoT) and big data, there is a need to process large volumes of small files, i.e., files smaller than the Hadoop block size. Such files create significant storage overhead and exhaust the computational resources of Hadoop clusters. This survey analyzes existing work on resource allocation and scheduling in Hadoop clusters and its suitability for small files. The aim is to identify the problems that existing resource allocation and scheduling approaches face when handling small files. Based on the problems identified, a prospective solution architecture is proposed.
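The storage overhead mentioned above comes from HDFS metadata: the NameNode keeps an in-memory entry for every file and every block, so millions of sub-block-size files inflate its heap even when the total data volume is modest. The sketch below (not part of the paper) illustrates this with the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per namespace object; both that figure and the workload sizes are illustrative assumptions.

```python
# Illustrative estimate (not from the paper) of NameNode memory pressure
# caused by small files. Assumes ~150 bytes of heap per namespace object
# (file or block), a widely quoted Hadoop rule of thumb.

OBJECT_BYTES = 150          # assumed heap cost per namespace object
BLOCK_SIZE = 128 * 1024**2  # default HDFS block size (128 MiB)

def namenode_bytes(num_files: int, file_size: int) -> int:
    """Approximate NameNode heap used by num_files files of file_size bytes each."""
    blocks_per_file = max(1, -(-file_size // BLOCK_SIZE))  # ceiling division
    # one object for the file entry plus one per block
    return num_files * (1 + blocks_per_file) * OBJECT_BYTES

# Ten million 1 MiB files vs. the same data consolidated into 128 MiB files:
small = namenode_bytes(10_000_000, 1024**2)
merged = namenode_bytes(10_000_000 * 1024**2 // BLOCK_SIZE, BLOCK_SIZE)
print(f"small files:  {small / 1024**2:.0f} MiB of NameNode heap")
print(f"merged files: {merged / 1024**2:.0f} MiB of NameNode heap")
```

Under these assumptions the small-file layout needs over a hundred times more NameNode heap than the consolidated layout for identical data, which is the overhead that file-merging and archive-based approaches in the surveyed literature aim to reduce.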
Published in: International Journal of Scientific Research in Science, Engineering and Technology