Abstract

In recent years, big data applications and their scheduling algorithms have evolved considerably with the advancement of new technologies and techniques. We live in a digital world where data sizes are measured in petabytes or exabytes; data at this volume is referred to as big data. In today's business environment, application performance depends largely on retrieving relevant data efficiently and on time, so data analysis and retrieval must be done at a faster rate. Traditional scheduling algorithms cannot handle such volumes efficiently. As a result, managing big data applications and scheduling big data jobs on distributed architectures has become a challenging research area in the last three to four years. Processing data at this scale requires efficient scheduling algorithms to achieve better performance. Existing MapReduce implementations on a single-node Hadoop cluster are limited to running all jobs on that one node. In this paper, we discuss different scheduling techniques and their effect on performance on a multinode cluster. The parameters considered for performance evaluation are CPU time, physical memory, and virtual memory. The main aim is to provide a survey of scheduling algorithms that can be used across distributed architectures to achieve better performance in big data analysis, using a YouTube dataset. The results indicate that the capacity-based scheduling algorithm is more efficient than FIFO and Fair scheduling in terms of CPU cycles and physical and virtual memory utilization.
