Abstract
Companies and social media platforms have relied increasingly on unstructured data over the past decade, and individual users and corporations alike now share images, audio, and video content everywhere. This work proposes revisions to the Hadoop framework that improve the performance of the ecosystem in terms of space and time. The architecture builds on the Hadoop Distributed File System (HDFS) and MapReduce (MR). The proposed revisions allow the importing and processing of tasks to use time and space more effectively and efficiently. First, the work provides for running the service in two different ways, which reduces the time required for cluster management; in a distributed environment, this revision shortens the waiting time before the service starts. Second, we address the local file system handler used in storing and processing data: using the file system as prescribed by the proposed architecture limits CPU context switching during the import and export steps of running jobs. The outcome is a revised architecture in which all machines in the cluster initiate the service, together with a file system revision that minimizes CPU context switches during the storage and processing operations of the Hadoop cluster.