Abstract

The exponential growth of data has made the design of systems and software for storing, managing, and processing large volumes of data a key challenge, particularly because much of this data is unstructured. Given the size and complexity of such data, traditional management approaches are no longer adequate; Hadoop offers an appropriate solution for continuously growing data volumes. In this paper we propose techniques and algorithms for handling big data, covering data collection and data preprocessing. The fragmentation algorithm acts as a distributed implementation of the traditional time-sharing file system model, in which multiple users share files and storage resources. We also used the Apache Hadoop framework, a project for reliable, scalable, distributed computing, to improve query performance and reduce response time. The results showed that Hadoop handled big data best: the response time of a complex query was 00:00:01 (hh:mm:ss), compared with 00:01:11 for the same query under the fragmentation algorithm and 00:05:13 on the standard database. We conclude that total access time for complex queries in distributed processing is shorter than in non-distributed processing.
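To illustrate the kind of response-time comparison the abstract describes, the sketch below times the same aggregate query against a single-node SQL database and, optionally, against Hadoop through Hive. This is not the paper's benchmark code: the table, the query, and the PyHive connection parameters (host, port) are assumptions introduced here for illustration only.

```python
"""Minimal sketch of a query response-time comparison.
The table name, query, and Hive connection details are illustrative
assumptions, not values taken from the paper."""

import sqlite3
import time


def time_query(run_query, label):
    """Run a query function and report its wall-clock response time."""
    start = time.perf_counter()
    rows = run_query()
    elapsed = time.perf_counter() - start
    print(f"{label}: {len(rows)} rows in {elapsed:.3f} s")
    return elapsed


# Non-distributed baseline: a standard single-node SQL database.
def standard_db_query():
    conn = sqlite3.connect("example.db")  # hypothetical local database file
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    # An aggregate ("complex") query standing in for the paper's test queries.
    cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    rows = cur.fetchall()
    conn.close()
    return rows


# Distributed case: the same query submitted to Hadoop via Hive.
# Requires the PyHive package and a running HiveServer2; host and port
# below are assumptions.
def hadoop_hive_query():
    from pyhive import hive
    conn = hive.connect(host="localhost", port=10000)
    cur = conn.cursor()
    cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    time_query(standard_db_query, "standard database")
    # Uncomment when a Hadoop/Hive cluster is available:
    # time_query(hadoop_hive_query, "Hadoop (Hive)")
```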
