Abstract

The amount of data produced and communicated over the Internet has increased significantly, and this massive volume is generated from many divergent sources. In today's Big Data applications, data collection has grown so fast that traditional software tools are unable to capture, manage, and process it. In this paper we highlight Big Data, its sources, and its types: structured, unstructured, and semi-structured. Data is generated from many different sources and can arrive in the system at varying rates. To process these large amounts of data inexpensively and efficiently, parallelism is used. Big Data is data whose scale, diversity, and complexity require new architectures to manage it and to extract value and hidden knowledge from it. Hadoop is an open-source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale from a single server to thousands of machines, with a very high degree of fault tolerance.
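The parallel-processing idea the abstract refers to can be sketched with a minimal map-reduce-style word count. This is an illustrative toy, not Hadoop itself: the chunk contents and the two-worker pool size are assumptions chosen only to show the map (per-chunk counting) and reduce (merging) phases that Hadoop distributes across a cluster.

```python
from collections import Counter
from multiprocessing import Pool


def map_count(chunk):
    # Map phase: each worker counts words in its own chunk of the data set.
    return Counter(chunk.split())


def reduce_counts(partials):
    # Reduce phase: merge the per-chunk partial counts into a single result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    # Hypothetical input split into chunks, as a distributed file system would.
    chunks = ["big data hadoop", "hadoop cluster data", "data arrives fast"]
    with Pool(2) as pool:                 # two parallel workers
        partials = pool.map(map_count, chunks)
    print(reduce_counts(partials))
```

On a real cluster the same two-phase structure applies, but the chunks live on many machines and the framework handles scheduling and fault tolerance.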
