Abstract

The amount of data produced and communicated over the Internet has increased significantly, and this massive volume is generated from many divergent sources. In today's Big Data applications, data collection has grown so fast that traditional software tools are unable to capture, manage, and process it. In this paper we highlight Big Data, its sources, and its types: structured, unstructured, and semi-structured. Data is generated from many different sources and can arrive in the system at varying rates. To process these large amounts of data inexpensively and efficiently, parallelism is used. Big Data is data whose scale, diversity, and complexity require new architectures to manage it and to extract value and hidden knowledge from it. Hadoop is an open-source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale from a single server to thousands of machines, with a very high degree of fault tolerance.
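The parallel-processing idea the abstract refers to can be sketched with a minimal map-reduce-style word count. This is an illustrative toy, not Hadoop itself: the chunk contents and the two-worker pool size are assumptions chosen only to show the map (per-chunk counting) and reduce (merging) phases that Hadoop distributes across a cluster.

```python
from collections import Counter
from multiprocessing import Pool


def map_count(chunk):
    # Map phase: each worker counts words in its own chunk of the data set.
    return Counter(chunk.split())


def reduce_counts(partials):
    # Reduce phase: merge the per-chunk partial counts into a single result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    # Hypothetical input split into chunks, as a distributed file system would.
    chunks = ["big data hadoop", "hadoop cluster data", "data arrives fast"]
    with Pool(2) as pool:                 # two parallel workers
        partials = pool.map(map_count, chunks)
    print(reduce_counts(partials))
```

On a real cluster the same two-phase structure applies, but the chunks live on many machines and the framework handles scheduling and fault tolerance.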
