Abstract

Data flows from various sources in structured, semi-structured, or unstructured form, and this kind of data flow is referred to as big data. Due to their large scale, rapid growth, and diverse formats, such datasets are difficult to manage with conventional tools and techniques. Big data analysis is a demanding task because it requires large, distributed file systems that are adaptive, resilient, and fault tolerant. MapReduce is commonly used for the effective analysis of big data, and such analysis helps researchers, scholars, and business users extract value and knowledge. In the information age, huge amounts of data have become accessible to decision makers; because such data grow rapidly, strategies for managing them and obtaining value and knowledge from them must be studied and delivered. Moreover, decision makers must be able to extract useful information from dynamic, rapidly changing data that range from daily transactions to customer contacts and social media. In this paper, we explore Hadoop's parallel processing power in two application areas. The first scenario computes minimum and maximum temperatures from a large volume of weather data collected from an open source: the application analyses the entire weather station dataset and reports the minimum and maximum temperatures (in Fahrenheit) for each weather station. The second scenario computes word counts over large datasets, reporting the frequency of each word in a given dataset irrespective of the data volume.
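The following is a minimal sketch of how the first scenario could be expressed with Hadoop's Java MapReduce API. The record layout (one comma-separated line per reading, stationId,temperatureF) and the class and job names are assumptions for illustration; the abstract does not specify the schema of the open-source weather dataset. The second scenario follows the same pattern as Hadoop's standard word count example, with words as keys and summed counts as values.

// Hypothetical sketch: per-station minimum and maximum temperature with Hadoop MapReduce.
// Assumed input format: "stationId,temperatureF" per line (not specified in the paper).
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class StationTemperature {

  // Mapper: parse one input line and emit (stationId, temperatureF).
  public static class TempMapper extends Mapper<LongWritable, Text, Text, FloatWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",");
      if (fields.length < 2) {
        return; // skip malformed records
      }
      try {
        float tempF = Float.parseFloat(fields[1].trim());
        context.write(new Text(fields[0].trim()), new FloatWritable(tempF));
      } catch (NumberFormatException e) {
        // skip records whose temperature field is not numeric
      }
    }
  }

  // Reducer: for each station, scan all temperatures and keep the minimum and maximum.
  public static class MinMaxReducer extends Reducer<Text, FloatWritable, Text, Text> {
    @Override
    protected void reduce(Text station, Iterable<FloatWritable> temps, Context context)
        throws IOException, InterruptedException {
      float min = Float.MAX_VALUE;
      float max = -Float.MAX_VALUE;
      for (FloatWritable t : temps) {
        min = Math.min(min, t.get());
        max = Math.max(max, t.get());
      }
      context.write(station, new Text("min=" + min + "F, max=" + max + "F"));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "station min/max temperature");
    job.setJarByClass(StationTemperature.class);
    job.setMapperClass(TempMapper.class);
    job.setReducerClass(MinMaxReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(FloatWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}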
