Abstract

A traditional RDBMS is not sufficient to manage highly scalable web log data; the Hadoop framework overcomes these limitations. Hadoop is a sophisticated framework for managing gigantic volumes of scalable data. It includes MapReduce, a programming model for writing applications that process enormous volumes of data in parallel, reliably, on large clusters of commodity hardware. Weblog data must be processed in a distributed environment because of its huge volume and because it is generated as online streaming data. In this work, the Hadoop framework with Pig scripting is used to extract dynamic patterns from weblog data in a distributed environment. Diverse situations in the data set are identified using HTTP status codes, and the frequency of each status code is analyzed. Managing a huge volume of data with distributed processing considerably reduces execution time.
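The status-code frequency analysis the abstract describes can be illustrated with a minimal local sketch. In the paper's setup this aggregation would be expressed as a Pig script running on a Hadoop cluster; the Python analogue below (the log format, sample lines, and function name are assumptions, not taken from the paper) shows the same counting logic on Apache Common Log Format records:

```python
import re
from collections import Counter

# Assumption: weblog entries follow the Apache Common Log Format.
# The regex captures the client IP, the three-digit HTTP status code,
# and the response size from each log line.
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) (\S+)'
)

def status_code_frequency(log_lines):
    """Count how often each HTTP status code appears in the log."""
    counts = Counter()
    for line in log_lines:
        m = LOG_PATTERN.match(line)
        if m:
            counts[m.group(2)] += 1  # group 2 is the status code
    return counts

# Hypothetical sample log lines for illustration only.
sample = [
    '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2023:13:55:40 +0000] "GET /missing HTTP/1.1" 404 209',
    '127.0.0.1 - - [10/Oct/2023:13:55:41 +0000] "GET /index.html HTTP/1.1" 200 2326',
]
print(status_code_frequency(sample))  # → Counter({'200': 2, '404': 1})
```

On a real cluster the same per-line extraction would run as the map step and the counting as the reduce step, which is what allows the work to scale across many machines.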
