Mining of Web Server Logs in a Distributed Cluster Using Big Data Technologies

Savitha K, Vijaya Ms

doi:10.14569/ijacsa.2014.050119

Abstract

Big data refers to datasets that have grown beyond the processing ability of traditional database tools. Hadoop addresses big data by processing massive quantities of information on a cluster of commodity hardware. Web server logs are semi-structured, machine-generated files, usually flat text files produced in large volume. MapReduce can process them efficiently because it handles one line at a time. This paper performs session identification in log files using Hadoop in a distributed cluster. Apache Hadoop MapReduce, a data processing platform, is used in pseudo-distributed mode and in fully distributed mode. The framework identifies the sessions used by web surfers in order to recognize unique users and the pages they accessed. The identified sessions are analyzed in R to produce a statistical report based on the total count of visits per day. The results are compared with a non-Hadoop approach in a Java environment, and the proposed work achieves better time efficiency, storage, and processing speed.

Highlights

Data is a collection of facts from the grids of web servers, usually in unorganized form, in the digital universe
The web server logs are mined for efficient session identification using Hadoop MapReduce
The NASA web server logs, gathered in four different files, are used for processing in the Hadoop environment
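
Because MapReduce reads the logs one line at a time, each line first has to be split into fields. The sketch below is a minimal illustration of that step, assuming the logs follow the Common Log Format (host, timestamp, request, status, size). The name `parse_log_line`, the regular expression, and the sample line used to exercise it are assumptions made for illustration and are not taken from the paper.

```python
import re
from datetime import datetime

# Assumed layout: Common Log Format, e.g.
# host - - [01/Jul/1995:00:00:01 -0400] "GET /path HTTP/1.0" 200 6245
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it is malformed."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # The request field normally looks like "METHOD URL PROTOCOL".
    parts = match.group("request").split()
    size = match.group("size")
    return {
        "host": match.group("host"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "url": parts[1] if len(parts) >= 2 else None,
        "status": int(match.group("status")),
        "size": 0 if size == "-" else int(size),
    }
```

Lines that do not match the assumed layout return `None`, so a caller can skip them instead of failing the whole job.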

Summary

INTRODUCTION

Data is a collection of facts from the grids of web servers, usually in unorganized form, in the digital universe. The volume of data grows day by day as use of the World Wide Web becomes an interdisciplinary part of human activities. The rise of this data has led to new technology such as big data, which acts as a tool to process, manipulate, and manage very large datasets along with the storage they require. Big data is distinct from existing large databases and uses the Hadoop framework for data-intensive distributed applications. Sayalee Narkhede et al. [5] introduced the Hadoop-MR log file analysis tool, which provides a statistical report on total hits of a web page, user activity, and traffic sources. Tweets are stored in HBase using a Hadoop cluster through REST calls, and text mining algorithms are applied for data analysis. The identified sessions are analyzed by date and number of visits using the R tool.
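
Session identification and the per-day visit count can be sketched in a map/reduce style as follows. This is a minimal in-memory illustration in plain Python, not the paper's Hadoop job: the 30-minute inactivity threshold, the use of the host as the user key, and all function names are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity threshold; the source does not state the value it uses.
SESSION_TIMEOUT = timedelta(minutes=30)


def map_record(record):
    """Map step: key each (host, timestamp, url) record by its host."""
    host, timestamp, url = record
    return host, (timestamp, url)


def reduce_sessions(visits):
    """Reduce step: split one host's visits into sessions by inactivity gap."""
    sessions = []
    for timestamp, url in sorted(visits, key=lambda visit: visit[0]):
        last_seen = sessions[-1][-1][0] if sessions else None
        if last_seen is not None and timestamp - last_seen <= SESSION_TIMEOUT:
            sessions[-1].append((timestamp, url))
        else:
            sessions.append([(timestamp, url)])
    return sessions


def identify_sessions(records):
    """Run map, shuffle (group by host), and reduce in memory."""
    grouped = defaultdict(list)
    for record in records:
        host, value = map_record(record)
        grouped[host].append(value)
    return {host: reduce_sessions(visits) for host, visits in grouped.items()}


def sessions_per_day(sessions_by_host):
    """Count sessions by the calendar date on which each session starts."""
    counts = defaultdict(int)
    for sessions in sessions_by_host.values():
        for session in sessions:
            counts[session[0][0].date()] += 1
    return dict(counts)
```

In a real Hadoop job the grouping by host would be done by the framework's shuffle phase rather than by a dictionary, and the per-day counts would be exported for analysis in R.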

HADOOP MAPREDUCE

LOG MINING USING HADOOP APPROACH

RESULTS AND INTERPRETATIONS

Pseudo distributed mode: the Hadoop framework consists of five daemons, namely

Fully distributed mode

CONCLUSION

Journal: International Journal of Advanced Computer Science and Applications	Publication Date: Jan 1, 2014
Citations: 17	License type: cc-by


