Statistical Analysis of Web Server Logs Using Apache Hive in Hadoop Framework

S Harish, G Kavitha

doi:10.15680/ijircce.2015.0305074

Abstract

A web log file is a log file automatically created and maintained by a web server. Analyzing web server access logs offers valuable insight into website usage. Because of the heavy use of the web, log files are growing rapidly and becoming very large. Processing this volume of log data with relational database technology has run into a bottleneck. Analyzing such large datasets requires a parallel processing system and a reliable data storage mechanism. Hadoop is suited to big data workloads, processing massive quantities of information on clusters of commodity hardware. In this paper, building on the architecture of the Hadoop Distributed File System (HDFS), the Hadoop MapReduce framework, and the HiveQL query language, we present a methodology for preprocessing large volumes of web log files, computing website statistics, and learning user behavior.

Full Text