Abstract

A web log file is a log file automatically created and maintained by a web server. Analyzing web server access logs offers valuable insight into website usage. Because of heavy web usage, web log files grow rapidly and become very large. Processing this volume of log data with relational database technology has become a bottleneck. Analyzing such large datasets requires a parallel processing system and a reliable data storage mechanism. Hadoop addresses this big data problem by processing massive quantities of information on a cluster of commodity hardware. In this paper, building on the architecture of the Hadoop Distributed File System, the Hadoop MapReduce framework, and the HiveQL query language, we present the methodology used to preprocess a huge volume of web log files, compute website statistics, and learn user behavior.
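
As a rough illustration of the preprocess-then-aggregate idea described above, the following single-machine sketch mimics a map step and a reduce step in plain Python. It is not the paper's implementation: the abstract does not name a log format, so the Apache Common Log Format is assumed here, and the names `LOG_PATTERN`, `map_line`, `reduce_counts`, and `count_hits` are hypothetical. On a real cluster the same two steps would be distributed by Hadoop MapReduce or expressed as a HiveQL `GROUP BY` query.

```python
import re
from collections import Counter

# Assumption: records follow the Apache Common Log Format.
# The abstract does not specify the format actually used.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def map_line(line):
    """Map step: emit (path, 1) for a well-formed record, nothing otherwise."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return []  # preprocessing: drop malformed records
    return [(match.group("path"), 1)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def count_hits(lines):
    """Run the map step over every line, then reduce to hits per path."""
    pairs = []
    for line in lines:
        pairs.extend(map_line(line))
    return reduce_counts(pairs)
```

Calling `count_hits` on an iterable of raw log lines returns a dictionary of request counts per path; keying the map step on `host` or `status` instead would give per-visitor or per-status statistics in the same way.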
