Abstract

Internet use has grown tremendously in recent years. Every day, internet users generate 2.5 quintillion bytes of data from various sources, which has driven the need for Big Data analytics. Web usage mining is the web mining activity that discovers user access patterns from web log data. It has three phases: Data Preprocessing, Data Discovery and Data Evaluation. This paper focuses on Data Preprocessing, an important phase of web usage mining that is required because log data is unstructured, heterogeneous and noisy. In general, two types of logs, server-side logs and client-side logs, are used for web usability analysis. Preprocessing consists of five phases: Data Extraction, Data Cleaning, User Identification, Session Identification and Path Completion. This paper presents a specific data preprocessing case for the Vizhamurasu news site using Hadoop. Server-side logs are used to evaluate the proposed preprocessing algorithms. Existing preprocessing algorithms are efficient but do not scale: as the log file grows, they take much more computation time than the proposed parallel computing techniques.
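The preprocessing phases named above can be illustrated with a minimal single-machine sketch. It is not the paper's Hadoop implementation. It assumes Apache Combined Log Format input, identifies a user by the (IP address, user agent) pair, and splits sessions on a 30-minute inactivity gap; the log format, the filter rules, the timeout and all function names are illustrative assumptions. Path completion is omitted.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: Apache Combined Log Format; the paper's actual log layout is not given here.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Assumptions: which resources count as noise, and the session timeout.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)


def extract(line):
    """Data extraction: parse one raw log line into a record, or None if malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    rec = match.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["agent"] = rec["agent"] or ""
    return rec


def is_clean(rec):
    """Data cleaning: keep successful GET page requests from non-robot agents."""
    path = rec["url"].split("?", 1)[0].lower()
    if path.endswith(IGNORED_SUFFIXES):
        return False
    if rec["method"] != "GET" or not 200 <= rec["status"] < 300:
        return False
    return "bot" not in rec["agent"].lower()


def sessionize(lines):
    """User and session identification: return a list of (user, [urls]) sessions."""
    by_user = defaultdict(list)
    for line in lines:
        rec = extract(line)
        if rec is not None and is_clean(rec):
            # User identification: (IP, user agent) stands in for a user.
            by_user[(rec["ip"], rec["agent"])].append(rec)

    sessions = []
    for user, recs in by_user.items():
        recs.sort(key=lambda r: r["time"])
        current = [recs[0]]
        for prev, cur in zip(recs, recs[1:]):
            # Session identification: a long gap closes the current session.
            if cur["time"] - prev["time"] > SESSION_TIMEOUT:
                sessions.append((user, [r["url"] for r in current]))
                current = []
            current.append(cur)
        sessions.append((user, [r["url"] for r in current]))
    return sessions
```

Grouping records by user key before sessionizing is the step that a MapReduce job would naturally distribute across nodes, with extraction and cleaning in the map stage and sessionizing in the reduce stage. That mapping is an assumption made for this sketch, not a description of the proposed algorithms.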
