Abstract

Web-based applications are growing rapidly, and the number of their users is growing with them. Advances in technology have made it possible to capture users' interactions with web applications in web server log files, collectively referred to as web usage data. Web Usage Mining (WUM) is the process of discovering hidden patterns in this data. Because the web log contains a large amount of "irrelevant information", the original log file cannot be used directly in the WUM process, so preprocessing the web log file becomes imperative. Proper analysis of the web log file helps manage websites effectively from both the administrators' and the users' perspectives. Web log preprocessing is a necessary first step that improves the quality and efficiency of the later steps of WUM. A number of techniques are available at the preprocessing level of WUM, such as data cleaning, data filtering, user identification, session identification, and session clustering. In this paper, a complete preprocessing technique is proposed to prepare the web log for the extraction of user patterns. A data cleaning algorithm removes irrelevant entries from the web log, and a filtering algorithm discards attributes of no interest from the log file. Users and sessions are then identified. The proposed hierarchical sessionization algorithm generates a hierarchy of sessions, from which we obtain unbiased hierarchical clusters of the web log file.
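The first stages of the pipeline described above (cleaning, filtering, user identification, and session identification) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract gives no implementation details, so the combined log format, the list of ignored file suffixes, the "bot" user-agent check, the (IP, user agent) user key, and the 30-minute inactivity timeout are all assumptions chosen for illustration. The hierarchical sessionization and clustering steps are not attempted here.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Apache/Nginx "combined" log format (not stated in the abstract).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
# Assumed notion of "irrelevant entries": embedded resources, failed
# requests, non-GET requests, and self-declared robots.
IGNORED_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")
# Assumed inactivity threshold for splitting sessions.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_and_clean(lines):
    """Data cleaning and filtering: drop irrelevant entries and keep
    only the attributes needed later (user key, timestamp, URL)."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # malformed line
        path = match.group("url").lower().split("?")[0]
        if match.group("method") != "GET":
            continue
        if not match.group("status").startswith("2"):
            continue
        if path.endswith(IGNORED_SUFFIXES):
            continue
        if "bot" in match.group("agent").lower():
            continue
        records.append({
            # User identification heuristic (assumption): IP + user agent.
            "user": (match.group("ip"), match.group("agent")),
            "time": datetime.strptime(match.group("time"),
                                      "%d/%b/%Y:%H:%M:%S %z"),
            "url": match.group("url"),
        })
    return records


def sessionize(records):
    """Session identification: start a new session for a user whenever
    the gap since that user's previous request exceeds the timeout."""
    sessions = {}
    for rec in sorted(records, key=lambda r: r["time"]):
        user_sessions = sessions.setdefault(rec["user"], [])
        if user_sessions and (
            rec["time"] - user_sessions[-1][-1]["time"] <= SESSION_TIMEOUT
        ):
            user_sessions[-1].append(rec)
        else:
            user_sessions.append([rec])
    return sessions
```

Calling `sessionize(parse_and_clean(open_log_lines))` on an iterable of log lines (here the hypothetical name `open_log_lines`) returns a dictionary that maps each identified user to a list of sessions, each session being a time-ordered list of requests. A timeout-based split is only one common heuristic; the paper's hierarchical approach may differ.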
