Abstract

Web usage mining applies data mining techniques to discover valuable information from the behavior of World Wide Web (WWW) users. The required information is captured by web servers and stored in web usage logs. The first phase of web usage mining is data processing. In this phase, relevant information is first filtered from the logs. Sessions are then reconstructed using heuristics that select and group the requests belonging to the same user session. Techniques that process requests after the web server has handled them are called reactive, whereas proactive techniques perform the same (pre)processing while the user is interactively browsing the web. Reactive session reconstruction uses time-oriented and navigation-oriented heuristics. We propose combining these heuristics with site topology information in order to increase the accuracy of the reconstructed sessions. In this work, we have implemented an agent simulator that models the behavior of web users and generates both the users' actual sessions and the log data kept by the web server. In this way, we know the actual user sessions and can accurately evaluate and compare the performance of alternative session reconstruction heuristics, which use only the web server log data. To the best of our knowledge, this paper is the first work to use such an agent simulator and is therefore able to accurately evaluate different session reconstruction heuristics. Using the agent simulator, we attempt to show that our new heuristic discovers more accurate sessions than previous heuristics.
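To make the idea of combining time, navigation, and topology information concrete, the following is a minimal sketch in Python. It is not the heuristic proposed in the paper, whose details the abstract does not give. The function names, the log record layout (user, timestamp in seconds, page), the 600-second gap threshold, and the exact-match accuracy measure are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical threshold; the abstract does not state one.
MAX_GAP_SECONDS = 600


def reconstruct_sessions(log, topology, max_gap=MAX_GAP_SECONDS):
    """Group (user, timestamp, page) requests into sessions.

    A request extends the user's current session only if
      * it arrives within max_gap seconds of the previous request, and
      * the site topology contains a link to the requested page from
        some page already in the session.
    Otherwise a new session is started.
    """
    by_user = defaultdict(list)
    for user, ts, page in sorted(log, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, page))

    sessions = []
    for user, requests in by_user.items():
        current, last_ts = [], None
        for ts, page in requests:
            in_time = last_ts is not None and ts - last_ts <= max_gap
            linked = any(page in topology.get(p, ()) for p in current)
            if current and in_time and linked:
                current.append(page)
            else:
                if current:
                    sessions.append((user, current))
                current = [page]
            last_ts = ts
        if current:
            sessions.append((user, current))
    return sessions


def exact_match_accuracy(true_sessions, found_sessions):
    """Fraction of true sessions recovered exactly (an assumed measure)."""
    if not true_sessions:
        return 0.0
    hits = sum(1 for s in true_sessions if s in found_sessions)
    return hits / len(true_sessions)


# Toy site topology: page -> set of pages it links to (assumed data).
topology = {"/": {"/a", "/b"}, "/a": {"/c"}, "/b": set(), "/c": set()}

# Toy server log, standing in for the output of a simulator.
log = [
    ("u1", 0, "/"),
    ("u1", 30, "/a"),
    ("u1", 60, "/c"),
    ("u1", 90, "/b"),    # linked from "/", which is already in the session
    ("u1", 5000, "/"),   # gap exceeds the threshold: new session
    ("u2", 10, "/c"),
    ("u2", 20, "/b"),    # no link from "/c" to "/b": new session
]

found = reconstruct_sessions(log, topology)
```

In a setting like the one the abstract describes, the true sessions would come from the agent simulator, so a call such as exact_match_accuracy(true_sessions, found) could be used to compare alternative heuristics on the same log.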
