Abstract

When properly analyzed, web logs provide a wealth of information about how users access a website. However, finding interesting patterns hidden in the low-level log data is non-trivial because log volumes are large and the log files are distributed across cluster environments. This paper presents a novel technique that applies the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Expectation Maximization (EM) algorithms iteratively to cluster web user sessions. Each cluster corresponds to one or more web user activities. The unique user access pattern of each cluster is identified using frequent pattern mining and sequential pattern mining. Compared with the clustering output of the EM, DBSCAN, and k-means algorithms, this technique shows better accuracy in web session mining and is more effective at identifying how clusters change over time. We demonstrate that the implemented system can identify not only common user behaviors but also cyber-attacks.
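
To make the pattern-mining step concrete, the following minimal sketch shows one simple way that frequent sequential patterns could be extracted from the sessions of a single cluster. It is an illustration under stated assumptions, not the system described in the paper: the function name `frequent_sequences`, the restriction to contiguous page sequences of fixed length, the `min_support` threshold, and the sample URLs are all hypothetical.

```python
from collections import Counter


def frequent_sequences(sessions, length=2, min_support=0.5):
    """Return contiguous page sequences of `length` pages that occur in at
    least a `min_support` fraction of the sessions, mapped to their support.

    `sessions` is a list of sessions; each session is an ordered list of
    requested pages. All names here are hypothetical, for illustration only.
    """
    counts = Counter()
    for pages in sessions:
        # Collect each distinct sequence once per session, so that support
        # counts sessions rather than raw occurrences.
        seen = {
            tuple(pages[i:i + length])
            for i in range(len(pages) - length + 1)
        }
        counts.update(seen)

    threshold = min_support * len(sessions)
    return {
        seq: n / len(sessions)
        for seq, n in counts.items()
        if n >= threshold
    }


# Hypothetical cluster of three user sessions (made-up URLs).
cluster = [
    ["/home", "/search", "/item", "/cart"],
    ["/home", "/search", "/item"],
    ["/login", "/home", "/search"],
]

patterns = frequent_sequences(cluster, length=2, min_support=0.6)
for seq, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(" -> ".join(seq), f"(support {support:.2f})")
```

In this toy cluster, only the page pairs shared by at least 60% of the sessions are kept. A pattern that is frequent in one cluster but rare in the others is the kind of signature that could distinguish one user activity from another.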
