Abstract

Relational fuzzy clustering (RFC) algorithms are well suited to Web user session clustering because Web user sessions may contain fuzzy, conflicting and imprecise information. However, RFC algorithms are very sensitive to cluster initialization and work only if the number of clusters is specified in advance. Specifying the number of clusters beforehand is not always feasible because user sessions evolve dynamically. Estimating the number of clusters and initializing suitable cluster prototypes are therefore significant performance bottlenecks for RFC. In this paper, the discounted fuzzy relational clustering (DFRC) algorithm is proposed to address these constraints. The DFRC algorithm identifies Web user session clusters from Web server access logs without requiring the number of clusters or the initial cluster prototypes to be specified. The DFRC algorithm works in two stages. In the first stage, DFRC automatically identifies the number of potential clusters and their respective centres based on the successively discounted potential density function value of each relational data object. In the second stage, DFRC assigns fuzzy membership values to each data point and forms fuzzy clusters from the relational matrix. The DFRC algorithm is applied to an augmented session dissimilarity matrix obtained from publicly available NASA Web server log data. The experimental results are evaluated using different fuzzy validity measures. Extensive experiments are performed to test the effect of various parameters, including the accept/reject ratio and the neighbourhood radius, on the performance of the DFRC algorithm. The results are also compared with those of the fuzzy relational clustering algorithm using cluster quality measures. The clusters generated by DFRC are observed to be of higher quality than those generated by RFC.
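
The two stages described above can be illustrated with a minimal sketch. The abstract does not give the DFRC formulas, so everything here is an assumption: stage 1 borrows the exponential potential and discounting rule of classical subtractive clustering, applied directly to a dissimilarity matrix, and stage 2 uses a fuzzy c-means style membership computed from dissimilarities to the selected centres. The function names, the 1.5 discount factor, the default parameter values and the toy data are all hypothetical and are not taken from the paper.

```python
import math

def discounted_centres(D, radius=0.5, accept=0.5, reject=0.15):
    """Stage 1 (assumed form): choose centre indices from dissimilarity matrix D."""
    n = len(D)
    alpha = 4.0 / radius ** 2
    beta = 4.0 / (1.5 * radius) ** 2  # assumed wider radius for discounting
    # Potential of an object is high when many objects lie close to it.
    potential = [sum(math.exp(-alpha * D[i][j] ** 2) for j in range(n))
                 for i in range(n)]
    first_peak = max(potential)
    centres = []
    while True:
        c = max(range(n), key=lambda i: potential[i])
        peak = potential[c]
        if peak <= 0.0:
            break  # nothing left to consider
        ratio = peak / first_peak
        if ratio < reject:
            break  # remaining candidates are too weak
        if ratio <= accept and centres:
            # Grey zone: accept only if far enough from existing centres.
            d_min = min(D[c][k] for k in centres)
            if d_min / radius + ratio < 1.0:
                potential[c] = 0.0
                continue
        centres.append(c)
        # Discount every potential by its closeness to the new centre.
        for i in range(n):
            potential[i] -= peak * math.exp(-beta * D[i][c] ** 2)
    return centres

def fuzzy_memberships(D, centres, m=2.0):
    """Stage 2 (assumed form): FCM-style memberships relative to the centres."""
    power = 2.0 / (m - 1.0)
    U = []
    for i in range(len(D)):
        dists = [D[i][c] for c in centres]
        if 0.0 in dists:
            # Object coincides with one or more centres: share full membership.
            zeros = dists.count(0.0)
            row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
        else:
            row = [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]
        U.append(row)
    return U

# Toy data (invented): six sessions forming two tight groups on a line.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
D = [[abs(a - b) for b in points] for a in points]
centres = discounted_centres(D)
U = fuzzy_memberships(D, centres)
print("centres:", centres)
for i, row in enumerate(U):
    print(i, [round(u, 3) for u in row])
```

In this sketch the number of clusters is never passed in; it falls out of how many candidates survive the accept/reject test, which is the behaviour the abstract attributes to the first stage. The `radius`, `accept` and `reject` arguments stand in for the neighbourhood radius and accept/reject ratio whose effect the experiments examine.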
