Abstract

Clustering is a fundamental technique in data mining, machine learning and pattern recognition for organising similar objects into groups based on their features. Objects within a cluster are more similar to each other, with respect to particular features, than to objects in other clusters. Clustering has numerous applications in domains such as information retrieval, data mining, machine learning, pattern recognition, mathematics, medicine and bioinformatics. Web-centric applications are expanding day by day, and the web has become one of the largest data repositories. During the last decade, information and knowledge retrieval from the web has become a challenging research area. Computing similarity among data objects (web sessions) is complex, yet it is a significant problem in unsupervised learning. This research attempts to address these challenges. This paper introduces a chi-square based similarity measure for computing the similarity among sessions. The chi-square approach is applied to test for a statistically significant relationship between the observed and expected frequencies of the number of pages visited and the time spent by a user during a session. Moreover, a chi-square based hierarchical agglomerative clustering (Chi-HAC) technique is proposed to extract useful knowledge from web logs. Chi-HAC helps to improve the visualisation of web logs and is equally important for website designers, developers and owners for improving websites at each level. Experimental results on two different log files indicate that the proposed similarity measure, combined with the Chi-HAC algorithm, significantly improves similarity computation among data objects in web sessions.
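The abstract does not state the exact formula, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each session is summarised as a pair (pages visited, minutes spent), treats two sessions as the rows of a contingency table, and uses the Pearson chi-square statistic between observed and expected counts as a dissimilarity. The average-linkage merge rule, the function names `chi_square_distance` and `chi_hac`, and the sample sessions are all hypothetical choices made for illustration.

```python
from itertools import combinations


def chi_square_distance(a, b):
    """Pearson chi-square statistic for a 2 x n table whose rows are two sessions.

    Assumption: a small statistic means the two sessions split their activity
    between pages and time in similar proportions.
    """
    col_totals = [x + y for x, y in zip(a, b)]
    row_totals = [sum(a), sum(b)]
    grand_total = sum(row_totals)
    if grand_total == 0:
        return 0.0  # two empty sessions are treated as identical
    stat = 0.0
    for row, row_total in zip((a, b), row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def chi_hac(sessions, n_clusters):
    """Agglomerative clustering over chi-square distances.

    Assumption: average linkage; the abstract does not name a linkage rule.
    Returns clusters as sorted lists of session indices.
    """
    clusters = [[i] for i in range(len(sessions))]

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        total = sum(chi_square_distance(sessions[i], sessions[j]) for i, j in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Hypothetical sessions: (pages visited, minutes spent)
sessions = [(10, 5), (20, 10), (3, 30), (4, 36)]
clusters = chi_hac(sessions, 2)
print(clusters)
```

In this toy input the first two sessions visit many pages quickly and the last two linger on a few pages, so the sketch groups them accordingly. Real web logs would first need session identification and cleaning, which the sketch leaves out.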
