A study on Clustering Algorithms for XML Data Clustering

S Saranya, B.S.E Zoraida

doi:10.9790/0661-1805018489

Abstract

Mining meaningful information from large-scale web documents is increasingly important for satisfying user demand. XML and RDF documents support semantic information retrieval, helping systems interpret user queries and extract meaningful information in response. XML documents have lightweight markup and a logical structure, which make it easy to exchange both data values and structural information. Many mining techniques and algorithms are used to improve the performance of XML information retrieval. Classification (supervised learning) and clustering (unsupervised learning) are preprocessing techniques that group similar data objects according to similarity criteria. This paper presents a study of three clustering algorithms (k-means, EM, and tree clustering) and their similarity measures on XML datasets. The three algorithms are tested and compared on the same XML datasets to determine which is best suited to clustering XML documents.
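As an illustration only, and not the authors' method, the following minimal Python sketch shows one common way to apply k-means to XML documents. Each document is reduced to counts of its root-to-node tag paths, and the resulting vectors are clustered with plain k-means using squared Euclidean distance. The tag-path feature choice, the helper names (`tag_paths`, `to_vectors`, `cosine`, `kmeans`), the first-k initialisation and the toy documents are all assumptions. Cosine similarity is included as one example of a structural similarity measure.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative sketch only (hypothetical helpers, not the paper's method):
# XML documents -> tag-path count vectors -> plain k-means (Euclidean distance).

def tag_paths(xml_text):
    """Count root-to-node tag paths such as 'book/title' in one document."""
    counts = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts

def to_vectors(docs):
    """Map every document onto a shared, sorted vocabulary of tag paths."""
    feats = [tag_paths(d) for d in docs]
    vocab = sorted(set().union(*feats))
    return [[f[p] for p in vocab] for f in feats], vocab

def cosine(a, b):
    """Cosine similarity: one possible structural similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, max_iter=20):
    """Lloyd's k-means; assumes the first k vectors as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy corpus: two "book" documents and two "movie" documents.
docs = [
    "<book><title>A</title><author>X</author></book>",
    "<movie><name>M</name><year>1999</year></movie>",
    "<book><title>B</title><author>Y</author><author>Z</author></book>",
    "<movie><name>N</name><year>2001</year><director>D</director></movie>",
]
vectors, vocab = to_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

Plain k-means is shown because it is the simplest of the three algorithms named above. The deterministic first-k initialisation only keeps the toy example reproducible; practical runs usually use random or k-means++ seeding and several restarts.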
