Sentence-Similarity Based DocumentClustering Using Birch Algorithm

Dr.a.vijaya Kathiravan ,P.kalaiyarasi

doi:10.15680/ijircce.2015.0305055

Abstract

Document clustering is the process of automatically grouping the related documents into clusters. Instead of searching entire documents for relevant information, these clusters will improve the efficiency and avoid overlapping of contents. Relevant document can be efficiently retrieved and accessed by means of document clustering. When compared with hard clustering, birch algorithms allow patterns to belong to all the clusters with differing degrees of membership. Birch algorithm is important in domains such as sentence clustering, since a sentence is related to more than one theme or topic present within a document or set of documents. In our proposed system, birch clustering algorithm operates on cluster Start with initial threshold and inserts points into the tree. Results obtained while applying the algorithm to sentence clustering tasks demonstrate that birch algorithm is capable of identifying overlapping clusters of semantically related sentences and its performance improvement can be proved by comparing with k-means. Performance measures document clustering and its application in document summarization

Full Text