An Enhanced Fuzzy Clustering and Expectation Maximization Framework based Matching Semantically Similar Sentences

M Uma Devi, G Meera Gandhi

doi:10.1016/j.procs.2015.07.406

Open Access

https://doi.org/10.1016/j.procs.2015.07.406

Abstract

We develop a statistical measure for finding similar sentences using a novel fuzzy clustering framework that organizes text from one or more documents into clusters. Traditional fuzzy clustering approaches are not applicable to sentence clustering because most sentence similarity measures do not represent sentences in a common metric space. An enhanced fuzzy clustering algorithm is applied to sentence datasets to group related sentences. The PageRank algorithm highlights the more relevant clusters through the PageRank score of each object. An Expectation-Maximization (EM) framework is developed to predict overlapping clusters of semantically related sentences. Experiments on a quotations dataset and a news article dataset evaluate the similarity measure for matching semantically similar sentences, and our system outperforms the baseline method and projection methods. The proposed method scores 34% higher in similarity scoring of related sentences. We also analyze clustering performance in terms of entropy and purity; the method yields higher purity and lower entropy. Our experimental results demonstrate that the method is capable of identifying overlapping clusters of semantically related sentences and can be used in a variety of text mining tasks.
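The abstract reports clustering quality in terms of purity and entropy without giving formulas. A minimal sketch of the commonly used definitions follows. It is an illustration and not the authors' evaluation code. The function name `purity_and_entropy` is hypothetical. The sketch assumes each sentence has been given a single cluster id (for fuzzy output, for example the cluster with the highest membership) and that entropy is normalized by the log of the number of classes.

```python
import math
from collections import Counter


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) for a hard assignment of items to clusters.

    Assumption: entropy is the size-weighted mean of per-cluster class
    entropy, normalized by log(number of classes) so it lies in [0, 1].
    """
    n = len(cluster_ids)
    # Count how many items of each true class fall into each cluster.
    clusters = {}
    for c, y in zip(cluster_ids, class_labels):
        clusters.setdefault(c, Counter())[y] += 1

    num_classes = len(set(class_labels))
    purity = 0.0
    entropy = 0.0
    for counts in clusters.values():
        size = sum(counts.values())
        # Purity: fraction of all items that belong to their cluster's majority class.
        purity += max(counts.values()) / n
        if num_classes > 1:
            # Class entropy inside this cluster, weighted by cluster size.
            h = -sum((k / size) * math.log(k / size) for k in counts.values())
            entropy += (size / n) * h / math.log(num_classes)
    return purity, entropy


if __name__ == "__main__":
    # Toy example: six sentences, two clusters, two true classes.
    p, e = purity_and_entropy([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])
    print(f"purity={p:.3f} entropy={e:.3f}")
```

Under these definitions a better clustering has purity closer to 1 and entropy closer to 0, which matches the direction of improvement the abstract describes.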
