Some Experiments on Clustering Similar Sentences of Texts in Portuguese

Eloize Rossi Marques Seno, Maria Das Graças Volpe Nunes

doi:10.1007/978-3-540-85980-2_14

Publication Date: Jan 1, 2008

Citations: 10

Affiliation: Universidade de São Paulo

Abstract

Identifying similar text passages plays an important role in many NLP applications, such as paraphrase generation and automatic summarization. This paper presents some experiments on detecting and clustering similar sentences of texts in Brazilian Portuguese. We propose an evaluation framework based on an incremental, unsupervised clustering method combined with statistical similarity metrics that measure the semantic distance between sentences. Experiments show that the method is robust even when applied to small data sets. In the best case, it achieved an F-measure of 86%, a Purity of 93%, and an Entropy of 0.037.

Keywords: Sentence Similarity, Sentence Clustering, Statistical Metrics
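The abstract does not specify which statistical similarity metrics or assignment rule the method uses, so the following is only a minimal sketch under stated assumptions. It uses bag-of-words cosine similarity as a stand-in metric, a single-pass (incremental) rule that adds each sentence to the most similar cluster centroid if the similarity reaches a hypothetical `threshold`, and otherwise opens a new cluster. It then computes the three evaluation measures named above in their common textbook forms: Purity, size-weighted Entropy normalized by the log of the number of reference classes (an assumed normalization), and the clustering F-measure that takes, for each class, the best-matching cluster. All function names and the example sentences and labels are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    # Lowercased word tokens; \w also matches accented Portuguese letters.
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors (assumed metric).
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def incremental_cluster(sentences, threshold=0.5):
    """Single pass: join the most similar cluster centroid if its similarity
    reaches `threshold` (hypothetical value), else start a new cluster."""
    clusters, centroids = [], []  # sentence indices; summed term counts
    for i, s in enumerate(sentences):
        vec = bag_of_words(s)
        best, best_sim = None, 0.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(i)
            centroids[best] += vec  # summed counts: same cosine as the mean
        else:
            clusters.append([i])
            centroids.append(Counter(vec))
    return clusters


def purity(clusters, labels):
    # Fraction of sentences belonging to the majority class of their cluster.
    n = sum(len(c) for c in clusters)
    return sum(max(Counter(labels[i] for i in c).values()) for c in clusters) / n


def entropy(clusters, labels):
    # Size-weighted cluster entropy, normalized by log(number of classes).
    n = sum(len(c) for c in clusters)
    q = len(set(labels))
    if q < 2:
        return 0.0
    total = 0.0
    for c in clusters:
        counts = Counter(labels[i] for i in c).values()
        h = -sum((k / len(c)) * math.log(k / len(c)) for k in counts)
        total += len(c) / n * h / math.log(q)
    return total


def f_measure(clusters, labels):
    # For each class, best F over clusters, weighted by class size.
    n = len(labels)
    total = 0.0
    for cls, n_j in Counter(labels).items():
        best = 0.0
        for c in clusters:
            n_ij = sum(1 for i in c if labels[i] == cls)
            if n_ij:
                p, r = n_ij / len(c), n_ij / n_j
                best = max(best, 2 * p * r / (p + r))
        total += n_j / n * best
    return total


sentences = [
    "O presidente visitou a cidade ontem.",
    "Ontem o presidente visitou a cidade.",
    "A chuva causou enchentes no sul.",
    "Enchentes no sul foram causadas pela chuva.",
]
labels = ["visita", "visita", "chuva", "chuva"]  # hypothetical reference classes
clusters = incremental_cluster(sentences, threshold=0.5)
print(clusters, purity(clusters, labels), entropy(clusters, labels), f_measure(clusters, labels))
```

Because sentences are processed one at a time and never reassigned, the result can depend on input order; the threshold trades cluster granularity against purity. Swapping in a different similarity metric only requires replacing `cosine`.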
