Document representation and clustering models for bilingual documents clustering

Shutian Ma, Chengzhi Zhang

doi:10.1002/pra2.2017.14505401056

Abstract

The internet now hosts many documents in languages other than English, and people face challenges when seeking and using this information; for example, non-native English-speaking students tend to have problems when using libraries in North American universities. Bilingual document clustering can help people organize information efficiently and has practical advantages: it divides documents into groups that share a topic, and it needs no training dataset. Document representation and the clustering model are the two key components of clustering. This paper compares four popular representation methods, vector space model (VSM), latent semantic indexing (LSI), latent Dirichlet allocation (LDA) and doc2vec (D2V), together with four different types of clustering algorithms, K-means++, BIRCH, DBSCAN and affinity propagation (AP), to identify appropriate combinations for bilingual document clustering. Both a parallel corpus and a comparable corpus are used as bilingual datasets. Experimental results show that clustering performance varies with the combination of representation method and clustering algorithm, so choosing the models well is important for better document organization.
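To make one of the compared combinations concrete, the following is a minimal sketch of a VSM representation (TF-IDF weighting) clustered with K-means++ seeding. It is not the authors' implementation. The toy corpus, the whitespace tokenization (the Chinese terms are assumed to be pre-segmented), the smoothed IDF formula, and the function names `tfidf_vectors` and `kmeans_pp` are all assumptions made for illustration; the abstract does not describe the paper's preprocessing or parameter settings.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return unit-length TF-IDF vectors (a simple VSM) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumed weighting, not taken from the paper).
        vec = [tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp(vectors, k, n_iter=20, seed=0):
    """Cluster vectors with K-means++ seeding followed by Lloyd iterations."""
    rng = random.Random(seed)
    # Seeding: each new center is drawn with probability proportional to
    # its squared distance from the nearest center chosen so far.
    centers = [list(rng.choice(vectors))]
    while len(centers) < k:
        d2 = [min(sq_dist(v, c) for c in centers) for v in vectors]
        if sum(d2) == 0:
            # All points coincide with a center; fall back to a uniform pick.
            centers.append(list(rng.choice(vectors)))
        else:
            centers.append(list(rng.choices(vectors, weights=d2, k=1)[0]))
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest center for every vector.
        labels = [min(range(k), key=lambda j: sq_dist(v, centers[j])) for v in vectors]
        # Update step: move each non-empty cluster's center to its mean.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: each document mixes English and Chinese terms.
docs = [
    "library book reading 图书馆 书",
    "book library loan 书 图书馆",
    "football match goal 足球 比赛",
    "match football team 比赛 足球",
]
vectors = tfidf_vectors(docs)
labels = kmeans_pp(vectors, k=2)
print(labels)
```

On this toy corpus the two library documents share one label and the two football documents share the other; which cluster receives label 0 depends on the seeding. Swapping in LSI, LDA or doc2vec vectors, or another clustering algorithm, would change only the function that produces `vectors` or `labels`.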

Journal: Proceedings of the Association for Information Science and Technology
Publication Date: Jan 1, 2017
Citations: 3

Similar Papers

A Similarity Rough Set Model for Document Representation and Document Clustering
Nguyen Chi Thanh ... Muneyuki Unehara
Journal of Advanced Computational Intelligence and Intelligent Informatics | VOL. 15
20 Mar 2011

Document Representation and Query Expansion Models for Blog Recommendation
Jaime Arguello ... Jamie Callan
Proceedings of the International AAAI Conference on Web and Social Media | VOL. 2
25 Sep 2021

Phrase based Clustering Scheme of Suffix Tree Document Clustering Model
Anoop Kumarjain ... Satyam Maheshwari
International Journal of Computer Applications | VOL. 63
15 Feb 2013

A novel model for Document Representation
Asmaa Mountassir
01 May 2013
