Abstract

As the volume of online documents grows, so does the demand for text categorization, which assigns documents to predefined categories. Many approaches to this problem rely on supervised machine learning, which cannot be applied to unlabeled data. However, a large number of documents, such as online cell phone reviews, carry no category information, and their key categories are not predefined. To address these problems, we propose an unsupervised document multi-labeling method based on word embedding and word network analysis. After embedding words in a lower-dimensional space using the Word2Vec technique, we generate a weight matrix by calculating similarities between words. We create a word network from this matrix and extract the key categories from the network. From the key category-weight matrix and a co-occurrence matrix, we generate a document-category score matrix. To verify the proposed method, we collected 298,206 cell phone reviews from four review websites. We then compared the results of the proposed method with documents labeled from a human cognitive perspective.
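
The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the hand-made unit vectors stand in for trained Word2Vec embeddings, cosine similarity is assumed as the word similarity, weighted degree in a thresholded network is assumed as the criterion for key categories, and a document-word count matrix is assumed for the co-occurrence matrix. The function names, the thresholds, and the toy reviews are all hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def key_categories(similarity, threshold=0.5, top_k=2):
    """Indices of the top_k words by weighted degree in the word network.

    Assumption: an edge exists when similarity >= threshold, and the
    best-connected words are treated as key categories.
    """
    adjacency = np.where(similarity >= threshold, similarity, 0.0)
    np.fill_diagonal(adjacency, 0.0)      # ignore self-similarity
    degree = adjacency.sum(axis=1)        # weighted degree per word
    top = np.argsort(-degree)[:top_k]
    return sorted(int(i) for i in top)    # sorted for a stable column order

def document_category_scores(doc_word_counts, similarity, category_idx):
    """(documents x words) @ (words x categories) -> documents x categories."""
    category_weights = similarity[:, category_idx]
    return doc_word_counts @ category_weights

# Toy stand-in for Word2Vec output: one unit vector per vocabulary word.
vocab = ["battery", "charge", "power", "screen", "display", "pixel"]
vectors = np.array([
    [1.0,  0.0, 0.0],   # battery
    [0.8,  0.6, 0.0],   # charge
    [0.8, -0.6, 0.0],   # power
    [0.0,  0.0, 1.0],   # screen
    [0.0,  0.6, 0.8],   # display
    [0.0, -0.6, 0.8],   # pixel
])

# Hypothetical tokenized reviews.
docs = [
    ["battery", "charge", "power", "battery"],
    ["screen", "display", "pixel"],
    ["battery", "screen"],
]

# Document-word count matrix (assumed form of the co-occurrence matrix).
word_index = {word: i for i, word in enumerate(vocab)}
counts = np.zeros((len(docs), len(vocab)))
for row, tokens in enumerate(docs):
    for token in tokens:
        counts[row, word_index[token]] += 1

similarity = cosine_similarity_matrix(vectors)
categories = key_categories(similarity, threshold=0.5, top_k=2)
scores = document_category_scores(counts, similarity, categories)

# Multi-labeling: a document receives every category whose score passes
# a cutoff (0.5 is an arbitrary choice for this toy example).
labels = [
    [vocab[c] for c, s in zip(categories, row) if s >= 0.5]
    for row in scores
]
print(labels)
```

In this toy example "battery" and "screen" come out as the key categories, and the third review receives both labels, which is the multi-labeling behavior the method targets. With real data, the vectors would come from a trained Word2Vec model and the thresholds would need tuning.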
