Abstract

In this paper, a Non-negative Matrix Factorization Feature Expansion (NMFFE) approach is proposed to overcome the feature-sparsity problem that arises when expanding the features of short texts. First, we took the internal relationships between short texts and words into account when segmenting the texts into words and constructing their relationship matrix. Second, we used the Dual regularization Non-negative Matrix Tri-Factorization (DNMTF) algorithm to obtain the word clustering indicator matrix, from which the feature space was derived by dimensionality reduction. Third, closely related words were selected from the feature space and added to the short text in order to alleviate the sparsity problem. The experimental results showed that the NMFFE algorithm increased short-text classification accuracy by 25.77%, 10.89% and 1.79% on three datasets (Web snippets, Twitter sports and AGnews, respectively) compared with the Word2Vec algorithm and the Char-CNN algorithm. This indicated that the NMFFE algorithm was better than the BOW algorithm and the Char-CNN algorithm in terms of classification accuracy and robustness.
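
The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain two-factor NMF with Lee-Seung multiplicative updates stands in for DNMTF (no dual regularization, no tri-factorization), whitespace splitting stands in for word segmentation, and cosine similarity in the latent space stands in for the paper's word-selection rule. The function names `build_term_doc_matrix`, `nmf` and `expand_text`, the toy texts, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def build_term_doc_matrix(texts):
    """Return (vocab, X), where X[i, j] counts occurrences of word i in text j."""
    tokenized = [t.lower().split() for t in texts]  # assumption: whitespace tokens
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(texts)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            X[index[w], j] += 1.0
    return vocab, X


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF, X ~ W @ H, via multiplicative updates (a stand-in for DNMTF)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps  # word-by-factor matrix
    H = rng.random((k, X.shape[1])) + eps  # factor-by-text matrix
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def expand_text(text, vocab, W, top_n=2):
    """Append up to top_n words whose latent rows are closest to the text's words."""
    index = {w: i for i, w in enumerate(vocab)}
    present = [index[w] for w in text.lower().split() if w in index]
    if not present:
        return text  # nothing known about this text, leave it unchanged
    unit = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    centroid = unit[present].mean(axis=0)
    scores = unit @ centroid          # cosine-like similarity to the text
    scores[present] = -np.inf         # never re-add words already in the text
    order = np.argsort(-scores)
    best = [i for i in order if np.isfinite(scores[i])][:top_n]
    return text + " " + " ".join(vocab[i] for i in best)


# Toy corpus (hypothetical data, not from the paper's datasets).
texts = [
    "football match goal score",
    "team wins football league",
    "goal keeper saves penalty",
    "stock market price falls",
    "bank raises interest rate",
    "market price of stock rises",
]
vocab, X = build_term_doc_matrix(texts)
W, H = nmf(X, k=2)
expanded = expand_text("football goal", vocab, W)
print(expanded)
```

The expanded string would then be passed to whatever classifier is in use; on a corpus this small the chosen words are only indicative, and the number of factors `k` and `top_n` would need tuning on real data.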
