Abstract
Because short texts are sparse, traditional text classification methods struggle to achieve good results on them. In this paper, we propose a short text classification method based on word vectors and the LDA topic model that takes two factors into account: the Grammatical Category-combined Weight and the Topic High-frequency Word. In this method, Gibbs sampling is used to train the LDA topic model on the basis of part-of-speech weights. Word2vec is then used to train word vectors from the training results, which are vectorized together with the Topic High-frequency Words. The features of the test text are then extended. After feature extension, the SVM algorithm is used to classify the extended short texts, and the classification results are evaluated using precision, recall, and F1-score. The results show that this method can significantly improve classification performance.
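To make the feature-extension and evaluation steps more concrete, the following is a minimal sketch and not the implementation used in this paper. It assumes one plausible extension rule: a topic high-frequency word is appended to a short text when its word vector is close, by cosine similarity, to the vector of some word already in the text. The function names, the similarity threshold of 0.8, and the two-dimensional toy vectors are all hypothetical and chosen only for illustration; in practice the vectors would come from Word2vec and the topic words from the trained LDA model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extend_text(tokens, topic_words, vectors, threshold=0.8):
    """Append topic high-frequency words that are similar to any token.

    Assumption: this similarity rule is illustrative only; the abstract
    does not specify how the extension is performed.
    """
    extended = list(tokens)
    for word in topic_words:
        if word in extended or word not in vectors:
            continue
        if any(
            t in vectors and cosine(vectors[t], vectors[word]) >= threshold
            for t in tokens
        ):
            extended.append(word)
    return extended


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1-score for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy word vectors (hypothetical; stand-ins for Word2vec output).
toy_vectors = {
    "goal": [0.9, 0.1],
    "match": [0.8, 0.2],
    "football": [0.85, 0.15],
    "market": [0.2, 0.8],
}
# Toy topic high-frequency words (hypothetical; stand-ins for LDA output).
toy_topic_words = ["football", "market"]

extended = extend_text(["goal", "match"], toy_topic_words, toy_vectors)
print(extended)

scores = precision_recall_f1(
    ["sports", "sports", "finance", "finance"],
    ["sports", "finance", "finance", "finance"],
    positive="sports",
)
print(scores)
```

In this toy run, "football" is added to the text because its vector lies close to those of "goal" and "match", while "market" is not. The extended token list would then be turned into features for the SVM classifier, and the three metrics would be computed per class on the classifier's predictions.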