Abstract
News headlines have sparse features and weak semantic relationships between words, which makes it difficult for traditional text classification methods to achieve good results. To address this problem, a short text classification method based on word vectors and the WTTM model, which models the topics of short texts from word co-occurrence, is proposed. First, the Word2Vec tool is used to train word vectors on the short text corpus, and an average word vector is computed for each text by additive averaging. Then, the topics of the short text corpus are modeled with the WTTM model to obtain a topic extended feature vector. Finally, the average word vector and the topic extended feature vector are merged, and a random forest model is used to build the classifier. Compared with other short text classification methods, the proposed method improves accuracy, recall and F1 value, which verifies its feasibility.
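The feature construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `average_word_vector` and `fuse_features`, the toy embeddings, and the toy topic distribution are all hypothetical, out-of-vocabulary words are assumed to be skipped, and "merged" is assumed to mean simple concatenation. In practice the embeddings would come from a trained Word2Vec model and the topic vector from the WTTM model.

```python
from typing import Dict, List, Sequence


def average_word_vector(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Additive averaging: sum the vectors of known tokens, divide by their count."""
    total = [0.0] * dim
    known = 0
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            continue  # assumption: out-of-vocabulary words are skipped
        for i in range(dim):
            total[i] += vec[i]
        known += 1
    if known == 0:
        return total  # no known words: fall back to the zero vector
    return [x / known for x in total]


def fuse_features(avg_vec: Sequence[float], topic_vec: Sequence[float]) -> List[float]:
    """Merge the two feature vectors by concatenation (an assumption)."""
    return list(avg_vec) + list(topic_vec)


# Toy stand-ins for Word2Vec embeddings and a per-document topic vector.
toy_embeddings = {"stocks": [1.0, 0.0], "rally": [0.0, 1.0]}
headline = ["stocks", "rally", "today"]  # "today" is out of vocabulary
toy_topics = [0.75, 0.25]

avg = average_word_vector(headline, toy_embeddings, dim=2)
features = fuse_features(avg, toy_topics)
```

Each fused vector in `features` form would then be one row of the input matrix for a random forest classifier.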