Abstract

Word embeddings are widely used in natural language processing to map words to numerical representations in a vector space. Their quality is influenced by factors such as the training method and the training corpus, which in turn affect downstream machine learning performance. In general, when the training method is held fixed, larger corpora yield higher-quality word embeddings and better classification accuracy. However, the content of the corpus also affects classification performance. In this work, we study the relationship between several common word embeddings and sentiment classification models through a series of comparative experiments. The results reveal that, in addition to the training method and corpus size, the corpus content and the embedding dimensionality also play a significant role in determining the quality of word embeddings. Consequently, for a specific task, these factors should be considered together to obtain better results. This work provides an improved understanding of the factors to consider in pursuit of more effective sentiment classification.
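
To make the comparative setup concrete, the following minimal sketch (illustrative only, not the authors' exact experimental pipeline) holds a simple classifier fixed while swapping in pretrained embeddings that differ in training method, corpus content, and dimensionality. The toy dataset, the averaged document vectors, and the logistic-regression classifier are all assumptions for illustration; the embedding names are standard gensim-data identifiers.

```python
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy labeled data (hypothetical): tokenized reviews with 0/1 sentiment.
texts = [
    ["great", "movie", "loved", "it"],
    ["terrible", "boring", "plot"],
    ["wonderful", "acting", "and", "story"],
    ["awful", "waste", "of", "time"],
    ["excellent", "touching", "film"],
    ["bad", "dull", "disappointing"],
    ["amazing", "fun", "ride"],
    ["horrible", "script", "and", "ending"],
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

def doc_vector(tokens, kv):
    """Average the vectors of in-vocabulary tokens (a simple baseline)."""
    vecs = [kv[t] for t in tokens if t in kv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(kv.vector_size)

# Pretrained embeddings from gensim-data that differ in training method
# (word2vec vs. GloVe), corpus content (news vs. Wikipedia+Gigaword vs.
# Twitter), and dimensionality (300 vs. 100). Note: downloads can be large.
for name in ["word2vec-google-news-300",
             "glove-wiki-gigaword-100",
             "glove-twitter-100"]:
    kv = api.load(name)  # returns gensim KeyedVectors
    X = np.stack([doc_vector(t, kv) for t in texts])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.25, random_state=0, stratify=labels)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"{name}: accuracy = {accuracy_score(y_te, clf.predict(X_te)):.2f}")
```

With a realistic dataset in place of the toy examples, the per-embedding accuracies produced by this loop are exactly the kind of comparison the abstract describes: differences across rows reflect the combined effect of training method, corpus, and dimensionality on embedding quality.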
