Abstract

Natural language processing has been the subject of numerous studies over the last decade. These have focused on the various stages of text processing, from text preparation through vectorization to final text comprehension. The goal of vector space modeling is to project the words of a language corpus into a vector space in such a way that words with similar meanings lie close to each other. Currently, two approaches to vectorization are in common use. The first focuses on creating word vectors that take the entire linguistic context into account, while the second focuses on creating document vectors in the context of the linguistic corpus of the analyzed texts. This paper presents a comparison of existing text vectorization methods in natural language processing, especially in Text Mining. The vectorization methods are compared by measuring the classification accuracy they yield; NBC and k-NN were chosen as classifiers because they are among the simplest methods, so that the choice of classifier itself has minimal influence on the final result. The conducted experiments provide a basis for further research toward better automatic text analysis.
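The paper does not include code in this abstract; the following is a minimal sketch, assuming a scikit-learn pipeline and the 20 Newsgroups corpus as stand-ins, of how classification accuracy can serve as a proxy for comparing vectorization methods. The specific vectorizers (count and TF-IDF) and hyperparameters are illustrative assumptions, not the paper's actual experimental setup.

```python
# Illustrative sketch (not the paper's code): compare two document
# vectorizations by the accuracy of two simple classifiers (NBC, k-NN).
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

corpus = fetch_20newsgroups(subset="train",
                            remove=("headers", "footers", "quotes"))
X_train_text, X_test_text, y_train, y_test = train_test_split(
    corpus.data, corpus.target, test_size=0.2, random_state=42
)

vectorizers = {"count": CountVectorizer(), "tfidf": TfidfVectorizer()}
classifiers = {"NBC": MultinomialNB(), "k-NN": KNeighborsClassifier(n_neighbors=5)}

for vec_name, vec in vectorizers.items():
    # Fit the vocabulary on training texts only, then transform both splits.
    X_train = vec.fit_transform(X_train_text)
    X_test = vec.transform(X_test_text)
    for clf_name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        acc = accuracy_score(y_test, clf.predict(X_test))
        print(f"{vec_name:>6} + {clf_name:<5}: accuracy = {acc:.3f}")
```

Because NBC and k-NN make few modeling assumptions, differences in the printed accuracies can be attributed primarily to the vectorization step rather than to the classifier, which mirrors the comparison strategy described in the abstract.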
