Abstract
Text categorization has a variety of applications, such as sentiment analysis of users' tweets and categorizing blog posts into different categories. The real-time data available for categorization is usually unstructured, so an efficient preprocessing algorithm can help achieve better accuracy. Term frequency-inverse document frequency (tf-idf) and word2vec word embedding techniques are widely used before applying a text classification model. To show the effect of these techniques on text categorization, we compare the accuracies of different multi-class text categorization algorithms, namely Support Vector Machine (SVM), Logistic Regression, and K-Nearest Neighbor (KNN), on top of each technique. The TagMyNews dataset is used to train the models. The results indicate that word2vec is the more effective word embedding technique, as it yields higher accuracies for all the classification methods (KNN: 79.38%, SVM: 93.59%, Logistic Regression: 87.46%) compared to tf-idf (KNN: 73.37%, SVM: 84%, Logistic Regression: 73.98%).
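The pipeline described (vectorize text, then train a multi-class classifier) can be sketched as follows. This is a minimal illustration using scikit-learn's tf-idf vectorizer and a linear SVM; the toy headlines and category labels below are hypothetical stand-ins for the TagMyNews data, and the model parameters are assumptions, not taken from the paper.

```python
# Sketch of the tf-idf + SVM text categorization pipeline.
# The corpus and labels are illustrative, not the TagMyNews dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny labeled corpus (hypothetical news headlines and categories).
docs = [
    "stocks rally as markets close higher",
    "team wins championship after overtime thriller",
    "new smartphone features faster processor",
    "central bank raises interest rates again",
    "star striker scores twice in derby win",
    "chipmaker unveils next generation gpu",
]
labels = ["business", "sport", "tech", "business", "sport", "tech"]

# tf-idf maps each document to a sparse vector of term weights;
# a linear SVM then learns one-vs-rest boundaries between categories.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(docs, labels)

prediction = model.predict(["goalkeeper saves penalty in final"])[0]
print(prediction)
```

A word2vec-based variant would instead average per-word embedding vectors (e.g. from gensim's `Word2Vec`) into one dense document vector before classification, which is what the abstract reports as the stronger representation.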