Continuous-bag-of-words and Skip-gram for word vector training and text classification

Haowen Xia

doi:10.1088/1742-6596/2634/1/012052

Abstract

Natural language processing is one of the most challenging areas of artificial intelligence research and is widely used in real-life applications. One of its basic questions is how to calculate the probability that a particular text sequence appears in a given context. Word2Vec is a powerful tool that addresses this question: it transforms words into word vectors and trains efficiently on large datasets and corpora. Of its models, Continuous-Bag-Of-Words and Skip-gram are particularly significant and widely known. Extended techniques for these models have also been proposed to reduce training time and improve accuracy at the same time. Although a number of papers now describe these fundamental concepts, their quality varies greatly. To better understand the models and their extensions, and how well they behave on real tasks, this paper tests different combinations of the models and techniques and compares their performance in processing large input data and their prediction accuracy on a text classification task. The aim is to provide more detail on, and a better understanding of, the models for subsequent research in this field.

Journal: Journal of Physics: Conference Series
Publication Date: Nov 1, 2023
Citations: 3
License type: cc-by

Similar Papers

Several alternative term weighting methods for text representation and classification
Zhong Tang ... Song Li
Knowledge-Based Systems | VOL. 207
14 Aug 2020

Exploring Recent NLP Advances for Tamil: Word Vectors and Hybrid Deep Learning Architectures
Archchitha Aravinthan ... Charles Eugene
International Journal on Advances in ICT for Emerging Regions (ICTer) | VOL. 17
09 Oct 2024

DCCL: Dual-channel hybrid neural network combined with self-attention for text classification.
Chaofan Li ... Kai Ma
Mathematical biosciences and engineering : MBE | VOL. 20
01 Jan 2021

An empirical evaluation of text representation schemes to filter the social media stream
Sandip Modha ... Thomas Mandl
Journal of Experimental & Theoretical Artificial Intelligence | VOL. 34
24 Apr 2021
