Abstract

Word embedding, or distributed representation, is a popular method for representing words. In this method, each word is mapped to a dense vector of real values with a fixed number of dimensions, which is more effective than the Bag-of-Words (BoW) method. Distributed representations also capture semantic and syntactic information, so words with similar meanings have similar vectors. However, learning distributed representations requires a huge corpus and a long training time. For this reason, many researchers have published pre-trained word vectors that can be reused. The problem is that the available pre-trained word vectors usually cover only the general domain. This study aims to build pre-trained word vectors for a specific domain, namely computers and information technology. We used a dataset of student scientific papers from the Universitas Sumatera Utara (USU) repository and trained the word2vec model, which has two architectures: Continuous Bag-of-Words (CBOW) and Skip-gram. The results show that the CBOW architecture is more effective than the Skip-gram architecture for this task.
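For illustration, the sketch below trains both word2vec architectures on a toy tokenized corpus. The abstract does not specify the tooling or hyperparameters, so the use of the gensim library, the hyperparameter values (vector_size, window, min_count, epochs), the stand-in sentences, and the query word "komputer" are all assumptions, not details from the paper.

```python
# A minimal sketch of training domain-specific word2vec vectors with both
# architectures, assuming the gensim library (>= 4.0). The corpus, the
# hyperparameters, and the query word are illustrative, not from the paper.
from gensim.models import Word2Vec

# Stand-in for the preprocessed USU repository corpus:
# one tokenized sentence per inner list.
corpus = [
    ["jaringan", "komputer", "menggunakan", "protokol", "tcp"],
    ["sistem", "informasi", "berbasis", "web"],
    ["algoritma", "klasifikasi", "data", "mining"],
]

common = dict(vector_size=100, window=5, min_count=1, workers=4, epochs=10)

# sg=0 selects CBOW (predict a word from its context);
# sg=1 selects Skip-gram (predict the context from a word).
cbow_model = Word2Vec(sentences=corpus, sg=0, **common)
skipgram_model = Word2Vec(sentences=corpus, sg=1, **common)

# Save the trained vectors so they can be reused as pre-trained embeddings.
cbow_model.wv.save_word2vec_format("usu_cbow.vec")

# Query nearest neighbours to inspect semantic similarity.
print(cbow_model.wv.most_similar("komputer", topn=3))
```

The sg flag is the only difference between the two runs, which mirrors the paper's comparison: the same corpus and settings, with only the architecture varied.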
