Abstract

Advances in machine learning and statistical inference are driving progress in domains such as computer vision, natural language processing (NLP), and automation and robotics. Among the influential developments in NLP, word embedding is one of the most widely used and transformative techniques. In this paper, we present BnVec, an open-source library for Bangla word embedding that aims to support the Bangla NLP research community with several powerful word embedding techniques. BnVec is split into two parts. The first is a class tailored to Bangla that embeds words using the six most popular schemes: CountVectorizer, TF-IDF, HashingVectorizer, Word2vec, fastText, and GloVe. The second provides pre-trained distributed word embeddings from Word2vec, fastText, and GloVe. The pre-trained models were built from newspaper content, social media posts, and Bangla Wikipedia articles; the corpus used to build them exceeds 395,289,960 tokens. The paper also reports the performance of these models under various hyper-parameter settings and analyzes the results.
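To make the count-based schemes in the first part of the library concrete, the sketch below builds a bag-of-words count vector and its TF-IDF reweighting by hand on a toy pre-tokenized Bangla corpus. This is a minimal illustration, not BnVec's actual API: the function names, the corpus, and the particular smoothed IDF variant are all assumptions for the example.

```python
# Minimal sketch of two of the schemes BnVec wraps: raw term counts
# (bag of words) and TF-IDF reweighting. Names here are illustrative,
# not BnVec's real classes.
import math
from collections import Counter

# Toy pre-tokenized Bangla corpus (two short documents).
docs = [["আমি", "ভাত", "খাই"], ["আমি", "বই", "পড়ি"]]

# Fixed vocabulary shared by both representations.
vocab = sorted({w for d in docs for w in d})

def count_vector(doc):
    """Bag-of-words: one raw count per vocabulary word."""
    c = Counter(doc)
    return [c[w] for w in vocab]

def tfidf_vector(doc):
    """Term frequency scaled by inverse document frequency."""
    c = Counter(doc)
    n = len(docs)
    out = []
    for w in vocab:
        df = sum(1 for d in docs if w in d)   # documents containing w
        idf = math.log(n / df) + 1.0          # one common smoothed variant
        out.append(c[w] * idf)
    return out

print(count_vector(docs[0]))
print(tfidf_vector(docs[0]))
```

Words that appear in every document (here "আমি") keep a low weight under TF-IDF, while words unique to one document are boosted, which is the behaviour that makes TF-IDF more discriminative than raw counts.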

Highlights

  • Word embedding refers to the vector representation of linguistic or phonetic information

  • It is one of the most popular document representation models and is used comprehensively across Natural Language Processing (NLP) applications, including named entity recognition [7], sentiment analysis, part-of-speech tagging, and more [20]

  • We have presented two methodologies for Bengali word embedding

Summary

Introduction

Word embedding refers to the vector representation of linguistic or phonetic information. It is one of the most popular document representation models and is used comprehensively across Natural Language Processing (NLP) applications, including named entity recognition [7], sentiment analysis, part-of-speech tagging, and more [20]. Converting textual data into lower-dimensional vectors is the primary objective of this research. Over the years, various strategies for word clustering and embedding have been proposed; today, distributed word vector representation is the most widely used approach for grouping words. We compared three separate techniques to determine which strategy works best for Bangla word embedding.
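As a concrete illustration of how prediction-based embedding models such as Word2vec learn vector representations, the sketch below generates the (target, context) training pairs that a skip-gram model is trained on. This is a hypothetical, simplified fragment for exposition; real Word2vec training additionally learns the dense vectors from such pairs via a shallow neural network.

```python
# Hypothetical sketch: skip-gram pair generation, the first step of
# Word2vec-style training. Each word is paired with its neighbours
# within a context window; the model then learns vectors that predict
# context words from target words.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:                      # skip the target itself
                pairs.append((target, tokens[j]))
    return pairs

sentence = "word embedding maps words to vectors".split()
print(skipgram_pairs(sentence, window=1))
```

With `window=1`, each interior word yields two pairs (left and right neighbour) and each boundary word yields one, so a six-word sentence produces ten pairs.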


