Abstract
Natural language processing (NLP) provides a framework for large-scale text analysis. One common approach uses vector space models (VSMs), which embed word attributes, called features, into high-dimensional vectors. Comprehensive VSMs are trained on large corpora such as the Google News archive. Given a VSM, a thesaurus, a collection of semantically related words, can be created for a particular root word using cosine similarity. Many methods have been developed to reduce the complexity of these models by retaining useful semantic information while discarding non-informative features. One such method, variance thresholding, keeps the high-variance features above a manually determined threshold, providing greater differentiation between words for classification purposes. Our research developed a dimension-reducing methodology called dynamic variance thresholding (DyVaT). DyVaT reduces the specificity of word embeddings by retaining low-variance features instead, allowing for a broader thesaurus that preserves semantic similarity. The dynamic variance threshold, which determines how many low-variance features are retained, is selected automatically using the Kneedle algorithm, improving on results obtained with manually chosen thresholds. Our test case for examining the effectiveness of DyVaT in creating a contextual thesaurus is the visual, auditory, and kinesthetic learning-style context. We conclude that DyVaT is a valid method for generating loosely connected word collections with potential uses in NLP classification or clustering tasks.
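The abstract describes DyVaT only at a high level; the sketch below illustrates one plausible reading of it, not the authors' implementation. It assumes a NumPy embedding matrix (rows are words, columns are features), uses the `kneed` package's KneeLocator as the Kneedle algorithm, and keeps the low-variance features up to the detected knee before building a cosine-similarity thesaurus. All function and variable names here are hypothetical.

```python
# Hedged sketch of dynamic variance thresholding (DyVaT) as summarized
# in the abstract; details of the published method may differ.
import numpy as np
from kneed import KneeLocator  # Kneedle algorithm implementation


def dyvat_reduce(embeddings: np.ndarray) -> np.ndarray:
    """Keep low-variance embedding features, with the variance cutoff
    chosen automatically by the Kneedle algorithm (assumed behavior)."""
    variances = np.var(embeddings, axis=0)      # per-feature variance
    order = np.argsort(variances)               # feature indices, ascending variance
    sorted_var = variances[order]
    x = np.arange(len(sorted_var))
    # The sorted variance curve rises slowly, then sharply; Kneedle locates
    # the knee where high-variance features begin, and we keep what precedes it.
    knee = KneeLocator(x, sorted_var, curve="convex", direction="increasing").knee
    keep = order[: knee + 1] if knee is not None else order
    return embeddings[:, keep]


def thesaurus(root_idx: int, reduced: np.ndarray, vocab: list[str], k: int = 10) -> list[str]:
    """Rank vocabulary words by cosine similarity to a root word."""
    v = reduced[root_idx]
    sims = reduced @ v / (np.linalg.norm(reduced, axis=1) * np.linalg.norm(v) + 1e-12)
    return [vocab[i] for i in np.argsort(-sims)[1 : k + 1]]  # skip the root itself
```

Because low-variance features vary little across the vocabulary, keeping them flattens fine-grained distinctions, which is consistent with the abstract's goal of a broader, loosely connected thesaurus rather than a maximally discriminative one.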