Automated Prediction of Good Dictionary EXamples (GDEX): A Comprehensive Experiment with Distant Supervision, Machine Learning, and Word Embedding‐Based Deep Learning Techniques

Muhammad Yaseen Khan, Muhammad Suffian Nizami, Abdul Qayoom, Shaukat Wasi, Syed Muhammad Khaliq-Ur-Rahman Raazi, Muhammad Shoaib Siddiqui, Shahzad Sarfraz

doi:10.1155/2021/2553199

Open Access

https://doi.org/10.1155/2021/2553199

Abstract

Dictionaries are not only a source of word meanings but also a means of understanding the context in which words are used. For this purpose, comprehensive print dictionaries, and more recently online dictionaries, accompany a word with a short example sentence. Lexicographers take great care in selecting Good Dictionary EXamples (GDEX), i.e., the sentences best suited to illustrate a word's definition in a dictionary. The rules for selecting GDEX are demanding, and applying them manually takes a lot of time. This paper therefore focuses on two major tasks: developing labelled corpora for the top 3K English words using a distant supervision approach, and devising a state-of-the-art, artificial intelligence-based automated procedure for discriminating Good Dictionary EXamples from bad ones. The proposed methodology involves a suite of five machine learning (ML) and five word embedding-based deep learning (DL) architectures. A thorough analysis of the results shows that GDEX elicitation can be done by both ML and DL models; however, DL-based models show only a marginal improvement of 3.5% over the conventional ML models. We find that random forests with parts-of-speech information and a word2vec-based bidirectional LSTM are the best-performing ML and DL configurations for automated GDEX elicitation; on the test set, these models secured a balanced accuracy of 73% and 77%, respectively.

Highlights

This paper focuses on two major tasks: the development of labelled corpora for the top 3K English words using a distant supervision approach, and the design of a state-of-the-art, artificial intelligence-based automated procedure for discriminating Good Dictionary EXamples from bad ones. The proposed methodology involves a suite of five machine learning (ML) and five word embedding-based deep learning (DL) architectures.
We find that random forests with parts-of-speech information and a word2vec-based bidirectional Long Short-Term Memory (LSTM) network are the best-performing ML and DL configurations for automated Good Dictionary EXamples (GDEX) elicitation; on the test set, these models secured a balanced accuracy of 73% and 77%, respectively.

Summary

Literature Review

Researchers have proposed several significant methodologies for the problem under study; we maintain, however, that compared with other classification tasks in NLP, the amount of work on GDEX classification is small. The group used the etTenTen web corpus; their approach focuses on sentence length, word length, the number of subordinate clauses, and keyword position. In another similar study, Uprety and Shakya [14] conducted a test to analyse the effectiveness of context clue sentences among Nepalese students.

Here, C is a dictionary of key-value pairs in which each word w is a key mapped to a list of tuples; each tuple holds an example sentence S_{w_i} together with its thumbs-up votes (U_i) and thumbs-down votes (D_i), where the subscript i is the index of the sentence.

The dataset for every scoring function is balanced, i.e., each class contains 20K records, so 40K sentences in total are used in the experiments. One key observation from the table is that the average sentence length of good examples is approximately half that of the counter-class. This further suggests that the distant supervision (or nearly crowdsourced) data aligns with rule #1 (already stated in Subsection 2.2).
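The dictionary C described above can be sketched as a plain Python mapping. This is only an illustration of the data structure: the sample sentences and vote counts are invented, and `net_votes` is a hypothetical placeholder score, not one of the paper's scoring functions (which this summary does not spell out).

```python
from typing import Dict, List, Tuple

# One tuple per example: (sentence S_wi, thumbs-up votes U_i, thumbs-down votes D_i)
Example = Tuple[str, int, int]

# Toy data, invented for illustration only.
C: Dict[str, List[Example]] = {
    "bank": [
        ("She opened an account at the bank.", 12, 1),
        ("The bank, whose branch I once visited and which later closed, was old.", 2, 9),
    ],
    "run": [
        ("He goes for a run every morning.", 7, 0),
    ],
}


def net_votes(up: int, down: int) -> int:
    """Placeholder score (an assumption, not a scoring function from the paper)."""
    return up - down


def label_examples(corpus: Dict[str, List[Example]]) -> List[Tuple[str, str, int]]:
    """Flatten C into (word, sentence, label) rows; label 1 = good, 0 = bad."""
    rows = []
    for word, examples in corpus.items():
        for sentence, up, down in examples:
            label = 1 if net_votes(up, down) > 0 else 0
            rows.append((word, sentence, label))
    return rows


rows = label_examples(C)
```

Turning votes into labels this way is what makes the supervision "distant": the labels come from crowd feedback rather than from lexicographers annotating each sentence.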

Machine Learning-Based Classification

Result

TF-IDF Vectorization

Results and Discussion

Evaluation metrics
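The abstract reports results as balanced accuracy. As a minimal sketch, assuming the standard definition (the mean of per-class recall), the metric can be computed as follows; the labels and predictions are invented toy values, not results from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (standard definition, assumed here)."""
    recalls = []
    for c in sorted(set(y_true)):
        # Indices of all samples whose true class is c
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Toy labels: 1 = good example, 0 = bad example (invented values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
score = balanced_accuracy(y_true, y_pred)  # recall 0.75 and 0.50 -> 0.625
```

When both classes are equally represented in the evaluated data, as in this toy example, balanced accuracy coincides with plain accuracy.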

Journal: Complexity	Publication Date: Jan 1, 2021
Citations: 8	License type: CC BY 4.0
