Application of LDA and word2vec to detect English off-topic composition.

Yilan Qi, Jun He, Seyedali Mirjalili

doi:10.1371/journal.pone.0264552


Open Access

https://doi.org/10.1371/journal.pone.0264552


Abstract

This paper presents an off-topic detection algorithm that combines LDA and word2vec, addressing the lack of accurate and efficient off-topic detection algorithms in systems that assist the review of English compositions. The algorithm models each document with LDA and trains word vectors on the documents with word2vec. It then uses the semantic relationship between a document's topics and words to compute a probability-weighted sum over each topic and its feature words, and finally flags off-topic compositions by applying a suitably chosen threshold. The number of topics is varied, the F value is recorded for each setting, and the best number of topics is chosen accordingly. Experimental results show that the proposed method is more effective than the vector space model: it detects more off-topic compositions with higher accuracy, and its F value exceeds 88%. The method thus automates off-topic detection for compositions and can be applied effectively in English composition teaching.

Highlights

The most commonly used, classic text representation model is the vector space model, and the TF-IDF algorithm built on it is the most widely used method for calculating text similarity.
In these expressions, $z_i$ denotes the topic variable corresponding to the $i$-th word; $\neg i$ means that the $i$-th word is excluded, so $z_{\neg i}$ represents the probability distribution of all topics $z_k$ ($k \neq i$); $z^{(t)}_{k,\neg i}$ is the frequency with which the feature word $t$ is assigned to topic $k$; and $z^{(k)}_{m,\neg i}$ is the size of the set of feature words in document $m$ assigned to topic $k$.
The off-topic compositions detected in the experiment are compared with those identified by manual grading, and the algorithm is evaluated on accuracy rate, recall rate, and F value to verify its effectiveness and practicality.
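
The evaluation described in the last highlight can be illustrated with a minimal sketch. It assumes that the "accuracy rate" is computed as precision over the detected set and that the "F value" is the balanced harmonic mean of precision and recall; neither definition is spelled out on this page. The function name and the composition IDs are hypothetical.

```python
def precision_recall_f(detected, actual):
    """Compare detected off-topic IDs against manually graded off-topic IDs."""
    detected, actual = set(detected), set(actual)
    true_positives = len(detected & actual)
    # Precision: share of detected compositions that really are off-topic.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: share of truly off-topic compositions that were detected.
    recall = true_positives / len(actual) if actual else 0.0
    # F value (assumed here to be the balanced F1 score).
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical composition IDs: algorithm output vs. manual grading.
detected_ids = {1, 2, 3, 4}
manual_ids = {2, 3, 4, 5, 6}
p, r, f = precision_recall_f(detected_ids, manual_ids)
print(p, r, round(f, 3))  # precision 0.75, recall 0.6, F about 0.667
```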

Summary

Introduction

Composition is an important means of expressing emotion and conveying information, and the theme is the soul of a composition. The most commonly used, classic text representation model is the vector space model, and the TF-IDF algorithm built on it is the most widely used method for calculating text similarity. This method weights a word by its frequency in the document and its frequency across the document collection. The English words "like" and "love", for example, have similar meanings, but in the vector space model they are treated as two separate lexical items. To address this disadvantage, some researchers have proposed word-expansion methods, for example using dictionaries such as WordNet and HowNet. To overcome the deficiencies of the above methods, a new text similarity calculation method is proposed and used to detect off-topic English compositions. The research question is: how effective is the off-topic detection method based on LDA and word2vec compared with the vector space model-based method?
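
The "like" versus "love" limitation can be made concrete with a small sketch. It uses one common TF-IDF variant (relative term frequency times log(N/df)), which is an assumption; the paper may use a different weighting. The three toy sentences and the helper names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({w for d in docs for w in d})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for d in docs for w in set(d))
    vectors = []
    for d in docs:
        counts = Counter(d)
        vectors.append([
            (counts[w] / len(d)) * math.log(n_docs / df[w]) if w in counts else 0.0
            for w in vocab
        ])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "i like music".split(),
    "i love music".split(),
    "the weather is cold".split(),
]
vocab, vecs = tfidf_vectors(docs)
# "like" and "love" occupy different dimensions, so they add nothing to the
# dot product even though the two sentences mean nearly the same thing.
sim = cosine(vecs[0], vecs[1])
print(vocab.index("like"), vocab.index("love"), sim)
```

In this toy collection the two near-synonymous sentences receive a similarity well below 1, because the only overlap comes from the shared words "i" and "music".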

LDA model

Gibbs sampling
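
As an illustration of the count-based quantities defined in the highlights, the following is a minimal sketch of the standard collapsed Gibbs sampling update for LDA: the probability of assigning the current word to topic k is proportional to (count of that word in topic k + beta) / (all words in topic k + V * beta) times (words in the document assigned to k + alpha). Symmetric priors alpha and beta are an assumption, and the function name, argument names, and toy counts are hypothetical; the paper's exact formulation is not reproduced on this page.

```python
def topic_conditional(word, doc, n_kt, n_mk, alpha, beta):
    """Conditional topic distribution for one token in collapsed Gibbs sampling.

    n_kt[k][t]: times word t is assigned to topic k, current token excluded.
    n_mk[m][k]: tokens in document m assigned to topic k, current token excluded.
    """
    vocab_size = len(n_kt[0])
    weights = []
    for k in range(len(n_kt)):
        # How strongly topic k favours this word.
        word_term = (n_kt[k][word] + beta) / (sum(n_kt[k]) + vocab_size * beta)
        # How strongly this document favours topic k (the denominator is the
        # same for every k, so it is dropped before normalization).
        doc_term = n_mk[doc][k] + alpha
        weights.append(word_term * doc_term)
    total = sum(weights)
    return [w / total for w in weights]


# Toy counts: 2 topics, 3 vocabulary words, 1 document.
n_kt = [[3, 1, 0], [0, 1, 4]]
n_mk = [[2, 1]]
probs = topic_conditional(word=0, doc=0, n_kt=n_kt, n_mk=n_mk, alpha=0.5, beta=0.1)
print(probs)  # topic 0 dominates, since word 0 appears only under topic 0
```

A full sampler would draw a new topic for the token from `probs`, update the two count tables, and repeat over all tokens for many iterations.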

LDA modeling process

Topic correlation calculation based on LDA and word2vec

Word2vec

Calculation of subject correlation

Off-topic detection algorithm
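
The abstract describes the detection step as a probability-weighted sum over topics and their feature words, followed by a threshold. The sketch below is one possible reading of that description, not the authors' exact formula: each feature word contributes its best cosine similarity to any word in the composition, weighted by the topic probability and the word's probability within the topic. The hand-made 2-D vectors stand in for trained word2vec embeddings, and all names, probabilities, and the threshold value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topic_relevance(essay_words, topics, vectors):
    """Probability-weighted similarity between an essay and the prompt topics.

    topics: list of (topic_probability, {feature_word: word_probability}).
    vectors: word -> embedding (toy stand-in for word2vec output).
    """
    score = 0.0
    for topic_prob, feature_words in topics:
        for feature_word, word_prob in feature_words.items():
            # Best semantic match between this feature word and the essay.
            best = max(
                (cosine(vectors[feature_word], vectors[w])
                 for w in essay_words if w in vectors),
                default=0.0,
            )
            score += topic_prob * word_prob * best
    return score


def is_off_topic(essay_words, topics, vectors, threshold):
    """Flag the essay when its relevance score falls below the threshold."""
    return topic_relevance(essay_words, topics, vectors) < threshold


# Toy embeddings: school-related words cluster on one axis, sports on the other.
vectors = {
    "school": (1.0, 0.0), "teacher": (0.9, 0.1), "student": (0.8, 0.2),
    "football": (0.0, 1.0), "goal": (0.1, 0.9),
}
topics = [(0.7, {"school": 0.6, "teacher": 0.4}), (0.3, {"student": 1.0})]

on_topic_essay = ["student", "teacher"]
off_topic_essay = ["football", "goal"]
print(is_off_topic(on_topic_essay, topics, vectors, threshold=0.5))   # False
print(is_off_topic(off_topic_essay, topics, vectors, threshold=0.5))  # True
```

In practice the threshold would be tuned on manually graded compositions, in the same way the abstract describes tuning the number of topics against the F value.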

Experimental results and comparative analysis

Conclusion


Journal: PloS one	Publication Date: Feb 25, 2022
Citations: 5	License type: CC BY 4.0
