Abstract

Automatic scoring systems for business English essays are widely used in education, and off-topic detection is an indispensable part of such systems. Most traditional off-topic detection methods convert text into a vector-space representation and then calculate the similarity between the essay and the correct text to obtain the off-topic result. However, these methods focus only on the structure of the text and ignore semantic associations. In addition, traditional methods detect off-topic content poorly in essays with high divergence. To address these problems, this paper proposes an off-topic detection method for business English essays based on a deep learning model. First, the word2vec model is used to represent the words in sentences as word vectors, and LDA is used to extract vectors for the topic and the text, respectively. Then, the word vectors and topic word vectors are spliced together as the input to a convolutional neural network (CNN), which extracts and screens sentence features and performs the similarity calculation. When the similarity is less than the threshold, the method also maps the topic and the subject words into the coupling space and calculates their relevance. Finally, unsupervised off-topic detection is realized by a clustering method. The experimental results show that the proposed method improves detection accuracy to a certain extent for essays with low divergence and for essays with high divergence, especially the latter.
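
The splice-and-threshold step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the CNN feature extractor is replaced here by simple mean pooling, the vectors are hand-made toy values rather than word2vec or LDA output, and the function names (`splice`, `mean_pool`, `is_off_topic`) and the threshold of 0.5 are assumptions chosen purely for illustration.

```python
import math

def splice(word_vec, topic_vec):
    """Concatenate a word vector with a topic vector (the 'splicing' step)."""
    return list(word_vec) + list(topic_vec)

def mean_pool(vectors):
    """Average a list of equal-length vectors; a stand-in for CNN feature extraction."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def essay_representation(word_vecs, topic_vec):
    """Splice each word vector with the topic vector, then pool into one vector."""
    return mean_pool([splice(w, topic_vec) for w in word_vecs])

def is_off_topic(essay_repr, reference_repr, threshold=0.5):
    """Flag the essay when its similarity to the reference falls below the threshold."""
    return cosine(essay_repr, reference_repr) < threshold

# Toy 2-dimensional vectors (hypothetical values, not real model output).
reference = essay_representation([[1.0, 0.0], [0.8, 0.2]], [0.9, 0.1])
on_topic = essay_representation([[0.9, 0.1], [0.7, 0.3]], [0.8, 0.2])
off_topic = essay_representation([[0.0, 1.0], [0.1, 0.9]], [0.1, 0.9])

print(is_off_topic(on_topic, reference))
print(is_off_topic(off_topic, reference))
```

In this sketch an essay whose pooled representation points in nearly the same direction as the reference is kept, while one that points elsewhere is flagged. The abstract's further steps for low-similarity essays (relevance in the coupling space and clustering) are not covered.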

Highlights

  • Automatic composition scoring is a process of autonomously participating in composition scoring using computer-related technologies [1], such as natural language processing and machine learning

  • Then, the score for such a composition is naturally not high according to the actual situation. Therefore, off-topic detection of composition is the first “checkpoint” of automatic composition scoring, which conforms to the rules and conventions of manual scoring

  • Literature [6] proposed a method based on the convolutional neural network (CNN), which can make better use of the semantic information contained in the report text to speed up the retrieval process. The proposed method uses the graph embedding method to enhance the word representation by capturing the semantic relationship information from the medical ontology

Summary

Introduction

Automatic composition scoring is a process of autonomously participating in composition scoring using computer-related technologies [1], such as natural language processing and machine learning. Literature [5] proposes two methods to better compare the semantic similarity between the article to be tested and the article. Literature [6] proposed a method based on the convolutional neural network (CNN), which can make better use of the semantic information contained in the report text to speed up the retrieval process. The CNN model is adjusted to calculate the similarity between report pairs in order to determine the target report pairs with overlapping body parts. Experiments show that this method can realize semantic similarity detection of medical texts. In response to the abovementioned problems, this paper proposes a method for off-topic detection of business English compositions based on a deep learning model. Experiments show that this method can improve the detection effect for compositions with high topic divergence.

The Related Work
Methods
Article Similarity Calculation of Depth Model
Analysis of Experimental Results
Conclusion
