Abstract
Automatic grading requires the adoption of the latest technologies. It has become essential, especially as many courses have moved online as massive open online courses (MOOCs). The objectives of the current work are (1) reviewing the literature on text semantic similarity and automatic exam correction systems, (2) proposing an automatic exam correction framework (HMB-AECF) for MCQs, essays, and equations that is abstracted into five layers, (3) suggesting an equation similarity checker algorithm named “HMB-MMS-EMA”, (4) presenting an expression matching dataset named “HMB-EMD-v1”, (5) comparing different approaches for converting textual data into numerical data (Word2Vec, FastText, GloVe, and the Universal Sentence Encoder (USE)) using three well-known Python packages (Gensim, SpaCy, and NLTK), and (6) comparing the proposed equation similarity checker algorithm (HMB-MMS-EMA) with a Python package (SymPy) on the proposed dataset (HMB-EMD-v1). Eight experiments were performed on the Quora Question Pairs and the UNT Computer Science Short Answer datasets. The highest accuracy achieved in the first four experiments was 77.95%, obtained by USE without fine-tuning the pre-trained models. The lowest root mean square error (RMSE) achieved in the second four experiments was 1.09, also obtained by USE without fine-tuning the pre-trained models. The proposed equation similarity checker algorithm (HMB-MMS-EMA) reached 100% accuracy on “HMB-EMD-v1”, compared with 71.33% for the SymPy Python package.
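To illustrate the SymPy baseline mentioned above, the following is a minimal sketch (not the paper's HMB-MMS-EMA algorithm) of how SymPy can be asked whether two expression strings are mathematically equivalent; the expression strings are hypothetical examples.

```python
# Minimal sketch of expression-equivalence checking with SymPy,
# the baseline the proposed HMB-MMS-EMA checker is compared against.
import sympy as sp

def expressions_match(expr_a: str, expr_b: str) -> bool:
    """Return True if the two expression strings simplify to the same form."""
    a = sp.sympify(expr_a)  # parse the string into a SymPy expression
    b = sp.sympify(expr_b)
    # Equivalent expressions reduce to zero when their difference is simplified.
    return sp.simplify(a - b) == 0

print(expressions_match("(x + 1)**2", "x**2 + 2*x + 1"))  # True
print(expressions_match("2*x + 3", "3*x + 2"))            # False
```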
Highlights
Automatic grading is an approach that requires the adoption of the latest technologies
Word2Vec is a shallow, two-layer neural network that performs word embedding. It groups vectors of similar words together in the vector space, and because it produces a vector for each individual token, whole documents can be compared by combining their token vectors (see the sketch after this list)
InferSent is a supervised sentence embedding technique. It is trained on the Stanford Natural Language Inference (SNLI) dataset
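To make the Word2Vec highlight concrete, below is a minimal sketch, assuming a tiny hypothetical corpus, of how Gensim's Word2Vec can be trained and how two answers can be compared by averaging their token vectors; the corpus, helper names, and hyperparameters are illustrative assumptions, not the paper's setup.

```python
# Sketch: train Word2Vec on a toy corpus and compare two answers by
# averaging their token vectors (document-level comparison via token vectors).
import numpy as np
from gensim.models import Word2Vec

# Hypothetical tokenized corpus of short answers.
corpus = [
    ["automatic", "grading", "uses", "machine", "learning"],
    ["exams", "are", "graded", "automatically", "with", "machine", "learning"],
    ["students", "write", "short", "essay", "answers"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=50)

def doc_vector(tokens):
    """Average the vectors of the tokens that exist in the vocabulary."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

a = doc_vector(["automatic", "grading", "with", "machine", "learning"])
b = doc_vector(["students", "write", "short", "answers"])
print(cosine(a, b))  # similarity score between the two answers
```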
Summary
Automatic grading is an approach that requires the adoption of the latest technologies. Word2Vec is a shallow, two-layer neural network that performs word embedding; it groups vectors of similar words together in the vector space, and it can be used to compare whole documents because it takes advantage of the individual token vectors. InferSent is a supervised sentence embedding technique trained on the Stanford Natural Language Inference (SNLI) dataset, which contains 570K human-generated English sentence pairs, and it uses GloVe vectors for the pre-trained word embeddings. NLTK is a leading platform for building Python applications that work with human language data. It provides easy-to-use interfaces and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning [45]
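The NLTK functionality the summary refers to can be illustrated with a short sketch; the sentence and the word pair below are hypothetical examples, not the paper's data.

```python
# Sketch: tokenization and the WordNet lexical resource via NLTK.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("punkt")    # tokenizer models (newer NLTK versions may also need "punkt_tab")
nltk.download("wordnet")  # WordNet lexical database

answer = "The algorithm groups similar words together in vector space."
print(nltk.word_tokenize(answer))

# WordNet exposes word senses and simple word-to-word similarity measures.
car, automobile = wn.synsets("car")[0], wn.synsets("automobile")[0]
print(car.wup_similarity(automobile))  # 1.0, since they share a synset
```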