Abstract

Neural approaches to automated essay scoring have recently shown state-of-the-art performance. The automated essay scoring task typically involves a broad notion of writing quality that encompasses content, grammar, organization, and conventions. This differs from the short answer content scoring task, which focuses on content accuracy. The inputs to neural essay scoring models – ngrams and embeddings – are arguably well-suited to evaluate content in short answer scoring tasks. We investigate how several basic neural approaches similar to those used for automated essay scoring perform on short answer scoring. We show that neural architectures can outperform a strong non-neural baseline, but performance and optimal parameter settings vary across the more diverse types of prompts typical of short answer scoring.

Highlights

  • Deep neural network approaches have recently been successfully developed for several educational applications, including automated essay assessment

  • Ramachandran et al (2015) state that their mean quadratic weighted kappa (QWK) is 0.0053 higher than the Tandalla result, so in Table 4 we report that score truncated to 3 decimal places rather than the rounded result they report (a short QWK computation sketch follows this list)

  • Our results establish that the basic neural architecture of pretrained embeddings, tuned during model training, combined with LSTMs is a reasonably effective architecture for the short answer content scoring task
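
The highlights above report agreement in quadratic weighted kappa (QWK). As a point of reference, the sketch below shows one common way to compute QWK with scikit-learn; the score lists are hypothetical and unrelated to the results in the paper.

```python
# Minimal sketch of quadratic weighted kappa (QWK) with scikit-learn.
# The score lists are made-up examples, not data from the paper.
from sklearn.metrics import cohen_kappa_score

human_scores = [0, 1, 2, 2, 3, 1]    # hypothetical gold scores
system_scores = [0, 1, 2, 3, 3, 1]   # hypothetical predicted scores

qwk = cohen_kappa_score(human_scores, system_scores, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```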


Summary

Introduction

Deep neural network approaches have recently been successfully developed for several educational applications, including automated essay assessment. To explore the effectiveness of neural network architectures on short answer scoring (SAS), we use the basic architecture and parameters of Taghipour and Ng (2016) on three publicly available short answer datasets: ASAP-SAS (Shermis, 2015), Powergrading (Basu et al, 2013), and SRA (Dzikovska et al, 2016, 2013). While these datasets differ with respect to the length and complexity of student responses, all prompts in the datasets focus on content accuracy. We explore how well the optimal parameters for automated essay scoring (AES) from Taghipour and Ng (2016) fare on these datasets, and whether different architectures and parameters perform better on the SAS task.
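
For concreteness, the following is a minimal sketch, in Keras, of an embedding + LSTM regression scorer in the spirit of the Taghipour and Ng (2016) architecture described above. It is an illustration under assumed settings, not the authors' released code, and all hyperparameter values are placeholders rather than settings from the paper.

```python
# Sketch of an embedding + LSTM scorer (illustrative, not the paper's code).
import numpy as np
from tensorflow.keras import layers, models

vocab_size, embedding_dim, lstm_units, max_len = 4000, 50, 300, 60  # placeholders

model = models.Sequential([
    # Token embeddings; these could be initialized from pretrained vectors
    # and fine-tuned during training, as the highlights describe.
    layers.Embedding(vocab_size, embedding_dim),
    # LSTM over the token sequence, returning the full hidden-state sequence.
    layers.LSTM(lstm_units, return_sequences=True),
    # Mean-over-time pooling of the hidden states.
    layers.GlobalAveragePooling1D(),
    # Sigmoid regression head: scores are predicted in [0, 1] and can be
    # rescaled to the prompt's score range afterwards.
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="mse")

# Toy usage on random token ids and normalized scores.
X = np.random.randint(1, vocab_size, size=(32, max_len))
y = np.random.rand(32)
model.fit(X, y, epochs=1, batch_size=8, verbose=0)
```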

ASAP-SAS
Powergrading
Method
Baseline
Neural networks
Parameter exploration results
Test performance
Findings
Discussion