Abstract
In this paper, we study different automatic short answer grading (ASAG) systems to provide a comprehensive view of the feature spaces explored by previous works. While the performance reported in previous works has been encouraging, a systematic study of the features is lacking. Apart from providing a systematic exploration of the feature space, we also present ensemble methods that, in our experiments, exhibit significantly higher grading performance than existing work on almost all ASAG datasets. We perform a comparative study of different features and regression models for short answer grading, using the evaluation metrics standard in ASAG. In addition to traditional text similarity based features such as WordNet similarity and latent semantic analysis, we introduce novel features, including topic models suited to short text and relevance feedback based features. An ensemble model is built by combining different regression models through stacked regression. The proposed ASAG system is evaluated on the University of North Texas dataset for the regression task, and on the SciEntsBank and Beetle corpora from the student response analysis (SRA) task for the classification task. The ensemble-based ASAG system achieves substantially better grading performance than any individual regression model. Extensive experiments show that feature selection, the introduction of novel features, and regressor stacking are instrumental in achieving considerable improvement over existing ASAG methods.
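The stacked regression idea mentioned above can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation, and every specific here is an assumption made for illustration. The base regressors are hypothetical single-feature least-squares lines, standing in for models trained on, for example, a WordNet similarity score and an LSA similarity score. The split is an interleaved 5-fold partition. The meta-regressor is a ridge-regularized linear model fit on out-of-fold base predictions. The toy data is fabricated.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], float]
Fitter = Callable[[List[Row], List[float]], Predictor]


def single_feature_ols(j: int) -> Fitter:
    """Hypothetical base regressor: a least-squares line on feature j only."""
    def fit(X: List[Row], y: List[float]) -> Predictor:
        xs = [row[j] for row in X]
        mx, my = sum(xs) / len(xs), sum(y) / len(y)
        sxx = sum((a - mx) ** 2 for a in xs)
        slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sxx if sxx else 0.0
        intercept = my - slope * mx
        return lambda x: intercept + slope * x[j]
    return fit


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_linear_meta(Z: List[Row], y: List[float], lam: float = 1e-6) -> Predictor:
    """Meta-regressor (assumed): ridge least squares with an unpenalized intercept."""
    rows = [[1.0] + list(z) for z in Z]
    d = len(rows[0])
    A = [[sum(r[i] * r[k] for r in rows) + (lam if i == k and i > 0 else 0.0)
          for k in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    w = solve(A, b)
    return lambda z: w[0] + sum(wi * zi for wi, zi in zip(w[1:], z))


def stacked_regressor(X: List[Row], y: List[float],
                      base_fitters: List[Fitter], k: int = 5) -> Predictor:
    """Stacking: fit the meta-model on out-of-fold predictions of the base models."""
    n = len(X)
    oof = [[0.0] * len(base_fitters) for _ in range(n)]
    for f in range(k):
        held = [i for i in range(n) if i % k == f]  # interleaved folds
        train = [i for i in range(n) if i % k != f]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for m, fit in enumerate(base_fitters):
            model = fit(X_tr, y_tr)
            for i in held:
                oof[i][m] = model(X[i])  # prediction on data unseen by this model
    meta = fit_linear_meta(oof, y)
    # Refit each base model on all data; the meta-model combines their outputs.
    bases = [fit(X, y) for fit in base_fitters]
    return lambda x: meta([b(x) for b in bases])


def mse(pred: Predictor, X: List[Row], y: List[float]) -> float:
    return sum((pred(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


if __name__ == "__main__":
    # Fabricated toy data: two hypothetical similarity features per answer.
    X = [[float(i % 7), float(3 * i % 5)] for i in range(35)]
    y = [1.0 + 2.0 * a + 3.0 * b for a, b in X]
    fitters = [single_feature_ols(0), single_feature_ols(1)]
    stacked = stacked_regressor(X, y, fitters)
    print([mse(f(X, y), X, y) for f in fitters], mse(stacked, X, y))
```

On this toy data, each single-feature model sees only part of the signal. The stacked model learns how to weight the two, and its error falls well below that of either base model. Fitting the meta-model on out-of-fold predictions is the key design choice. If it were fit on in-sample predictions, it would tend to over-trust whichever base model overfits most.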