The growth of distance education has created a demand for virtual learning environments, and studies on the automated evaluation of essays in such environments have already produced promising results. Replicating the decisions of a human grader on short answers, however, remains a challenge: essay evaluation techniques ported to short answers have not reached the same level of accuracy. This paper therefore aims to advance research on the automated evaluation of short discursive answers. Related work follows three main approaches: text-to-text similarity; knowledge-based similarity, which relies on a synonym dictionary; and corpus-based similarity, which relies on a related corpus. This study applied an n-gram-based similarity and a categorization process to three sets of answers to questions in Portuguese: two (Biology and Geography) obtained from a higher-education admission process, and a third (Philosophy) from a virtual learning environment. The method comprised a five-stage pipeline: corpus selection, preprocessing, variable generation, classification, and accuracy validation. On these three corpora, several similarity measures and distances derived from unigram/bigram combinations were explored. In the classification stage, two methods were used: multiple linear regression and K-Nearest Neighbors (KNN). Along the way, some research questions were revised, leading to meaningful findings. Regarding system performance, on the Biology corpus the accuracy was 84.01 (system vs. human), compared with 93.85 (human vs. human); on the Geography corpus it was 86.29 (system vs. human), compared with 84.93 (human vs. human); and on the Philosophy corpus it was 81.59 (system vs. human).
Compared with results from recent experiments that used other techniques, these results indicate the advantages of a simpler method that still achieves good accuracy.
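To make the unigram/bigram similarity and KNN classification stages concrete, here is a minimal Python sketch. It rests on several assumptions that do not come from the paper: whitespace tokenization, cosine similarity over raw n-gram counts, majority voting among the k most similar graded answers, and exact agreement as the accuracy measure. The function names (`ngrams`, `features`, `cosine`, `knn_grade`, `exact_agreement`) are hypothetical. The sketch is not the authors' implementation, and it leaves out the multiple linear regression classifier.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text):
    """Assumed preprocessing: lowercase, split on whitespace, and count unigrams plus bigrams."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, 1) + ngrams(tokens, 2))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[g] for g, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_grade(answer, graded, k=3):
    """Predict a grade by majority vote among the k graded answers most similar to `answer`.

    `graded` is a list of (reference_answer_text, human_grade) pairs. On a tied vote,
    the grade first seen in similarity order (i.e. the closest neighbor's) wins.
    """
    target = features(answer)
    ranked = sorted(graded, key=lambda pair: cosine(target, features(pair[0])), reverse=True)
    votes = Counter(grade for _, grade in ranked[:k])
    return votes.most_common(1)[0][0]


def exact_agreement(predicted, reference):
    """Assumed accuracy measure: fraction of answers whose predicted grade equals the reference grade."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


if __name__ == "__main__":
    # Toy, made-up Portuguese answers with hypothetical grades (not data from the paper).
    graded = [
        ("a fotossintese produz oxigenio e glicose", 2),
        ("a fotossintese produz oxigenio", 2),
        ("as plantas respiram a noite", 0),
        ("a planta absorve agua pelas raizes", 1),
    ]
    print(knn_grade("a fotossintese produz glicose e oxigenio", graded, k=3))
```

Here the new answer shares most of its unigrams and two of its bigrams with the two grade-2 references, so those references rank as its nearest neighbors and the vote returns grade 2. Combining unigrams and bigrams rewards word overlap and also gives some credit for matching word order. That is one plausible reading of the "unigrams/bigrams combination" explored in the study.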