Difficulty Level Identification of Indonesian and Mathematics Multiple Choice Questions using Machine Learning Approach

Ade Romadhony, Shabrina Retno Ningsih

doi:10.47065/bits.v5i1.3649

Open Access

Abstract

Examination question design is an important factor in improving education, because well-designed questions help teachers analyze student understanding. Question design should take into account the difficulty level, which is commonly classified into three types: easy, medium, and difficult. Predicting the difficulty level of questions helps teachers compose questions and gauge student ability. In this study, we treat question difficulty level identification as a classification problem. We use a dataset of Indonesian and mathematics questions drawn from elementary and junior high school exercise question sets, and we apply several machine learning classification methods. Our experiments use Random Forest, Logistic Regression, SVM, Gaussian, and Dense NN, with embedding, lexical, and syntactic features. The evaluation results show that the best method for identifying question difficulty level on the Indonesian subject is Random Forest with 83% accuracy, and on the mathematics subject the best method is also Random Forest with 83% accuracy. Analysis of the results shows that the embedding features affect model accuracy.
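As a rough illustration of what a lexical feature for a question might look like, the sketch below computes a few surface statistics from the question text. The function name `lexical_features`, the chosen statistics, and the sample question are all hypothetical; the abstract does not specify which lexical features the authors used, so this is only a minimal sketch of the general idea.

```python
import re


def lexical_features(question: str) -> dict:
    """Compute a few simple surface features of a question (illustrative only)."""
    # Lowercase and split into word-like tokens; punctuation and operators are dropped.
    tokens = re.findall(r"\w+", question.lower())
    n_tokens = len(tokens)
    return {
        "n_tokens": n_tokens,                      # question length in tokens
        "n_unique": len(set(tokens)),              # vocabulary size of the question
        "avg_token_len": (
            sum(len(t) for t in tokens) / n_tokens if n_tokens else 0.0
        ),                                         # mean token length, 0.0 if empty
        "n_numeric": sum(t.isdigit() for t in tokens),  # count of purely numeric tokens
    }


# Hypothetical sample question ("What is the result of 12 + 30?").
features = lexical_features("Berapakah hasil dari 12 + 30?")
print(features)
```

A feature dictionary of this kind could be combined with embedding and syntactic features and passed to any of the classifiers named in the abstract, with the three difficulty levels as target labels.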
