Predicting math test scores using K-Nearest Neighbor

Jessica Maikhanh Brown

doi:10.1109/isecon.2017.7910221

Abstract

It is hard to predict students' Mathematics test scores. Being able to predict them would make it possible to identify students who are likely to struggle, so that they could be given more attention. This research uses the K-Nearest Neighbor (KNN) algorithm to predict the category of a student's Mathematics test scores. The KNN algorithm starts with a training data set and a value for the parameter K. To evaluate a test record, it computes the Euclidean distance between the test record and each of the training records, then examines the K training records closest to the test record. KNN's predicted category is the most common category among these K records. The data used in this research comes from a dataset of 395 records of Portuguese students. Each student record contained 30 data elements about the student, including gender, age, travel time to school, whether they were dating, and the number of days they were absent. Portuguese students are given 3 Mathematics tests, each scored from 0 to 20. To categorize students, this research averaged the 3 test scores and created 4 categories: students scoring 0 to 5, 5 to 10, 10 to 15, and 15 to 20 were categorized as low, slightly below average, slightly above average, and high, respectively. Half of the records in the dataset were used as training records and the other half as test records. Values of K from 1 through 20 were evaluated. KNN performed well, predicting the category correctly 48.78% of the time. The best value was K = 8 and the worst was K = 1. This research can help identify students who might need assistance. It can also be used to identify students for advanced classes and students who could serve as mentors. The results show that KNN can be a useful tool for predicting student test scores in Mathematics.
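The procedure described above (averaging three scores into four categories, classifying with KNN by Euclidean distance and majority vote, splitting the data in half, and sweeping K from 1 to 20) can be sketched in Python as follows. This is a minimal illustration, not the author's implementation. Several details the abstract does not specify are assumptions: how the overlapping category boundaries (5, 10, 15) are assigned, how voting ties are broken, how the 50/50 split is drawn (here a seeded random shuffle), and that all features are already numeric and unscaled (fields such as gender would need some numeric encoding first). The function names `categorize`, `knn_predict`, `accuracy`, and `evaluate_k_range` are hypothetical.

```python
import math
import random
from collections import Counter


def categorize(g1, g2, g3):
    """Map the average of three 0-20 test scores to one of four categories.

    Assumption: the ranges 0-5, 5-10, 10-15, 15-20 overlap at their
    endpoints; here a boundary value is placed in the higher category.
    """
    avg = (g1 + g2 + g3) / 3
    if avg < 5:
        return "low"
    if avg < 10:
        return "slightly below average"
    if avg < 15:
        return "slightly above average"
    return "high"


def knn_predict(train, query, k):
    """Predict the category of `query` by majority vote among its k nearest
    training records, using Euclidean distance.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # Assumed tie rule: most_common keeps first-encountered order for equal
    # counts, so a tie goes to the label of the closest tied neighbour.
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test records whose category KNN predicts correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def evaluate_k_range(records, k_values=range(1, 21), seed=0):
    """Use half the records for training and half for testing (assumed:
    a seeded random shuffle), then report accuracy for each K."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    train, test = shuffled[:half], shuffled[half:]
    return {k: accuracy(train, test, k) for k in k_values}
```

For example, `categorize(12, 14, 13)` returns `"slightly above average"`, because the average score is 13. Sorting every training record by distance is the simplest faithful rendering of "compare against each training record"; with only a few hundred records it is fast enough, though larger datasets would usually call for a spatial index.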
