Predicting students’ academic performance using a modified kNN algorithm

Moohanad Jawthari, Veronika Stoffová

doi:10.1556/606.2021.00374

Open Access

https://doi.org/10.1556/606.2021.00374

Abstract

In classification analysis, the target (dependent) variable is often influenced not only by ratio-scale variables but also by qualitative (nominal-scale) variables. The majority of machine learning techniques accept only numerical inputs, so categorical variables must be converted into numerical values using encoding techniques. If the values of a variable have no relation or order among them, assigning numbers misleads the machine learning techniques. This paper presents a modified k-nearest-neighbors algorithm that calculates distances between the values of categorical (nominal) variables without encoding them. A students' academic performance dataset is used to test the enhanced algorithm. The proposed algorithm outperforms the standard one, which requires nominal variables to be encoded before the distance between them can be calculated. The results show that the proposed algorithm performs 14% better than the standard one in accuracy and that it is not sensitive to outliers.

Highlights

Data understanding is an important step for accurate analysis
Educational Data Mining (EDM) is an evolving discipline that deals with the creation of methods for exploring the specific and increasingly large-scale data that come from educational environments, and with using these methods to better understand students and the environments in which they learn [3, 4]
The k-Nearest Neighbors (kNN) algorithm is one of the most popular classification algorithms due to its simplicity [5]. It stores all available cases and classifies a new case by a majority vote of its neighbors: the case is assigned to the class most common among its k nearest neighbors, where nearness is measured by a distance function
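The majority-vote rule described in the last highlight can be sketched in a few lines of Python. This is a minimal illustration of standard kNN, not the authors' implementation; the function names, the toy points, and the class labels "low" and "high" are assumptions made up for the example.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Classify `query` by a majority vote among its k nearest stored cases."""
    # Rank the stored cases by their distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    # Count the class labels of the k closest cases.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric features per student, one class label each.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
train_y = ["low", "low", "low", "high", "high"]

print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))
```

Because the distance function is passed in as a parameter, the same voting code can be reused with a distance that handles nominal attributes directly.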

Summary

INTRODUCTION

Data understanding is an important step for accurate analysis. Data pre-processing is the first step needed to aid algorithms and to improve efficiency before proceeding to the actual analysis. Data variables generally fall into one of four broad categories: nominal scale, ordinal scale, interval scale, and ratio scale [1]. Gender, for example, is a nominal variable that takes the values male and female. The ratio scale possesses the qualities of the nominal, ordinal, and interval scales and has an absolute zero value. In addition, it permits comparisons between different variable values. Assigning numerical values to nominal attributes misleads machine learning algorithms by introducing differences or an order between values that do not exist in the original attributes; this phenomenon is called subjectivity. This research proposes two similarity measures for the kNN algorithm that deal with categorical variables without converting them to numerical values.
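The effect of encoding a nominal attribute can be seen in a short Python sketch. The two similarity measures proposed in the paper are not spelled out in this summary, so the code below uses the generic overlap (match or mismatch) distance as a stand-in; the attribute values, the integer codes, and all function names are hypothetical.

```python
import math

# Hypothetical integer codes for a nominal attribute with three unordered values.
codes = {"Math": 0, "Arabic": 1, "IT": 2}


def encoded_distance(a, b):
    """Distance after integer encoding: depends on the arbitrary code order."""
    return abs(codes[a] - codes[b])


def overlap_distance(a, b):
    """Overlap distance: 0 if the nominal values match, 1 otherwise."""
    return 0 if a == b else 1


def mixed_distance(a, b, nominal_idx):
    """Euclidean-style distance over mixed records, without encoding.

    Positions listed in `nominal_idx` hold nominal values and contribute the
    overlap distance; every other position is treated as numeric.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in nominal_idx:
            total += overlap_distance(x, y)
        else:
            total += (x - y) ** 2
    return math.sqrt(total)


# Encoding makes "Math" look twice as far from "IT" as from "Arabic".
print(encoded_distance("Math", "Arabic"), encoded_distance("Math", "IT"))
# The overlap distance treats every pair of different values alike.
print(overlap_distance("Math", "Arabic"), overlap_distance("Math", "IT"))
# Mixed record: (scaled numeric feature, nominal feature).
print(mixed_distance((0.5, "Math"), (0.5, "IT"), nominal_idx={1}))
```

The numeric feature in the example is assumed to be scaled to [0, 1] so that a nominal mismatch, which contributes at most 1, does not dominate or vanish next to the numeric terms.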

RELATED WORKS

Distance functions

PROPOSED KNN ALGORITHM

DATA SET

Data mining

RESULT

Findings

CONCLUSION

Journal: Pollack Periodica
Publication Date: Sep 28, 2021
Citations: 5
License type: CC BY 4.0
