Abstract

Keratoconus is a noninflammatory disease characterized by thinning and bulging of the cornea; it generally appears during adolescence and progresses slowly, causing vision impairment. Detection of keratoconus remains difficult in the early stages of the disease because the patient feels no pain. The development of detection methods based on machine learning and deep learning is therefore necessary so that appropriate treatment can be provided to patients as early as possible. The objective of this work is thus to determine the parameters most relevant to the different classifiers used for keratoconus classification, based on the keratoconus dataset of Harvard Dataverse. A total of 446 parameters across 3162 observations are analyzed by 11 different feature selection algorithms. The results show that the sequential forward selection (SFS) method provided a subset of the 10 most relevant variables, yielding the highest classification performance with the random forest (RF) classifier: an accuracy of 98% and 95% for 2 and 4 keratoconus classes, respectively. The classification accuracy obtained by applying the RF classifier to the variables selected by SFS matches the accuracy obtained using all features of the original dataset.

Highlights

  • In many fields, the resolution of most problems is based on the processing of data extracted from data acquired in the real world and structured in the form of vectors [1]

  • The classification algorithms are compared on the basis of the classification accuracy of the different models: logistic regression (LR), linear discriminant analysis (LDA), K-nearest neighbors (KNN), CART, naive Bayes (NB), support vector machine (SVM), and random forest (RF) [figure: accuracy by algorithm]

  • The results of the preceding simulations show that the random forest algorithm achieves the highest performance compared to the other algorithms, both with and without feature selection


Introduction

In many fields (computer vision, pattern recognition, etc.), solving most problems rests on processing data acquired in the real world and structured in the form of vectors [1]. In many cases, solving the problem becomes nearly impossible because of the very large dimension of these vectors. It is often useful, and sometimes necessary, to select the features most relevant to the resolution method used, eliminating features harmful to the adopted system, even if this selection of variables may lead to a slight loss of information. Learning then proceeds quickly, and the complexity of the model is reduced, making it easier to understand and improving metric performance in terms of precision, accuracy, and recall [3].
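As a minimal sketch of the approach described above, the snippet below wraps a random forest classifier in a sequential forward selection loop using scikit-learn's `SequentialFeatureSelector`. The dataset here is synthetic (generated with `make_classification`), not the Harvard Dataverse keratoconus data, and the sizes (200 observations, 20 candidate features, 5 selected) are illustrative assumptions, not the paper's 3162 × 446 setting.

```python
# Sequential forward selection (SFS) feeding a random forest (RF)
# classifier; synthetic stand-in data, not the keratoconus dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 200 observations, 20 candidate parameters.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=0)

# Greedily add one feature at a time until 5 are selected,
# scoring each candidate subset by cross-validated accuracy.
sfs = SequentialFeatureSelector(rf, n_features_to_select=5,
                                direction="forward", cv=3)
sfs.fit(X, y)

X_selected = sfs.transform(X)  # reduced design matrix
acc = cross_val_score(rf, X_selected, y, cv=3).mean()
print(X_selected.shape, round(acc, 2))
```

Swapping `direction="forward"` for `"backward"` would give sequential backward elimination instead; the forward variant matches the SFS method this work found to perform best.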

