Skin disease prediction using ensemble methods and a new hybrid feature selection technique

Anurag Kumar Verma, B B Tiwari, Saurabh Pal

doi:10.1007/s42044-020-00058-y

Abstract

Skin disease is now a very common problem worldwide. We present this study for the prediction of skin disease. In the data taken from the UCI data set, there are 34 attributes that play a role in skin disease diagnosis, but not all of them are important. In this paper we analyze only those important attributes that give the best accuracy in predicting skin disease. To select important attributes, we apply a new hybrid approach that uses three feature extraction techniques, Chi Square, Information Gain and Principal Component Analysis (PCA), and then combines them to select the best possible data subset of the skin disease data set. Six base learners are used and their prediction performance is evaluated: Gaussian Naive Bayes (NB), K-Nearest Neighbour (KNN), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF) and Multilayer Perceptron (MLP). The proposed hybrid feature selection technique is used to evaluate the performance of the base learners, and we find that performance on the reduced data subset is higher than on the whole data set. Evaluation metrics are calculated to illustrate the prediction performance of the model. The hybrid feature selection technique, together with the base learners, is then combined with Bagging, Boosting and Stacking ensemble techniques to enhance the results, and these results are compared with those of the individual base learners. The results obtained in this paper are better than those reported in previous studies.
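As an illustration of the pipeline summarized above, the following is a minimal sketch in Python with scikit-learn, not the authors' implementation. Several points are assumptions made only for this example: the three rankings are combined by majority vote (a feature is kept if at least two of the three methods place it in their top `k`), PCA is turned into a per-feature score through variance-weighted absolute loadings, Information Gain is approximated by `mutual_info_classif`, and synthetic data with 34 attributes stands in for the UCI data. The names `hybrid_select`, `k` and `min_votes` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.feature_selection import chi2, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier


def hybrid_select(X, y, k=15, min_votes=2, random_state=0):
    """Return indices of features ranked in the top k by at least
    `min_votes` of three scorers (assumed combination rule)."""
    # chi2 requires non-negative inputs, so rescale each feature to [0, 1].
    X_pos = MinMaxScaler().fit_transform(X)

    chi_scores, _ = chi2(X_pos, y)
    chi_scores = np.nan_to_num(chi_scores)  # constant features yield NaN

    # Mutual information is used here as the Information Gain score.
    ig_scores = mutual_info_classif(X, y, random_state=random_state)

    # PCA has no native per-feature ranking; as an assumption, score each
    # feature by its absolute loadings weighted by explained variance.
    pca = PCA(n_components=min(5, X.shape[1]), random_state=random_state)
    pca.fit(X_pos)
    pca_scores = np.abs(pca.components_).T @ pca.explained_variance_ratio_

    votes = np.zeros(X.shape[1], dtype=int)
    for scores in (chi_scores, ig_scores, pca_scores):
        top = np.argsort(scores)[::-1][:k]  # indices of the k best features
        votes[top] += 1
    return np.flatnonzero(votes >= min_votes)


# Synthetic stand-in data: 366 samples, 34 attributes, 6 classes.
X, y = make_classification(
    n_samples=366, n_features=34, n_informative=10, n_redundant=4,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)

selected = hybrid_select(X, y)
X_sel = X[:, selected]

# A few of the base learners, plus one model per ensemble family.
models = {
    "NB": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(random_state=0),
    "Bagging(DT)": BaggingClassifier(
        DecisionTreeClassifier(random_state=0), n_estimators=25, random_state=0
    ),
    "Boosting(AdaBoost)": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Stacking": StackingClassifier(
        estimators=[
            ("nb", GaussianNB()),
            ("knn", KNeighborsClassifier()),
            ("dt", DecisionTreeClassifier(random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

# Mean 5-fold cross-validated accuracy on the reduced feature subset.
scores = {
    name: cross_val_score(model, X_sel, y, cv=5).mean()
    for name, model in models.items()
}

print(f"kept {len(selected)} of {X.shape[1]} features")
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Running the same loop on `X` instead of `X_sel` gives the whole-data-set baseline for comparison. The accuracies on synthetic data say nothing about the results reported in the paper.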

Full Text