Dimensionality Reduction Based Component Discriminant Factor Implication for Mushroom Edibility Classification Using Machine Learning

M Shyamala Devi, Vasujit Bhattacharjee, Akshita Rai, Sudheer Kumar Gupta, Santhosh Veeraraghavan Ramesh, A Peter Soosai Anandaraj

doi:10.1007/978-981-16-5529-6_1

Abstract

Despite advances in technology, people continue to consume deadly wild mushrooms because they cannot distinguish between mushroom categories. Currently, no specific traits are defined that accurately predict the edibility of a mushroom from its attributes. To address this challenge, Machine Learning (ML) can be used to identify poisonous mushrooms based on the appearance of their features. Accordingly, the Mushroom dataset from the UCI Machine Learning Repository is used to predict mushroom edibility. The classification into edibility classes proceeds in the following steps. First, the dataset is preprocessed by handling missing values and applying feature scaling. Second, the raw dataset is fitted to all the classifiers with and without feature scaling. Third, principal component analysis (PCA) with 8, 10 and 12 components is applied to the raw data, and the PCA-reduced dataset is fitted to all the classifiers with and without feature scaling. Fourth, linear discriminant analysis (LDA) is applied to the raw data, and the LDA-reduced dataset is fitted to all the classifiers with and without feature scaling. Fifth, factor analysis (FA) with 8, 10 and 12 components is applied to the raw data, and the FA-reduced dataset is fitted to all the classifiers with and without feature scaling. Sixth, the results on the raw, PCA-reduced, LDA-reduced and FA-reduced datasets are compared using the performance metrics Precision, Recall, Accuracy and F-score. The implementation is written in Python using the Spyder environment with Anaconda Navigator. Experimental results show that the kernel SVM, KNN and AdaBoost classifiers on the FA-reduced dataset retain 100% accuracy before and after feature scaling. The KNN classifier on the LDA-reduced dataset retains 96% accuracy before and after feature scaling.
The kernel SVM, KNN and AdaBoost classifiers on the FA-reduced dataset retain 99% accuracy before and after feature scaling. From the above analysis, the KNN classifier is more efficient in terms of accuracy across the PCA-, LDA- and FA-reduced datasets.

Keywords

Machine learning, Classification, Precision, Accuracy, PCA, LDA and factor analysis
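
The comparison described in the abstract (raw, PCA-, LDA- and FA-reduced data, each with and without feature scaling) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes scikit-learn is available, uses only a KNN classifier, and substitutes a synthetic dataset from `make_classification` for the encoded UCI Mushroom data. The function names `evaluate` and `run_experiment`, the 75/25 split, `n_neighbors=5` and the fixed random seeds are all illustrative assumptions.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate(reducer, scale, X_train, X_test, y_train, y_test):
    """Fit (optional scaler) -> (optional reducer) -> KNN and score it."""
    steps = []
    if scale:
        steps.append(StandardScaler())
    if reducer is not None:
        # clone() gives a fresh, unfitted copy for every run
        steps.append(clone(reducer))
    steps.append(KNeighborsClassifier(n_neighbors=5))  # assumed setting
    model = make_pipeline(*steps)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    precision, recall, f_score, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary"
    )
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


def run_experiment():
    # Synthetic stand-in for the encoded Mushroom data (assumption):
    # a binary target playing the role of edible vs. poisonous.
    X, y = make_classification(
        n_samples=600, n_features=20, n_informative=12, random_state=0
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )

    # Reducers named in the abstract; None means the raw feature set.
    reducers = {"raw": None, "LDA": LinearDiscriminantAnalysis()}
    for k in (8, 10, 12):
        reducers[f"PCA-{k}"] = PCA(n_components=k)
        reducers[f"FA-{k}"] = FactorAnalysis(n_components=k, random_state=0)

    results = {}
    for name, reducer in reducers.items():
        for scale in (False, True):
            results[(name, scale)] = evaluate(
                reducer, scale, X_train, X_test, y_train, y_test
            )
    return results


if __name__ == "__main__":
    for (name, scale), scores in run_experiment().items():
        label = "scaled" if scale else "unscaled"
        print(f"{name:7s} {label:9s} "
              + "  ".join(f"{m}={v:.3f}" for m, v in scores.items()))
```

With a two-class target, LDA can produce at most one discriminant component, so no component count is set for it here. The scores printed for the synthetic data say nothing about the accuracies reported in the abstract; they only show how the sixteen configurations can be laid side by side.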

Full Text