In botany and agriculture, leaf classification is a crucial task that supports biodiversity research, ecological studies, and plant species identification. The Cope Leaf Dataset offers a comprehensive collection of leaf images from various plant species, enabling the development and evaluation of advanced classification algorithms. This study presents a robust methodology for classifying leaf images in the Cope Leaf Dataset by enhancing the feature extraction and selection process. The dataset contains 1,584 records across 99 classes, with 64 features extracted from the margin, texture, and shape of the leaves. Classifying such a large number of labels is challenging because of class imbalance, feature complexity, overfitting, and label noise. Our approach combines advanced feature selection techniques with robust preprocessing methods, including normalization, imputation, and noise reduction. By systematically integrating these techniques, we aim to reduce dimensionality, eliminate irrelevant or redundant features, and improve data quality. Improving classification accuracy, especially on large datasets with many classes, requires a combination of data preprocessing, model selection, regularization, and fine-tuning. The results indicate that the Multilayer Perceptron achieves 89.48% accuracy, the Naïve Bayes classifier 89.63%, the Convolutional Neural Network 88.72%, and the Hoeffding Tree 89.92% on the 99-class plant leaf classification problem.
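The preprocessing steps named above (imputation, normalization, and removal of uninformative features) can be illustrated with a minimal sketch. The abstract does not state which specific methods were used, so mean imputation, min-max scaling, and a variance threshold are assumptions chosen here for illustration only. The function names, the threshold value, and the toy data are hypothetical and are not taken from the study or the dataset.

```python
from statistics import mean, pvariance


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    filled = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        col_mean = mean(observed) if observed else 0.0
        filled.append([col_mean if v is None else v for v in col])
    return [list(r) for r in zip(*filled)]


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns map to 0.0."""
    scaled = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled)]


def select_by_variance(rows, threshold=0.01):
    """Keep columns whose population variance exceeds the (assumed) threshold."""
    cols = list(zip(*rows))
    keep = [i for i, col in enumerate(cols) if pvariance(col) > threshold]
    return [[row[i] for i in keep] for row in rows], keep


# Toy data: 4 samples, 3 features (hypothetical values, not from the dataset).
raw = [
    [0.10, 5.0, 1.0],
    [0.30, None, 1.0],   # missing value to be imputed
    [0.20, 7.0, 1.0],
    [0.40, 9.0, 1.0],    # third feature is constant, so it carries no information
]

clean = min_max_normalize(impute_mean(raw))
selected, kept = select_by_variance(clean)
```

In this toy run the constant third feature is dropped and the first two are kept. On the real data, the same three stages would run before any of the classifiers are trained, with the threshold tuned rather than fixed.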