Data pre-processing is an important strategy in chemometrics and related fields because transforming the data often has a large effect on the performance of the method (model). However, a careful examination of the literature shows that very few systematic studies address the effect of derivative spectra on the performance of pattern recognition methods. This comprehensive study compares the impact of the order of the derivative spectra and of other data pre-processing procedures (normalization and standardization) on the performance of cluster analysis, principal component analysis and discriminant analysis applied to the characterization and classification of medicinal plants according to their phylum using UV spectra. The efficiency of the pre-processing methods was estimated by comparing the accuracy of classification and prediction measured by internal cross-validation. First-order derivatization gave the best classification (100%) of medicinal plants according to their phylum (Pteridophyte, Magnoliophyte and Spermatophyte), compared with normalized spectra (71.4%), standardized spectra (76.2%) and the original spectra (78.6%).
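The pre-processing variants compared above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how normalization, standardization or derivatization were computed, so the conventions here are assumptions (normalization to unit maximum absorbance per spectrum, standardization as per-wavelength autoscaling, derivatives by finite differences). The function names and the synthetic spectra are hypothetical and used only for illustration.

```python
import numpy as np


def normalize(spectra):
    """Scale each spectrum (row) to unit maximum absolute absorbance.

    Assumed convention; the abstract does not define its normalization.
    """
    return spectra / np.abs(spectra).max(axis=1, keepdims=True)


def standardize(spectra):
    """Autoscale each wavelength (column) to zero mean and unit variance.

    Assumed convention; constant columns are left unscaled to avoid
    division by zero.
    """
    mean = spectra.mean(axis=0)
    std = spectra.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    return (spectra - mean) / std


def derivative(spectra, wavelengths, order=1):
    """Finite-difference derivative of each spectrum along the wavelength axis.

    Applies np.gradient `order` times; a smoothing filter is often used in
    practice but is omitted here to keep the sketch dependency-free.
    """
    out = np.asarray(spectra, dtype=float)
    for _ in range(order):
        out = np.gradient(out, wavelengths, axis=1)
    return out


# Synthetic data for illustration only (NOT the study's UV spectra):
# one Gaussian band with sample-specific intensity and baseline offset.
wavelengths = np.linspace(200.0, 400.0, 201)
band = np.exp(-((wavelengths - 280.0) / 15.0) ** 2)
scales = np.array([[1.0], [1.5], [0.8]])
offsets = np.array([[0.0], [0.1], [0.3]])
spectra = scales * band + offsets  # shape: (3 samples, 201 wavelengths)

# One data matrix per pre-processing variant, ready to be passed to a
# pattern recognition method and scored by cross-validation.
variants = {
    "original": spectra,
    "normalized": normalize(spectra),
    "standardized": standardize(spectra),
    "first_derivative": derivative(spectra, wavelengths, order=1),
    "second_derivative": derivative(spectra, wavelengths, order=2),
}
```

Each entry in `variants` has the same shape as the input, so the same classifier and cross-validation scheme can be applied to every variant and the accuracies compared. One property visible in this sketch is that differentiation removes a constant baseline offset, which may contribute to differences between derivative and raw spectra.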