Abstract This work reports a critical evaluation of the performance of various data analysis techniques applied to the classification of biosamples using a voltammetric electronic tongue. Data obtained from a sensor array of carbon-paste electrodes were processed by three pattern recognition methods: Partial Least Squares Discriminant Analysis (PLS-DA), the “gold standard” in electronic tongue data analysis; linear Soft Independent Modelling of Class Analogy (SIMCA); and non-linear Support Vector Machine Discriminant Analysis (SVM-DA), the latter two being less commonly used. Because of the high dimensionality of the data, various preprocessing methods (autoscaling; Standard Normal Variate, SNV; Discrete Wavelet Transform, DWT) were tested with each of these techniques to find the most suitable workflow for obtaining satisfactory performance. In total, 1026 models were developed and tested. To compare their ability to estimate the class affinity of the studied biosamples, four performance parameters (accuracy, sensitivity, precision, and specificity) were calculated for every model on both the training and test sets to obtain reliable and repeatable results. It must be underlined that some general remarks and findings drawn from the classification of oligopeptides were subsequently validated using a different dataset (comprising oligopeptides and amino acids). This validation confirmed that the presented recommendations can be generalized and adapted to other classification tasks performed with voltammetric electronic tongues.
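The four performance parameters named above can be illustrated with a minimal sketch. The abstract does not state the exact formulas used, so this assumes the standard one-vs-rest definitions based on true/false positives and negatives; the function name `one_vs_rest_metrics` and the class labels in the example are hypothetical and not taken from the study.

```python
from typing import Dict, Hashable, Sequence


def one_vs_rest_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> Dict[str, float]:
    """Accuracy, sensitivity, precision and specificity for one class.

    Assumption: the class `positive` is scored against all other classes
    (one-vs-rest), which is a common convention but is not confirmed by
    the abstract.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "precision": ratio(tp, tp + fp),     # positive predictive value
        "specificity": ratio(tn, tn + fp),   # true negative rate
    }


# Hypothetical labels for illustration only (not data from the study).
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
metrics_a = one_vs_rest_metrics(y_true, y_pred, positive="A")
```

Calling the function once per class, separately for the training and test sets, would give the kind of per-model comparison described above.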