Bacterial presence in water is an important indicator of water quality and, at high concentrations, may pose a risk to human health. Standard methods for detecting total coliforms, thermotolerant coliforms, and Escherichia coli (E. coli) in water rely on time-consuming and expensive laboratory tests, which may not always provide timely and accurate results. An alternative approach is excitation-emission matrix fluorescence spectroscopy (EEMFS), which offers fast detection of bacteria in water by analyzing fluorescent compounds. Chemometric methods can be used to process EEMF spectra, extract the relevant information, and differentiate water samples based on the presence of bacteria using classification models. In this study, various classification algorithms were applied to EEMFS datasets, including k-nearest neighbors (k-NN), partial least squares discriminant analysis (PLS-DA), multiway PLS discriminant analysis (NPLS-DA), principal component analysis with discriminant analysis (PCA-DA), support vector machines (SVM), and random forest (RF). Models were developed after unfolding the multiway data and after parallel factor analysis (PARAFAC) to classify groundwater, freshwater, saltwater, and treated water samples according to the presence of E. coli, thermotolerant coliforms, and total coliforms. Among these models, PLS-DA, SVM, and RF demonstrated superior performance in discriminating the samples in most cases. In the test sets, the accuracy of the best models for total coliforms ranged from 85.2% to 100% for groundwater, 71.4% to 98.2% for freshwater, 64.6% to 81.3% for treated water, and 65.8% to 71.1% for saltwater. Accuracy for E. coli and thermotolerant coliforms ranged from 89.3% to 100% for groundwater and from 64.7% to 87.5% for treated water. For thermotolerant coliforms, accuracy ranged from 64.8% to 95.2% for freshwater and from 63.3% to 76.7% for saltwater. For E. coli, accuracy ranged from 73.8% to 78.6% for freshwater and from 65.0% to 80.0% for saltwater.
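
To make the unfold-then-classify idea concrete, the following is a minimal sketch and not the study's pipeline. It assumes a samples x excitation x emission data cube, flattens each sample's matrix into a single row, and applies a plain k-NN majority vote, one of the algorithms listed above. The data are synthetic, and the function names (`unfold`, `knn_predict`, `synthetic_sample`), the grid size, and the position of the extra peak are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def unfold(eem_cube):
    """Flatten each sample's excitation x emission matrix into one row."""
    return [[value for ex_row in sample for value in ex_row] for sample in eem_cube]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def synthetic_sample(rng, positive, n_ex=4, n_em=5):
    """Toy EEM (assumption): 'positive' samples get one extra peak."""
    eem = [[rng.gauss(1.0, 0.05) for _ in range(n_em)] for _ in range(n_ex)]
    if positive:
        eem[1][2] += 1.0  # hypothetical fluorophore peak, not from the study
    return eem


rng = random.Random(0)                # fixed seed so the toy run is repeatable
labels = [i % 2 for i in range(40)]   # 1 = bacteria present, 0 = absent (synthetic)
cube = [synthetic_sample(rng, bool(y)) for y in labels]

X = unfold(cube)                      # 40 samples x 20 unfolded variables
train_X, train_y = X[:30], labels[:30]
test_X, test_y = X[30:], labels[30:]

preds = [knn_predict(train_X, train_y, x) for x in test_X]
accuracy = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
print(f"test accuracy: {accuracy:.2f}")
```

Real EEM spectra have far more excitation and emission channels, overlapping fluorophores, and scatter bands, so the clean separation in this toy example should not be read as an indication of achievable accuracy.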