Ammonia nitrogen (NH3-N) is a key water quality variable that is difficult to measure in the water treatment process. Data-driven soft computing is an effective approach to this problem. Because NH3-N detection is expensive, many NH3-N values are missing from the collected water quality dataset; that is, much of the data is unlabeled. To improve the prediction accuracy of NH3-N, this study proposes a semi-supervised soft computing method using a self-constructing fuzzy neural network with an active learning mechanism (SS-SCFNN-ALM). In the SS-SCFNN-ALM, the kernel k-means clustering algorithm is first used to cluster the labeled and unlabeled data separately, which reduces the computational complexity of active learning. Then, the clusters with larger information values are selected from the unlabeled data using a distance metric criterion. Next, to improve the quality of the selected samples, a Gaussian regression model is adopted to eliminate redundant, highly similar samples from the selected clusters. Finally, the selected unlabeled samples are manually labeled; that is, their NH3-N values are added to the dataset. To realize semi-supervised soft computing of the NH3-N concentration, the original labeled dataset and the manually labeled samples are combined and fed to the developed SCFNN. The experimental results show that the proposed SS-SCFNN-ALM achieves a test root mean square error (RMSE) of 0.0638 and a test accuracy of 86.31%, outperforming the SCFNN (without the active learning mechanism), MM, DFNN, SOFNN-HPS, and other comparison algorithms.
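The sample-selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance metric criterion or how the Gaussian regression model scores similarity, so the sketch assumes (1) an unlabeled cluster is more informative the farther its center lies from the nearest labeled cluster center, and (2) redundancy can be approximated by a Gaussian (RBF) similarity threshold. The function names, `top_k`, `similarity_threshold`, and `length_scale` are all hypothetical, and cluster centers are taken as given rather than computed by kernel k-means.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_informative_clusters(unlabeled_centers, labeled_centers, top_k):
    """Return indices of the top_k unlabeled clusters, most informative first.

    Assumption: informativeness is the distance from an unlabeled cluster
    center to its nearest labeled cluster center. The abstract only states
    that "a distance metric criterion" is used.
    """
    scores = [min(euclidean(u, c) for c in labeled_centers)
              for u in unlabeled_centers]
    order = sorted(range(len(unlabeled_centers)),
                   key=lambda i: scores[i], reverse=True)
    return order[:top_k]

def drop_redundant(samples, similarity_threshold=0.9, length_scale=1.0):
    """Greedily keep samples that are not too similar to any kept sample.

    Assumption: an RBF similarity stands in for the paper's Gaussian
    regression model, whose exact role is not given in the abstract.
    """
    kept = []
    for s in samples:
        similarities = (
            math.exp(-euclidean(s, k) ** 2 / (2 * length_scale ** 2))
            for k in kept
        )
        if all(sim < similarity_threshold for sim in similarities):
            kept.append(s)
    return kept

def rmse(y_true, y_pred):
    """Root mean square error, the test metric reported in the abstract."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```

In this sketch, the samples that survive `drop_redundant` would be the ones sent for manual NH3-N labeling and then merged with the labeled dataset before training the predictor.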