An improved tri-training semi-supervised classification method is proposed to address three limitations of existing data-driven fault diagnosis studies for nuclear power systems: supervised classification models rely on labeled data and are therefore constrained when such data are restricted; semi-supervised models do not account for the imbalanced proportions of fault data; and some faults have similar features or varying severity. Building on the structure of three identical sub-classifiers in the original tri-training algorithm, the method adds data update paths, tightens the selection of updated data, and differentiates the initial information and permissions of the sub-models. The novel method requires only that the dataset satisfy the smoothness assumption. Through multiple strict data update paths, it improves the utilization of unlabeled training data while reducing errors in the pseudo-labeling process. The differences in permissions and initial information between sub-classifiers are used to make training more rigorous for fault types with smaller sample sizes, which alleviates the data imbalance problem. Using a marine nuclear power system secondary circuit model and a widely recognized nuclear power plant dataset, multiple restricted fault diagnosis datasets representing conditions commonly faced in ships and power plants were obtained to verify the method's advantages and generalization. Under the same conditions, compared with the original tri-training method and other semi-supervised learning methods, the novel method improves accuracy, AUC, precision, and recall by about 21 %, 10 %, 20 %, and 21 %, respectively, and reduces misjudgments between faults with similar features by about 25 %.
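For orientation, the pseudo-labeling step of the original tri-training scheme that the method builds on can be sketched as follows: an unlabeled sample is offered to one sub-classifier as a pseudo-labeled example when the other two sub-classifiers agree on its label. This is a minimal illustration of that baseline agreement rule only, not the improved method described above; the function names and the fault labels ("leak", "rupture", "normal") are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

PseudoLabels = List[Tuple[int, Hashable]]


def agreed_pseudo_labels(
    preds_a: Sequence[Hashable], preds_b: Sequence[Hashable]
) -> PseudoLabels:
    """Return (sample_index, label) pairs on which two peer classifiers agree."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    return [(i, a) for i, (a, b) in enumerate(zip(preds_a, preds_b)) if a == b]


def tri_training_round(preds: Sequence[Sequence[Hashable]]) -> List[PseudoLabels]:
    """For each of the three classifiers, collect pseudo-labels from its two peers."""
    if len(preds) != 3:
        raise ValueError("tri-training uses exactly three classifiers")
    updates = []
    for target in range(3):
        # The two classifiers other than `target` act as its teachers.
        peer_1, peer_2 = [p for idx, p in enumerate(preds) if idx != target]
        updates.append(agreed_pseudo_labels(peer_1, peer_2))
    return updates


# Hypothetical predictions of three sub-classifiers on four unlabeled samples.
predictions = [
    ["leak", "normal", "leak", "rupture"],
    ["leak", "normal", "rupture", "rupture"],
    ["normal", "normal", "leak", "rupture"],
]
updates = tri_training_round(predictions)
```

Here `updates[i]` lists the samples that would be added to the training set of sub-classifier `i` in this round. A full implementation would also retrain each sub-classifier and check an error-based acceptance condition before using the new samples, which this sketch omits.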