Uncertainty-based active learning strategies have shown clear advantages in small-data research in the materials domain. This study examines how model uncertainty and data uncertainty separately affect the performance of active learning strategies, measured by the number of iterations required to identify the optimal samples. For model uncertainty, three acquisition functions are compared: the predicted value strategy (PV), the ranking of predicted value strategy (PR) and the expected improvement strategy (EI). Among these, the active learning model using PR requires the fewest iterations on average (1.75). For data uncertainty, we evaluate the active learning iterations of two approaches: Gaussian process models that incorporate the uncertainty of the observations, and noise samples that account for the uncertainty of the input features. The results indicate that when the uncertainty of the observations is included in the model, the active learning iterations of the three strategies converge to similar values at the optimal weighting (EI: 1.75, PV: 1.21, PR: 1.18). In contrast, appending noise samples to the original samples to form an augmented dataset severely degrades the efficiency of active learning recommendations. Our findings are intended to guide the exploration of more effective acquisition functions and methods for active learning strategies.
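To illustrate how two of the compared acquisition functions can rank the same candidates differently, the following is a minimal sketch, not the study's implementation. It assumes a maximization task and assumes that a surrogate model (for example, a Gaussian process) has already produced a predicted mean `mu` and standard deviation `sigma` for each candidate. PV is taken here to mean "pick the highest predicted mean", and EI uses the standard closed-form expected improvement with an optional exploration margin `xi`. The function names are hypothetical, and PR is not reproduced because its exact definition is not given here.

```python
import math
from typing import Sequence


def _norm_pdf(z: float) -> float:
    # Standard normal probability density function.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z: float) -> float:
    # Standard normal cumulative distribution function via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """Closed-form expected improvement over `best` (maximization assumed)."""
    improvement = mu - best - xi
    if sigma <= 0.0:
        # With no predictive uncertainty, EI reduces to the deterministic gain.
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * _norm_cdf(z) + sigma * _norm_pdf(z)


def select_pv(mu: Sequence[float]) -> int:
    """Hypothetical PV rule: index of the largest predicted mean (pure exploitation)."""
    return max(range(len(mu)), key=lambda i: mu[i])


def select_ei(mu: Sequence[float], sigma: Sequence[float], best: float, xi: float = 0.0) -> int:
    """Index of the candidate with the largest expected improvement."""
    scores = [expected_improvement(m, s, best, xi) for m, s in zip(mu, sigma)]
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Two hypothetical candidates: one confidently near the current best,
    # one with a slightly lower mean but much larger uncertainty.
    mu = [1.0, 0.9]
    sigma = [0.01, 1.0]
    best = 1.0
    print("PV picks", select_pv(mu))               # index 0
    print("EI picks", select_ei(mu, sigma, best))  # index 1
```

In this toy case PV recommends the confident candidate, while EI recommends the uncertain one because its larger standard deviation raises the expected gain. This is the kind of difference in exploration behaviour that can change how many iterations an active learning loop needs.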