Fault diagnosis is an essential means of ensuring the normal operation of mechanical systems. Existing data-driven algorithms assume that the given labels are entirely correct. However, mislabeling is common in industrial applications. Such methods overfit the mislabeled samples, which degrades their generalization. To address this problem, this article proposes a novel multistage true label distribution learning algorithm. Specifically, based on the training characteristics of data-driven algorithms on noisy datasets, a novel multistage adversarial loss function (MSA-Loss) is proposed. MSA-Loss enables the model to construct the true label distribution from noisy datasets, prevents it from overfitting the noisy samples, and thereby preserves good generalization. The proposed method can be easily applied to any existing data-driven algorithm to improve its performance on noisy datasets. The method is validated on high-speed aeronautical bearing and motor datasets, and the results show that MSA-Loss performs excellently in noisy-label scenarios. It can significantly improve the potential of existing diagnostic models in practical industrial applications.