Theoretical developments in data-driven fault diagnosis have been fruitful and have significantly benefited industrial practice. However, most methods assume balanced data, an assumption that rarely holds in engineering scenarios. Two factors cause the imbalance: the normal state accounts for the majority of the equipment’s lifespan, and different faults occur with different probabilities. The consequences of data imbalance for intelligent fault diagnosis methods have attracted extensive attention from the research community, and a significant number of papers have been published. Nevertheless, a comprehensive review of achievements in this field is still missing, and the research perspectives have not been thoroughly investigated. To this end, this paper reviews and discusses, to the best of our knowledge, all the research achievements in fault diagnosis under data imbalance. First, the existing imbalanced learning methods are classified into three categories: data processing methods, model construction methods, and training optimization methods. Then, the three categories are introduced and discussed in detail: data processing methods optimize the inputs of the intelligent fault diagnosis model so that the imbalance rate of the sample set used in training is reduced; model construction methods design the structure and the features of the intelligent fault diagnosis model so that the model itself is resistant to the effects of imbalance; training optimization methods optimize the training process of the intelligent fault diagnosis model, raising the importance of the minority classes during training. Finally, this paper summarizes the prospects of the imbalanced learning problem in intelligent fault diagnosis, discusses possible solutions, and provides some recommendations.
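As a rough illustration of two of the categories named above, the following minimal sketch shows one possible data processing step (random oversampling, which lowers the imbalance rate of the training set) and one possible training optimization step (inverse-frequency class weights, which raise the importance of minority classes in a loss function). It is not taken from any of the reviewed papers. The function names, the toy label counts, and the definition of the imbalance ratio as largest class count divided by smallest class count are all assumptions made for this example.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: largest class count divided by smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Data processing sketch: duplicate randomly chosen minority-class
    samples until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def inverse_frequency_weights(labels):
    """Training optimization sketch: per-class loss weights computed as
    n_total / (n_classes * n_class), so rarer classes get larger weights."""
    counts = Counter(labels)
    total, k = len(labels), len(counts)
    return {cls: total / (k * n) for cls, n in counts.items()}


# Hypothetical toy data: 90 normal records, 8 of fault A, 2 of fault B.
labels = ["normal"] * 90 + ["fault_a"] * 8 + ["fault_b"] * 2
samples = list(range(len(labels)))  # stand-ins for signal segments

balanced_samples, balanced_labels = random_oversample(samples, labels)
weights = inverse_frequency_weights(labels)

print(imbalance_ratio(labels), imbalance_ratio(balanced_labels))
print(weights)
```

Oversampling changes the data while leaving the model and loss untouched, whereas class weighting leaves the data untouched and changes how much each training error costs. Model construction methods, the third category, alter the architecture itself and are not covered by this sketch.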