Abstract

Most flight accident data sets have an uneven class distribution. When a traditional classifier is applied to such data, it pays less attention to the minority class. The Synthetic Minority Over-sampling Technique (SMOTE) and its improved variants are well-known methods for addressing this imbalance problem at the data level. However, these algorithms still tend to blur the boundary between the positive and negative classes and to change the distribution of the original data. To overcome these problems and predict flight accidents accurately, a new Clustered Biased Borderline SMOTE (CBB-SMOTE) is proposed for Quick Access Recorder (QAR) Go-Around data. It produces clearer boundaries between the positive and negative classes by applying K-means separately to the boundary minority class data and the safe minority class data, and it preserves the original data distribution as far as possible through a biased oversampling method. Experiments were carried out on a set of QAR Go-Around data. The data set was balanced with CBB-SMOTE, SMOTE, and Cluster-SMOTE respectively, and a random forest classifier was then used for prediction on each balanced data set. The experimental results show that CBB-SMOTE outperforms SMOTE in terms of G-mean, Recall, and AUC.
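As a rough illustration of two building blocks named above, the sketch below shows the standard SMOTE interpolation step and the G-mean metric in plain Python. It is not the CBB-SMOTE procedure itself: the clustering and biased sampling steps are not specified in this abstract and are omitted. The function names `smote_sample` and `g_mean` are hypothetical and chosen only for this example.

```python
import math
import random


def smote_sample(x, neighbor, rng):
    """Create one synthetic point on the segment between a minority
    sample x and one of its minority-class neighbors (basic SMOTE step)."""
    lam = rng.random()  # interpolation weight drawn from [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of recall on the positive class and
    recall on the negative class (specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(recall * specificity)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    print(smote_sample([0.0, 0.0], [1.0, 2.0], rng))
    print(g_mean([1, 1, 0, 0], [1, 0, 0, 0]))
```

Because G-mean multiplies the recall of both classes, a classifier that ignores the minority class scores zero however high its overall accuracy is, which is why the metric suits imbalanced data such as this.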
