Abstract

Flight delays can propagate through the entire aviation network and have become an important research topic. This paper proposes a novel hierarchical integrated machine learning model that predicts flight departure delay status and delay duration in series rather than in parallel, to avoid ambiguity in decision making. The proposed model is analysed using various machine learning algorithms in combination with different sampling techniques. Highly noisy, unbalanced, dispersed, and skewed high-dimensional historical data provided by an international airline operating in Hong Kong were used to demonstrate the practical application of the model. The results show that, for a 4-h forecast horizon, a constructive neural network machine learning algorithm with the Synthetic Minority Over-sampling Technique-Tomek Links (SMOTETomek) sampling technique achieved better average balanced recall accuracies of 65.5%, 61.5%, and 59% for classifying delay status, predicting delay duration at a threshold of 60 min, and predicting delay duration at a threshold of 30 min, respectively. Similarly, for minority labels, the precision-recall and area-under-the-curve results showed that the proposed model achieved better results of 32.44% and 35.14%, compared with 26.43% and 21.02% for the parallel model, at thresholds of 60 min and 30 min, respectively. The effects of different sampling techniques, sampling approaches, and estimation mechanisms on prediction performance are also studied.
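The abstract reports balanced recall accuracy without defining it. As a minimal sketch, assuming the metric is the unweighted mean of per-class recall (the usual reading for unbalanced labels, but an assumption here), the following shows why it is more informative than plain accuracy when delayed flights are the minority class. The function name `balanced_recall` and the labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def balanced_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed definition, see lead-in)."""
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    # Each class contributes equally, regardless of how many samples it has.
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical labels: 1 = delayed (minority), 0 = on time (majority).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_recall(y_true, y_pred)
```

In this toy case the classifier misses half of the delayed flights, which plain accuracy largely hides because on-time flights dominate the sample, while the balanced score penalises the missed minority cases.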
