Flight delays are costly and inconvenient for both airlines and passengers. Accurate delay prediction can help both groups plan ahead and reduce the negative impact of delays. In this project, we propose using a random forest algorithm to predict flight delays. The data includes flight-related features such as departure time, airline carrier, flight distance, and weather conditions at the origin and destination airports. We pre-process the data by encoding categorical variables and handling missing values. We then split the data into training and testing sets and train the model on the training set. We evaluate the model on the testing set using accuracy, precision, recall, and F1-score. Our results show that the random forest algorithm can predict flight delays with an accuracy of over 80%. The most important features for predicting flight delays are found to be departure time, flight distance, and weather conditions. These results demonstrate the potential of the random forest algorithm for flight delay prediction.
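As a minimal sketch of the evaluation step, the four metrics named above can be computed from the counts of true and false positives and negatives. The function name `binary_metrics`, the convention that label 1 means "delayed", and the ten example labels are all assumptions made for illustration; they are not the project's data or code.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels.

    Assumed convention for this sketch: 1 = delayed, 0 = on time.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # delayed, predicted delayed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # on time, predicted on time
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # on time, predicted delayed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # delayed, predicted on time

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for ten test-set flights (illustrative only).
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 1, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting precision and recall alongside accuracy shows separately how often predicted delays are real and how many real delays are caught, which a single accuracy figure does not reveal.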