This paper presents a practical approach to classifying aviation safety reports in an operational context. The goals of the research are as follows: (a) successfully demonstrate a replicable, practical methodology leveraging Natural Language Processing (NLP) to classify aviation safety report narratives; (b) determine the number of reports (per class) required to train the NLP model to achieve an F1 performance score greater than 0.90 consistently; and, (c) demonstrate the model could be implemented locally, within the confines of a typical corporate infrastructure (i.e., behind the firewall) to allay information security concerns. The authors purposefully sampled 425 safety reports from 2019 to 2021 from a university flight training program. The authors varied the number of reports used to train an NLP model to classify narrative safety reports into three separate event categories. The NLP model’s performance was evaluated both with and without distractor data, running 30 iterations at each training level. NLP model success was measured using a confusion matrix and calculating Macro Average F1-Scores. Parametric testing was conducted on macro average F1 score performance using an ANOVA and post hoc Levene statistic. We determined that 60 training samples were required to consistently achieve a macro average F1-Score above the established 0.90 performance threshold. In future studies, we intend to expand this line of research to include multi-tiered analysis to support classification within a safety taxonomy, enabling improved root cause analysis.
Read full abstract