In the aviation sector, human factors are the primary cause of safety incidents. Intelligent prediction systems capable of evaluating human state and managing risk have been developed over the years to identify and prevent human-factor-related issues. However, the lack of large, useful labelled data sets has often hindered the development of these systems. This study presents a methodology to identify and classify human factor categories from aviation incident reports. For feature extraction, a text pre-processing and Natural Language Processing (NLP) pipeline is developed. For data modelling, semi-supervised Label Spreading (LS) and supervised Support Vector Machine (SVM) techniques are considered. Random search and Bayesian optimization are applied for hyper-parameter analysis and to improve model performance, as measured by the Micro F1 score. The best predictive models achieved Micro F1 scores of 0.900, 0.779 and 0.875 for the three levels of the taxonomic framework, respectively. These results indicate that favourable predictive performance can be achieved when classifying human factors from text data. Nevertheless, a larger data set is recommended for future research.
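Because every result above is reported as a Micro F1 score, it may help to see how micro-averaging pools counts across all categories before computing a single F1 value. The following is a minimal, standard-library sketch, not the authors' evaluation code. The abstract does not say whether each taxonomy level is single-label or multi-label, so the function assumes each report's labels are given as a set, which covers both cases (for single-label multi-class data, Micro F1 reduces to accuracy). The category names in the usage example are hypothetical and are not taken from the study's taxonomy.

```python
from typing import Hashable, Sequence, Set


def micro_f1(y_true: Sequence[Set[Hashable]], y_pred: Sequence[Set[Hashable]]) -> float:
    """Micro-averaged F1: pool true positives, false positives and false
    negatives over every (report, category) pair, then compute one F1 score.

    Assumption: each report's labels are a set, so single-label multi-class
    and multi-label data are handled the same way.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        tp += len(true_labels & pred_labels)  # categories predicted and present
        fp += len(pred_labels - true_labels)  # categories predicted but absent
        fn += len(true_labels - pred_labels)  # categories present but missed
    denom = 2 * tp + fp + fn
    # 2*TP / (2*TP + FP + FN) equals 2PR / (P + R) with pooled precision P and
    # recall R; return 0.0 when there is nothing to score at all.
    return 2 * tp / denom if denom else 0.0


# Hypothetical human-factor categories for three incident reports.
y_true = [{"fatigue"}, {"communication", "fatigue"}, {"situational_awareness"}]
y_pred = [{"fatigue"}, {"communication"}, {"communication"}]
print(micro_f1(y_true, y_pred))  # pooled TP=2, FP=1, FN=2 -> 4/7
```

Micro-averaging weights every individual decision equally, so frequent categories dominate the score. If the same quantity is wanted from scikit-learn, `sklearn.metrics.f1_score(..., average="micro")` computes it from label-indicator arrays.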