Abstract

Data-driven frameworks for analyzing aviation safety data have recently gained traction. Text-based machine learning techniques often rely purely on word frequency analysis to eliminate the innate subjectivity of human language, but more refined techniques like structural topic modeling (STM) attempt to simulate text generation to identify the thematic undertones of text corpora. This paper presents an application of STM to two text-based sets of aviation safety data: the Aviation Safety Reporting System (ASRS) and the accident and incident reports published by the National Transportation Safety Board (NTSB). A framework for cleaning and pre-processing the datasets is described, including a brief discussion of bag-of-words and TF–IDF representations of narratives. The methodology behind STM is then described, including techniques for selecting the optimal number of topics. The results of the STM analysis on the ASRS and NTSB datasets are presented, with a focus on the clarity and specificity of the topics, judged by the most common words associated with each. The correlation between pairs of topic labels is also briefly explored, including a visualization of narratives in two-dimensional space. STM is found to show promise in identifying themes within technical datasets, with model performance increasing for more specific corpora that use precise and unique language.
