Abstract
Data-driven frameworks for analyzing aviation safety data have recently gained traction. Text-based machine learning techniques often rely purely on word frequency analysis to eliminate the innate subjectivity of human language, but more refined techniques like structural topic modeling (STM) attempt to simulate text generation to identify the thematic undertones of text corpora. This paper presents an application of STM to two text-based sets of aviation safety data: the Aviation Safety Reporting System (ASRS) and the accident and incident reports published by the National Transportation Safety Board (NTSB). A framework for cleaning and pre-processing the datasets is presented, including a brief discussion of bag-of-words and TF–IDF representations of narratives. The methodology behind STM is described, including techniques for selecting the optimal number of topics. The results of the STM analysis on the ASRS and NTSB datasets are presented, with a focus on the clarity and specificity of topics, judged by the words most commonly associated with each topic. A brief exploration of the correlation between pairs of topic labels is also undertaken, including a visualization of narratives in two-dimensional space. STM is found to show promise in identifying themes within technical datasets, with model performance increasing for more specific corpora that use precise and unique language.