This work introduces Aviation-BERT-NER, a Named Entity Recognition (NER) system tailored to aviation safety reports, built on the Aviation-BERT base model developed at the Georgia Institute of Technology’s Aerospace Systems Design Laboratory. The system integrates aviation domain-specific data, including aircraft types, manufacturers, quantities, and aviation terminology, to identify named entities critical for aviation safety analysis. A key innovation of Aviation-BERT-NER is its template-based approach to fine-tuning, which uses structured datasets to generate synthetic training data that mirror the complexity of real-world aviation safety reports. This method significantly improves the model’s generalizability and adaptability and enables rapid updates and customization as domain-specific requirements evolve. Development involved careful data preparation, including the synthesis of entity types and the generation of labeled datasets through template filling. Testing on real-world narratives from the National Transportation Safety Board (NTSB) database highlighted Aviation-BERT-NER’s robustness: evaluated on 50 manually annotated (BIO-tagged) paragraphs, it achieved a precision of 95.34%, a recall of 94.62%, and an F1 score of 94.78%. This work addresses a critical gap in English-language NER models for aviation safety and promises substantial improvements in the analysis and understanding of aviation safety reports.
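Template filling and BIO-based evaluation can be illustrated with the minimal Python sketch below. It is not the Aviation-BERT-NER implementation. Every detail is an assumption made for illustration: the entity labels (`MANUFACTURER`, `AIRCRAFT`, `QUANTITY`), the example values, the template, and the helper names `fill_template`, `spans`, and `prf` are all hypothetical. The sketch shows how filling template slots from a structured dataset yields BIO tags at no extra cost. It also shows one common way to score BIO-tagged output with strict, entity-level, micro-averaged precision, recall, and F1. The scoring scheme used in the work itself may differ.

```python
import random
import re

# Hypothetical structured dataset: entity label -> example surface forms.
STRUCTURED_DATA = {
    "MANUFACTURER": ["Boeing", "Airbus", "Piper Aircraft"],
    "AIRCRAFT": ["737-800", "A320", "PA-28"],
    "QUANTITY": ["2", "150"],
}

# Hypothetical template; each slot is a whitespace-separated token like {LABEL}.
TEMPLATES = [
    "The {MANUFACTURER} {AIRCRAFT} departed with {QUANTITY} passengers on board .",
]

SLOT = re.compile(r"\{([A-Z_]+)\}")


def fill_template(template, rng):
    """Fill each slot at random and return parallel lists of tokens and BIO tags."""
    tokens, tags = [], []
    for piece in template.split():
        match = SLOT.fullmatch(piece)
        if match is None:
            # Literal template word: outside any entity.
            tokens.append(piece)
            tags.append("O")
            continue
        label = match.group(1)
        value = rng.choice(STRUCTURED_DATA[label])
        # First word of a filled value gets B-, any following words get I-.
        for i, word in enumerate(value.split()):
            tokens.append(word)
            tags.append(("B-" if i == 0 else "I-") + label)
    return tokens, tags


def spans(tags):
    """Extract (label, start, end) entity spans from a BIO tag sequence."""
    result, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if starts_new or tag == "O":
            if label is not None:
                result.add((label, start, i))
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    return result


def prf(gold, pred):
    """Strict entity-level precision, recall, and F1 for one tagged sequence."""
    g, p = spans(gold), spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    rng = random.Random(0)
    for token, tag in zip(*fill_template(TEMPLATES[0], rng)):
        print(f"{token}\t{tag}")
```

Because the labels come from the slots rather than from manual annotation, a new entity type or surface form only requires editing the structured dataset or the templates. This fits the rapid-update property the abstract describes. In practice, the generated tokens would still need to be aligned to the model's subword tokenizer before fine-tuning.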