Air traffic control (ATC) hazard feature extraction is a key information retrieval task for air traffic hazard records. Whereas text-based feature extraction ranks term importance solely on statistical evidence, we use external knowledge to extract features that match the definition of a hazard. This paper proposes a feature extraction method that uses expert knowledge to define hazard features and construct a hazard analysis framework. We illustrate the model training process, which comprises candidate feature generation, feature vectorization, and cluster-based standardization, using communication, navigation, and surveillance (CNS) data. We briefly explore the correct structure of terms in hazard records, the vector distribution of candidate features, and the clustering performance of different methods. The algorithm refines and accumulates expert knowledge through iteration. The experimental results show that a dataset prepared with linguistic processing guided by expert knowledge yields more informative candidate features, which k-means then uses to construct the analysis context. The proposed model outperformed four comparative algorithms in accuracy, reaching 82% on the air traffic control operation (ATCO) dataset and 86% on the CNS dataset. Additionally, the information-rich hazard features support decision-making in safety management departments and reduce the cost of investigating hidden hazards.