Early Callsign Highlighting using Automatic Speech Recognition to Reduce Air Traffic Controller Workload

Shruthi Shetty, Oliver Ohneiser, Matthias Kleinert, Hartmut Helmke

doi:10.54941/ahfe1002493

Abstract

The primary task of an air traffic controller (ATCo) is to issue instructions to pilots. However, the first verbal contact is often initiated by the pilot, so the ATCo needs to search for the aircraft radar label that corresponds to the callsign uttered by the pilot. It would therefore be useful to have a controller assistance system that recognizes the spoken callsign directly from the speech data and highlights it in the ATCo display as early as possible. We propose to use an automatic speech recognition (ASR) system to first obtain the speech-to-text transcription and then extract the spoken callsign from that transcription. As high performance in callsign recognition is required, we use surveillance data, which significantly reduces callsign recognition error rates. When using ASR transcriptions for ATCo utterances of Isavia data (HAAWAII project), we initially obtain a callsign recognition error rate of 6.2%, which improves to 2.8% when surveillance data is used. For the ATC operational speech data obtained from the air navigation service provider NATS for the London approach area, we currently obtain a callsign recognition rate of 93.8% for both ATCo and pilot utterances on automatic transcriptions generated by an ASR system with a word error rate of 5.1%. When surveillance data is not used, the callsign recognition rate drops significantly to 82.7%, which indicates the importance of using surveillance data when recognizing callsigns. Once the callsign is spoken, we are able to recognize it within a second, which would be of great value to ATCos, especially in high-traffic situations that constitute high workload.
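To illustrate why surveillance data helps, the following minimal sketch maps a transcription to a candidate callsign string and then selects the closest callsign among the aircraft currently known from surveillance data. It is not the method used in this work; the lookup tables, the function names `spoken_to_candidate` and `match_callsign`, the similarity measure (Python's `difflib.SequenceMatcher`), and the threshold `min_score` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny lookup tables (illustrative assumptions only).
AIRLINE_DESIGNATORS = {"speedbird": "BAW", "lufthansa": "DLH", "iceair": "ICE"}
SPOKEN_SYMBOLS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "niner": "9", "alfa": "A", "alpha": "A", "bravo": "B", "charlie": "C",
    "delta": "D",
}


def spoken_to_candidate(transcript: str) -> str:
    """Turn transcribed words into a callsign-like string, skipping unknown words."""
    parts = []
    for word in transcript.lower().split():
        if word in AIRLINE_DESIGNATORS:
            parts.append(AIRLINE_DESIGNATORS[word])
        elif word in SPOKEN_SYMBOLS:
            parts.append(SPOKEN_SYMBOLS[word])
    return "".join(parts)


def match_callsign(transcript, surveillance_callsigns, min_score=0.6):
    """Return the surveillance callsign most similar to the transcript, or None."""
    candidate = spoken_to_candidate(transcript)
    if not candidate:
        return None
    best, best_score = None, 0.0
    for callsign in surveillance_callsigns:
        score = SequenceMatcher(None, candidate, callsign).ratio()
        if score > best_score:
            best, best_score = callsign, score
    # Reject weak matches so that no wrong radar label is highlighted.
    return best if best_score >= min_score else None


# "tree" simulates an ASR error; the surveillance list still resolves the callsign.
in_sector = ["BAW123A", "DLH4TK", "BAW72B"]
result = match_callsign("speedbird one two tree alfa good morning", in_sector)
```

In this sketch a misrecognized word only lowers the similarity score rather than causing a miss, because the search space is restricted to callsigns of aircraft that are actually present. The sketch assumes the transcript passed in covers only the callsign portion of the utterance; digits from later commands would otherwise be appended to the candidate.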

Full Text