Maritime transport is a critical component of global trade and logistics, enabling the movement of goods and resources across oceans and waterways. In busy waterways and ports especially, effective and accurate communication is essential, because it ensures the seamless exchange of information and the coordinated execution of port activities. However, comprehensibility is often hindered by poor audio quality, background noise, and diverse languages and accents. Automatic Speech Recognition (ASR) systems can mitigate these issues by providing real-time transcription and enabling automated, value-adding services that enhance situational awareness. While pre-trained ASR models excel on general speech, maritime ASR faces distinct challenges: a lack of annotated data, diverse accents, and specialized terminology.

To this end, we focus on improving the transcription quality of pre-trained ASR models for maritime communication, with particular attention to accurately recognizing maritime-specific terminology such as vessel and location names. Because transcribed maritime communication is scarce, we create a synthetic training dataset tailored to regional maritime terminology. The synthetic audio is augmented with general human speech and used to fine-tune an end-to-end ASR model under various settings. The models are evaluated on a proprietary dataset of regional maritime radio communication from the port of Hamburg.

The experimental results demonstrate a notable improvement in ASR performance. Specifically, our approach reduces the word error rate (WER) by 13.46% absolute over the pre-trained baseline and increases recall by 41.57% for vessel names and 38.65% for locations. Our findings underscore the efficacy of integrating synthetic training data to address the challenges of maritime ASR, paving the way for more robust and accurate speech recognition systems tailored to maritime applications.
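
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how word error rate and term-level recall are commonly computed. It is not the evaluation code used in this work: the function names `word_error_rate` and `term_recall`, the lowercasing and whitespace tokenization, and the whole-phrase matching rule for recall are all assumptions made for illustration, and the example sentences are invented rather than taken from the Hamburg dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def _contains_phrase(text: str, phrase: str) -> bool:
    """Whole-word phrase match on lowercased, whitespace-normalized text."""
    padded_text = " " + " ".join(text.lower().split()) + " "
    padded_phrase = " " + " ".join(phrase.lower().split()) + " "
    return padded_phrase in padded_text


def term_recall(pairs, terms) -> float:
    """Share of terms present in a reference that also appear in its hypothesis.

    Assumed definition (hypothetical): a term counts once per utterance in
    which the reference contains it, and counts as found if the hypothesis
    for that utterance contains the same phrase.
    """
    found = 0
    total = 0
    for reference, hypothesis in pairs:
        for term in terms:
            if _contains_phrase(reference, term):
                total += 1
                if _contains_phrase(hypothesis, term):
                    found += 1
    return found / total if total else 0.0


if __name__ == "__main__":
    # Invented example utterance, not drawn from any real dataset.
    ref = "the vessel enters the port"
    hyp = "the vessel enter port"
    print(f"WER: {word_error_rate(ref, hyp):.2f}")
```

Under these assumptions, a misrecognized vessel name hurts both metrics at once: it adds substitution errors to the WER and removes one hit from the recall numerator, which is why the two are reported side by side.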