Grapheme-to-phoneme conversion (G2P) is the task of converting letters (grapheme sequences) into their pronunciations (phoneme sequences). It plays a crucial role in natural language processing, text-to-speech synthesis, and automatic speech recognition systems. This paper provides a systematical overview of the G2P conversion from different perspectives. The conversion methods are first presented in the paper; detailed discussions are conducted on methods based on deep learning technology. For each method, the key ideas, advantages, disadvantages, and representative models are summarized. This paper then mentioned the learning strategies and multilingual G2P conversions. Finally, this paper summarized the commonly used monolingual and multilingual datasets, including Mandarin, Japanese, Arabic, etc. Two tables illustrated the performance of various methods with relative datasets. After making a general overall of G2P conversion, this paper concluded with the current issues and the future directions of deep learning-based G2P conversion.
Read full abstract