Phonetisaurus: Exploring grapheme-to-phoneme conversion with joint n-gram models in the WFST framework

JOSEF ROBERT NOVAK,NOBUAKI MINEMATSU,KEIKICHI HIROSE

doi:10.1017/s1351324915000315

Abstract

AbstractThis paper provides an analysis of several practical issues related to the theory and implementation of Grapheme-to-Phoneme (G2P) conversion systems utilizing the Weighted Finite-State Transducer paradigm. The paper addresses issues related to system accuracy, training time and practical implementation. The focus is on joint n-gram models which have proven to provide an excellent trade-off between system accuracy and training complexity. The paper argues in favor of simple, productive approaches to G2P, which favor a balance between training time, accuracy and model complexity. The paper also introduces the first instance of using joint sequence RnnLMs directly for G2P conversion, and achieves new state-of-the-art performance via ensemble methods combining RnnLMs and n-gram based models. In addition to detailed descriptions of the approach, minor yet novel implementation solutions, and experimental results, the paper introducesPhonetisaurus, a fully-functional, flexible, open-source, BSD-licensed G2P conversion toolkit, which leverages the OpenFst library. The work is intended to be accessible to a broad range of readers.

Full Text