Abstract

Phonotactic modelling, typically in the form of a Parallel Phone Recognition followed by Language Modelling (PPRLM) system, is a key component of state-of-the-art Language Identification (LID) systems. Given that the objective of a PPRLM system is to capture as accurately as possible the phonotactics which characterise a language, it is generally assumed that minimising the Phone Error Rate (PER) is a prerequisite for achieving this effectively. In this paper we examine the relevance of PER as a metric for predicting eventual LID performance. To conduct this investigation we use the CallHome corpus, on the premise that it better represents the style of discourse and channel conditions encountered in Conversational Telephone Speech (CTS), which is now the focus of NIST LID evaluations. Using CallHome instead of the OGI-MLTS corpus to train the phone recognisers, we obtained significantly improved results, with an average improvement of approximately 6% absolute across the 30, 10 and 3 second tasks for the NIST 1996 and 2003 evaluations. We also examine the impact of tuning the individual front-end recognisers, both on the resultant PER of other languages and on the resultant LID performance. We find that PER has a number of limitations in indicating both the degree and direction of changes in LID performance. Accordingly, we propose a new metric which is better suited to forecasting the impact on LID performance when the phone recogniser front-end is modified.

