Abstract
Performing multilingual phonetic analysis is challenging, not least because of the lack of performant tools for basic analyses of (non-English) speech data. Without the capacity to perform basic operations such as transcription or search on speech data from understudied languages, critical questions in cross-linguistic phonetic analysis remain out of reach [Blasi et al., Trends in Cog. Sci. 26(12), 1153–1170 (2022)]. To this end, we will demonstrate our recent advances in multilingual representation learning and automatic phonetic transcription, developed as part of our AnySpeech initiative. Specifically, we will demonstrate two tools. First, the CLAP-IPA model, a phoneme-to-speech model capable of open-vocabulary keyword spotting in any language without any parameter updates, including languages absent from its training data [Zhu et al., NAACL, 750–772 (2024)]. Second, the IPAPack transcription model, a lightweight transcription model capable of annotating 473 distinct phones, including phones missing from prior models (e.g., click consonants in Bantu languages). We will also show that our models can be downloaded and set up on consumer-grade laptops with little effort. We anticipate that our tools will enable researchers to analyze speech recording repositories at scale, unlocking answers to critical questions in cross-linguistic phonetic analysis.
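To give a flavor of the keyword-spotting workflow, the sketch below shows how a dual-encoder model like CLAP-IPA might be loaded and queried on a laptop. The Hub identifiers, encoder method names, pooled-output interface, and the 16 kHz sampling rate are all our assumptions for illustration, not details confirmed by this abstract; consult the AnySpeech release for the actual API.

```python
"""Minimal sketch: open-vocabulary keyword spotting with a CLAP-IPA-style
dual encoder. Repository names and encoder interfaces below are hypothetical
placeholders -- see the AnySpeech release for the real identifiers and API."""
import torch
import torchaudio
from transformers import AutoModel, AutoTokenizer

# Hypothetical Hub identifiers for the speech and phoneme encoders.
speech_encoder = AutoModel.from_pretrained(
    "anyspeech/clap-ipa-speech", trust_remote_code=True)
phone_encoder = AutoModel.from_pretrained(
    "anyspeech/clap-ipa-phone", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("anyspeech/clap-ipa-phone")

# Load a recording, downmix to mono, and resample to the (assumed) 16 kHz
# rate expected by the speech encoder.
wave, sr = torchaudio.load("recording.wav")
wave = torchaudio.functional.resample(wave, sr, 16_000).mean(0, keepdim=True)

# Encode the keyword as an IPA string and the clip as a waveform, then score
# the match by cosine similarity between the two pooled embeddings.
with torch.no_grad():
    tokens = tokenizer("kʰæt", return_tensors="pt")  # IPA for English "cat"
    phone_emb = phone_encoder(**tokens).pooler_output
    speech_emb = speech_encoder(input_values=wave).pooler_output
score = torch.nn.functional.cosine_similarity(phone_emb, speech_emb)
print(f"keyword match score: {score.item():.3f}")
```

In practice, one would presumably slide this scoring over windows of a longer recording and threshold the similarity to locate keyword occurrences; because the keyword side is a plain IPA string, the same loop works for languages and words never seen in training.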