Multimodal dataset of real-time 2D and static 3D MRI of healthy French speakers

Karyna Isaieva, Ioannis K Douros, Pierre-André Vuissoz, Yves Laprie, Jacques Felblinger, Justine Leclère

doi:10.1038/s41597-021-01041-3

Abstract

The study of articulatory gestures has a wide spectrum of applications, notably in speech production and recognition. Sets of phonemes, as well as their articulation, are language-specific; however, existing MRI databases mostly include English speakers. In this work, we introduce a dataset acquired with MRI from 10 healthy native French speakers. A corpus of synthetic sentences was used to ensure good coverage of the French phonetic context. Real-time MRI with a temporal resolution of 20 ms was used to acquire images of the participants' vocal tracts while they spoke. The sound was recorded simultaneously with MRI, denoised, and temporally aligned with the images. The speech was transcribed to obtain a phoneme-wise segmentation of the sound. We also acquired static 3D MR images for a wide range of French phonemes. In addition, we include annotations of spontaneous swallowing.
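To make the 20 ms temporal resolution concrete: at 20 ms per image the real-time series runs at 1000 / 20 = 50 frames per second, so each phoneme interval from the transcription spans a small run of consecutive frames. The sketch below maps a phoneme interval onto those frame indices. It assumes, hypothetically, that segment boundaries are given in integer milliseconds relative to the first image frame and that frame i covers the window [20i, 20(i+1)) ms; the `Segment` class and the `frame_indices` and `frames_per_second` functions are illustrative names, not part of the dataset's distribution format.

```python
from dataclasses import dataclass

# Temporal resolution of the real-time MRI stated in the abstract: 20 ms per frame.
FRAME_MS = 20


@dataclass(frozen=True)
class Segment:
    """One phoneme interval (hypothetical representation, not the dataset's file format)."""
    label: str     # phoneme label
    start_ms: int  # onset, assumed relative to the first image frame
    end_ms: int    # offset (exclusive)


def frames_per_second(frame_ms: int = FRAME_MS) -> float:
    """Frame rate implied by a given frame duration."""
    return 1000 / frame_ms


def frame_indices(seg: Segment, n_frames: int, frame_ms: int = FRAME_MS) -> range:
    """Indices of frames whose window [i*frame_ms, (i+1)*frame_ms) overlaps [start_ms, end_ms).

    Integer milliseconds are used so the floor divisions are exact.
    Indices are clipped to the available frames [0, n_frames).
    """
    if seg.end_ms <= seg.start_ms:
        return range(0)  # empty or inverted interval
    first = max(seg.start_ms // frame_ms, 0)
    last = min((seg.end_ms - 1) // frame_ms, n_frames - 1)
    return range(first, last + 1) if first <= last else range(0)


if __name__ == "__main__":
    print(frames_per_second())                                      # 50.0
    print(list(frame_indices(Segment("a", 15, 45), n_frames=100)))  # [0, 1, 2]
```

A phoneme lasting 30 ms can therefore touch three frames when its boundaries fall inside frame windows; if only the frame closest to the phoneme midpoint is wanted, the same integer division applied to the midpoint gives a single index.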

Highlights

Magnetic resonance imaging (MRI) holds one of the leading positions as a data acquisition method in speech sciences[7–10] owing to its non-invasiveness and the absence of long-term health hazards. Unlike other techniques, such as ultrasound, which cannot visualise articulators separated from the sensor by air, or electromagnetic articulography (EMA), which only provides the trajectories of sensors glued to the upper vocal tract articulators, MRI can visualise the whole vocal tract.

Background & Summary

The investigation of the movement of speech articulators has a number of applications, including the study of speech production[1] and speech recognition[2], as well as medical applications: diagnosis and rehabilitation of abnormal speech and swallowing, and study of the oro-facial structures implicated in sleep apnoea syndrome[3].

One of the techniques allowing reasonable spatio-temporal resolution of recorded speech is cine-MRI[11,12]. This method requires several identical repetitions of the same target utterance, which increases acquisition time and leads to artifacts when the repetitions are not perfectly periodic. The real-time MRI technique[20] makes use of radial sampling and regularized nonlinear inversion reconstruction. Another approach, which does not necessarily assume non-Cartesian encoding, was used for dynamic 3D imaging of the vocal tract[14,21]. All the existing publicly available databases offering high spatio-temporal resolution dynamic MRI exploit similar acquisition technologies, because they were acquired by the same research team.

In this work we report on a multimodal MRI database consisting of 2D real-time and 3D static MR images of the vocal tract of 10 French speakers. The dataset includes annotations of speech and of spontaneous swallowing, and will provide researchers with data offering good coverage of French phonetics to further explore French speech production and the physiological processes taking place in the vicinity of the vocal tract.

Methods

Code availability