A deep neural network approach to investigate tone space in languages

Bing'Er Jiang, Tim O'Donnell, Meghan Clayards

doi:10.1121/1.5101949

Abstract

Phonological contrasts are usually signaled by multiple cues, and tonal languages typically use multiple dimensions to distinguish between tones (e.g., duration, pitch contour, and voice quality). While the topic has been extensively studied, research has mostly used small datasets. This study employs a deep neural network (DNN) based speech recognizer trained on the AISHELL-1 speech corpus (Bu et al., 2017; 178 hours of read speech) to explore the tone space in Mandarin Chinese. A recent study shows that DNN models learn linguistically interpretable information to distinguish between vowels (Weber et al., 2016). Specifically, from a low-dimensional bottleneck layer, the model learns features comparable to F1 and F2. In the current study, we propose a more complex Long Short-Term Memory (LSTM) model, with a bottleneck layer implemented in the hidden layers, to account for variable duration, an important cue for tone discrimination. By interpreting the features learned in the bottleneck layer, we explore which acoustic dimensions are involved in distinguishing tones. The large amount of data in the speech corpus also makes the results more convincing and provides additional insights not possible in studies with more limited datasets.
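To make the architecture described above more concrete, the following is a minimal, untrained sketch of an LSTM that reads a variable-length sequence of acoustic frames and passes its final hidden state through a low-dimensional bottleneck before classifying the tone. It is an illustration under stated assumptions, not the authors' model: the class name `BottleneckLSTM`, the 13 input features per frame, the 32 hidden units, the 2-dimensional bottleneck, the 4 output classes, and the placement of the bottleneck after the last time step are all hypothetical choices, and the weights are random rather than learned from AISHELL-1.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BottleneckLSTM:
    """Untrained LSTM -> linear bottleneck -> softmax over tone classes.

    All sizes are illustrative assumptions, not values from the study.
    """

    def __init__(self, n_features=13, n_hidden=32, n_bottleneck=2,
                 n_tones=4, seed=0):
        rng = np.random.default_rng(seed)
        self.n_hidden = n_hidden
        # One stacked matrix holds the input, forget, cell and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_features + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.W_bn = rng.normal(0.0, 0.1, (n_bottleneck, n_hidden))
        self.W_out = rng.normal(0.0, 0.1, (n_tones, n_bottleneck))

    def forward(self, frames):
        """frames: array of shape (T, n_features); T may differ per syllable."""
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        for x in frames:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        # Low-dimensional code that one would inspect after training.
        bottleneck = self.W_bn @ h
        logits = self.W_out @ bottleneck
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return bottleneck, probs


model = BottleneckLSTM()
rng = np.random.default_rng(1)
short_syllable = rng.normal(size=(20, 13))  # 20 frames of placeholder features
long_syllable = rng.normal(size=(45, 13))   # 45 frames of placeholder features
for frames in (short_syllable, long_syllable):
    code, probs = model.forward(frames)
    print(frames.shape[0], code.shape, probs.shape)
```

Because the recurrence consumes one frame at a time, syllables of different durations map to a bottleneck code of the same size, which is the property that lets duration contribute to the learned representation. After training, the bottleneck values could be plotted against measured acoustic quantities to see which dimensions they track.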
