Abstract

The speech processing field has evolved dramatically over the last decade, especially for non-specialists using speech technology tools. Machine learning methods based on various neural networks (DNNs, GANs, CNNs, LSTMs, etc.) have emerged as a way of leveraging available training data to formulate new algorithms, which have revolutionized performance on many challenging tasks, including speech recognition, speaker identification, and speech enhancement. However, with such improved modeling capabilities, greater care is needed to ensure that the resulting solutions take into account fundamental properties of speech production, language, and perception. This talk provides a brief overview of current speech processing advancements using model-based/supervised solutions. In particular, after briefly highlighting several approaches to speech processing, three examples are considered that illustrate the challenges, flaws, and issues that arise when supervised model-based concepts and machine learning are applied without factoring in the underlying foundations present in the original data context. These areas are: (i) word-count estimation for language assessment of child-adult speech, compared against the LENA language system; (ii) dialect identification using available corpora (e.g., “is the secret in the silence?”); and (iii) speech features for forensic audio analysis under noise and nonlinear distortion mismatch conditions. Suggestions for best practices in speech, speaker, and language processing will be highlighted.
