Biosignal Sensors and Deep Learning-Based Speech Recognition: A Review.

Wookey Lee,Azizbek Marakhimov,Jessica Jiwon Seong,Busra Ozlu,Suan Lee,Bong Sup Shim

doi:10.3390/s21041399

Open Access

https://doi.org/10.3390/s21041399

Journal: Sensors (Basel, Switzerland)	Publication Date: Feb 17, 2021
Citations: 52	License type: CC BY 4.0

Affiliation: Inha University, Semyung University

Abstract

Voice is one of the essential mechanisms for communicating and expressing one’s intentions as a human being. There are several causes of voice inability, including disease, accident, vocal abuse, medical surgery, ageing, and environmental pollution, and the risk of voice loss continues to increase. Novel approaches to speech recognition and production are needed because voice loss seriously undermines quality of life and sometimes leads to isolation from society. In this review, we survey mouth interface technologies, which are mouth-mounted devices for speech recognition, production, and volitional control, and the corresponding research to develop artificial mouth technologies based on various sensors, including electromyography (EMG), electroencephalography (EEG), electropalatography (EPG), electromagnetic articulography (EMA), permanent magnet articulography (PMA), gyros, images, and 3-axial magnetic sensors, especially with deep learning techniques. In particular, we examine various deep learning technologies related to voice recognition, including visual speech recognition and silent speech interfaces, analyze their flow, and systematize them into a taxonomy. Finally, we discuss methods to solve the communication problems of people with speech disabilities and future research with respect to deep learning components.

Highlights

Voice is a basic means of communication and social interaction through spoken language
Voice is produced by the vibration of the vocal cords as a result of the airflow supplied by the respiratory system; normal voice production depends on the coordination among airflow, laryngeal muscle strength, and the supraglottic resonator cavities such as the pharyngeal, oral, and nasal cavities [1]
We report on the recent advances in the field of biosignal-based voice recognition and production with deep learning-based technologies

Summary

Introduction

Voice is a basic means of communication and social interaction through spoken language. People with voice disorders face serious problems in their daily lives, which may lead to emotional instability and isolation from society. A voice disorder implies that the pitch, intensity, or fluidity of one’s voice does not conform to his or her gender, age, body composition, social environment, and geographic location. The term voice disorder refers to all abnormal conditions in which the expression of the voice is not in its normal range. Voice is produced by the vibration of the vocal cords as a result of the airflow supplied by the respiratory system; normal voice production depends on the coordination among airflow, laryngeal muscle strength, and the supraglottic resonator cavities such as the pharyngeal, oral, and nasal cavities [1]. The causes of voice disorders can be categorized as organic, functional, and/or psychogenic.
