Abstract
Automatic Speech Recognition (ASR) is an active research topic in signal processing and pattern recognition. In recent years, automatic speech-reading systems have advanced considerably by combining audio and visual speech features to recognize words under noisy conditions. The objective of an audio-visual speech recognition (AVSR) system is to improve recognition accuracy. Developing robust AVSR systems for Human Computer Interaction requires appropriate, simultaneously recorded speech and video data. This paper describes the 'vVISWa' (Visual Vocabulary of Independent Standard Words) database, which consists of audio-visual data from 48 native speakers and 10 non-native speakers. These speakers contributed to the corpus in three poses: full frontal, 45° profile, and side pose. The database was designed primarily to support multi-pose audio-visual speech recognition in three languages: 'Marathi' (the native language of Maharashtra), 'Hindi' (national language of India), and 'English' (universal language). It is a multi-pose, multilingual database formed in the Indian context. The database is available on request from http://visbamu.in/viswaDataset.html.