Phonetic Database Research Articles

Distinctive phonetic features play an important role in Arabic speech phoneme recognition. In a given language, distinctive phonetic features are derived from acoustic features using different methods. However, using a lengthy acoustic feature vector for phoneme recognition carries a large computational cost, which in turn affects real-time applications. The aim of this work is to examine methods for reducing the size of the feature vector used for distinctive phonetic feature and phoneme recognition. The objective is to select the relevant input features that contribute to the speech recognition process. This should reduce the computational complexity of the recognition algorithm and improve recognition accuracy. In the proposed approach, a genetic algorithm is used to perform optimal feature selection. A baseline model based on feedforward neural networks is first built; this model is used to benchmark the proposed feature selection method against a method that employs all elements of the feature vector. Experimental results on the King Abdulaziz City for Science and Technology Arabic Phonetic Database show that the average overall phoneme recognition accuracy of the genetic-algorithm-based method remains slightly higher than that of the method employing the full feature vector. The genetic-algorithm-based distinctive phonetic feature recognition method achieves a 50% reduction in the dimension of the input vector while obtaining a recognition accuracy of 90%. The results of the proposed method are also validated using the Wilcoxon signed-rank test.
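The abstract does not describe how the genetic algorithm encodes or evolves feature subsets, so the following is only a minimal sketch of GA-based feature selection under common default assumptions: each candidate is a binary mask (1 = keep that feature), parents are picked by binary tournament, children come from one-point crossover and bit-flip mutation, and the best mask is carried over unchanged (elitism). The names `ga_feature_selection`, `toy_fitness`, and `RELEVANT` are hypothetical. A real run would likely score each mask by training and evaluating a classifier, such as the feedforward network described above, on the selected features; here a toy fitness stands in for that step.

```python
import random

def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         crossover_rate=0.9, mutation_rate=None, seed=0):
    """Return (best_mask, best_score) for binary masks where 1 = keep feature."""
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_features  # on average one bit flip per child
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]

    def tournament(scored):
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    best_score, best_mask = float("-inf"), None
    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]
        for score, mask in scored:
            if score > best_score:
                best_score, best_mask = score, mask[:]
        next_population = [best_mask[:]]  # elitism: keep the best mask seen so far
        while len(next_population) < pop_size:
            parent1, parent2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1[:]
            # Bit-flip mutation: each gene toggles with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            next_population.append(child)
        population = next_population
    return best_mask, best_score


# Hypothetical fitness: pretend features 0-3 are the informative ones and charge
# a small cost per kept feature, so smaller masks win when accuracy is equal.
RELEVANT = {0, 1, 2, 3}

def toy_fitness(mask, cost_per_feature=0.1):
    hits = sum(bit for i, bit in enumerate(mask) if i in RELEVANT)
    return hits - cost_per_feature * sum(mask)


if __name__ == "__main__":
    mask, score = ga_feature_selection(12, toy_fitness)
    kept = [i for i, bit in enumerate(mask) if bit]
    print(f"kept {len(kept)}/12 features {kept}, fitness {score:.2f}")
```

The per-feature cost is what pushes the search toward a shorter input vector; without it, the GA has no reason to drop features that neither help nor hurt. To check whether a reduced feature set performs significantly differently from the full one, paired accuracy scores (for example per phoneme or per cross-validation fold) could be passed to `scipy.stats.wilcoxon`, which implements the signed-rank test the abstract mentions.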

The demand for telecommunications applications of automatic speech recognition has exploded in recent years. This area seems a natural candidate for speech recognition systems, since it embraces a tremendous variety of applications that rely entirely on audio signals and serial interfaces. However, the telecommunications environment strains the capabilities of current technology, given its broad range of uncontrollable variables, from speaker characteristics to telephone handsets and line quality. Current recognition systems have attained impressive performance levels on relatively controlled tasks, such as speaker-independent continuous digit recognition on laboratory databases comprising a few hundred speakers [1-3]. To understand the additional challenges of the telecommunications environment, we must study the effects on recognition of handset and channel characteristics, speaker accent, speaking style, and lexicon, as well as the interactions among these factors. No small amount of data will suffice to model these conditions. Simultaneous with the explosion of telecommunications applications has been the introduction of powerful statistical modeling techniques, known as hidden Markov models (HMMs), to speech recognition [4,5]. These computationally intensive algorithms introduce a large number of degrees of freedom into the speech recognition problem and hence exhibit slow convergence properties. As a consequence, they require orders of magnitude more training data than the previous generation of deterministic techniques. Many databases collected in the mid-1980s, such as the DARPA Resource Management database [6] and the TIMIT Acoustic Phonetic database [7], while ambitious programs in their own right, have consistently proven to underrepresent dimensions that matter to HMM recognition systems because of their limited coverage.
The Voice Across America (VAA) database being collected at Texas Instruments is designed to satisfy the data requirements of this next generation of speech recognition systems. Our goal is to collect data over standard long-distance telephone lines from 100,000 speakers representing a demographically and geographically balanced sample of the contiguous United States. This database will provide the foundation for a thorough investigation of factors affecting speaker-independent continuous speech recognition for American English. Similar projects are being planned for other countries and will form the basis for research into recognition of Japanese, British English, and European languages. As of now, we have completed two phases of the VAA project, for a total of 50,000 utterances from nearly 3700 speakers. This paper describes the methods and motivation for VAA data collection and validation procedures, the current contents of the database, and the results of exploratory research on a 1088-speaker subset of the database. Our initial results underscore the need for an extensive database: even 1088 speakers, a large database by traditional standards, are insufficient to adequately represent the many dimensions of interest. One of our purposes here is to share the insights we have gained into telephone-based data collection, in the belief that the VAA model is likely to become the standard method of collecting data over the tele-

Articles published on Phonetic Database

Comparing pre-linguistic normalization models against US English listeners’ vowel perception

Analysis of the Artistic Elements of Broadcast Hosting Based on Media Speech Corpus

On the Monophthong Features of Cangzhou Dialect (Hebei Province) based on Acoustic Data Analysis in the Big Data Era

Compiling of Phonetic Database Structure

Optimizing Arabic Speech Distinctive Phonetic Features and Phoneme Recognition Using Genetic Algorithm

Construct a phonetic database and develop a phonetic transcription in Gujarati language

The Phonetic Alphabet of the Chechen Language as a Basis of a Speech-Synthesis System

Information-theoretic analysis of efficiency of the phonetic encoding–decoding method in automatic speech recognition

System Determining Pronunciation Correctness of Japanese Words

Phonetic encoding method in the isolated words recognition problem

Evidence for Direct Geographic Influences on Linguistic Sounds: The Case of Ejectives

Turing Test-Based Evaluation of an Experimental System for Generation of Casual English Sentences from Regular English Input

Hybrid models based on biological approaches for speech recognition

Voice and Aspiration of Stops in Turkish

Review of Pickering & Rosner (1993): Oxford Acoustic Phonetic Database on Compact Disc

Voice across America: Toward robust speaker-independent speech recognition for telecommunications applications

Spectral characteristics of English stops in prestressed position

Duration of English stops in prestressed position

Acoustic phonetic data base for the study of selected English consonants, consonant clusters, and vowels