Phoneme Boundary Research Articles

A robust, speaker‐independent algorithm for the automatic segmentation of speech has been designed. It aligns a phonetic transcription with the output of a phoneme nucleus detector based on the temporal decomposition (TD) paradigm [B. Atal, IEEE Trans. Acoust. Speech Signal Process. ASSP‐26, 81–84 (1983); Bailly, Marteau, and Abry, Proc. Int. Conf. ASSP, Glasgow, Scotland, 508–511 (1989)]: the phonetic string is modeled as a sequence of overlapping emergence functions (EFs) whose maxima occur at the phoneme nuclei. The segmenter minimizes the least‐squares reconstruction error between the time‐frequency representation of the speech signal and this model. The algorithm performs segmentation and alignment in three steps: (a) predetection of phoneme nucleus centers, (b) time alignment of these centers with the corresponding phonetic transcription, and (c) adjustment of the resulting nucleus centers and detection of phoneme boundaries. The first step, inspired by Van Hemert's work [Van Hemert, Philips Tech. Rev. 43(9), 233–242 (1987)], uses an adaptive detection window to produce candidate nucleus centers. The second step uses dynamic time warping (DTW) to align these candidates with the known phonetic transcription. The DTW is guided by anchor points: a crude local probability function takes into account the energy and zero‐crossing distributions of each phoneme. A new temporal decomposition technique gives an analytical solution with a fixed number of targets and no compactness constraints. The TD errors between three consecutive candidates are used to calculate transition costs, which allows nucleus centers to be inserted or omitted. The third step moves each nucleus center to the center of gravity of its EF and places each phoneme boundary at the time where two adjacent EFs are equal.
The algorithm was trained on 200 sentences spoken by one speaker and tested on 50 sentences spoken by seven speakers. On the test corpus, 86% of the candidate nucleus centers fall alone within a single manual segment, and 94% of the final nucleus centers match the manual segmentation.
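The third step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the emergence functions are already available as a non-negative array with one row per phoneme in temporal order, and that each has a non-zero sum. The function name `nuclei_and_boundaries`, the 10 ms frame period, the midpoint fallback, and the Gaussian test shapes are all hypothetical choices made for illustration.

```python
import numpy as np

def nuclei_and_boundaries(efs, frame_period=0.01):
    """Sketch of step (c): centers of gravity and equal-EF boundaries.

    efs: array of shape (n_phonemes, n_frames), rows in temporal order
         (assumed input; the abstract does not specify a data layout).
    frame_period: seconds per frame (assumed value).
    """
    efs = np.asarray(efs, dtype=float)
    times = np.arange(efs.shape[1]) * frame_period

    # Nucleus center = center of gravity of each emergence function.
    centers = (efs * times).sum(axis=1) / efs.sum(axis=1)

    boundaries = []
    for k in range(efs.shape[0] - 1):
        diff = efs[k] - efs[k + 1]
        lo = int(np.searchsorted(times, centers[k]))
        hi = int(np.searchsorted(times, centers[k + 1]))
        # Fallback (an assumption): midpoint if no crossing is found.
        boundary = 0.5 * (centers[k] + centers[k + 1])
        # Look between the two centers for the frame pair where the
        # left EF stops dominating the right EF, then interpolate.
        for i in range(max(lo - 1, 0), min(hi, len(times) - 1)):
            if diff[i] >= 0.0 and diff[i + 1] < 0.0:
                frac = diff[i] / (diff[i] - diff[i + 1])
                boundary = times[i] + frac * frame_period
                break
        boundaries.append(boundary)
    return centers, np.array(boundaries)

# Toy input: two Gaussian-shaped EFs centered at 0.10 s and 0.30 s.
t = np.arange(41) * 0.01
efs = np.stack([np.exp(-0.5 * ((t - 0.10) / 0.03) ** 2),
                np.exp(-0.5 * ((t - 0.30) / 0.03) ** 2)])
centers, boundaries = nuclei_and_boundaries(efs)
```

With these two symmetric toy EFs, the boundary falls halfway between the two centers, which is where the curves are equal.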

The work reported in this paper is an attempt to understand vowel normalization better by investigating the relationship between vowel normalization and vowel contrast. In the first experiment, vowels from a “hood”–“hud” continuum were presented at two levels of fundamental frequency (F0) using two types of presentation. In one condition, tokens were blocked by F0. In the other, tokens with different F0 levels were randomly intermixed with each other (as in the typical F0 normalization experiment). In the mixed presentation, subjects identified the high F0 items most often as “hood” and the low F0 items most often as “hud”. In the blocked condition, there was no reliable difference between the high and low F0 continua. This pattern of results suggests that a contrast effect is at work. Therefore, four models of perceptual contrast were tested in simulations using auditorily-based spectra produced by a model which incorporates two levels of processing, (1) narrow-band auditory filtering (R. D. Patterson, J. Acoust. Soc. Am. 1976, 59, 640) and (2) wide-band integration (L. A. Chistovich, J. Acoust. Soc. Am. 1985, 77, 789). The experiment’s results could be approximated by either of two models: an auditory figure/ground model, and a talker contrast model. A second experiment distinguished between these two models. The auditory figure/ground model predicts that in a cross-series anchoring experiment (in which tokens with high F0 are used to anchor the low F0 continuum and tokens with low F0 are used to anchor the high F0 continuum) the boundary of the vowel identification function will be shifted toward the vowel quality of the anchoring stimulus. The talker contrast model predicts that the vowel quality of the anchoring stimulus is less important than its F0 and that the phoneme boundary will be shifted in the same direction regardless of the vowel quality of the anchoring stimulus. 
The results of the experiment quite unambiguously supported the predictions of the talker contrast model.
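The boundary shifts discussed in this abstract are often quantified by locating the point where an identification function crosses 50%. The sketch below assumes that definition, which the abstract itself does not state, and uses linear interpolation between continuum steps. The function name `category_boundary` and all response proportions are invented for illustration and are not data from the study.

```python
def category_boundary(steps, prop_first):
    """Return the continuum position where the proportion of
    first-category responses (e.g. "hood") falls through 0.5.

    steps: continuum step values in increasing order.
    prop_first: proportion of first-category responses per step.
    Returns None if the function never crosses 0.5 downward.
    """
    pairs = list(zip(steps, prop_first))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 >= 0.5 > p1:
            # Linear interpolation between the two bracketing steps.
            return x0 + (p0 - 0.5) / (p0 - p1) * (x1 - x0)
    return None

# Invented toy proportions of "hood" responses on a 7-step continuum.
steps = [1, 2, 3, 4, 5, 6, 7]
high_f0 = [1.0, 0.9, 0.8, 0.6, 0.3, 0.1, 0.0]
low_f0 = [1.0, 0.9, 0.6, 0.4, 0.2, 0.1, 0.0]

# Positive shift: the high-F0 boundary lies later on the continuum,
# i.e. more tokens are labeled as the first category at high F0.
shift = category_boundary(steps, high_f0) - category_boundary(steps, low_f0)
```

In an anchoring design like the one described, the sign of such a shift under each anchor type is what would separate the two models' predictions.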

Articles published on Phoneme Boundary

Automatic segmentation and alignment of continuous speech based on the temporal decomposition model

Spectral envelope distortion and vowel perception: Evidence for a central, auditory form of perceptual compensation

Psychoacoustic evidence for a contextual effect model

Contrast and normalization in vowel perception

Phoneme segmentation expert system using spectrogram reading knowledge

Aging and the Influence of Contextual Contrast on Vowel Identification

Aging and the influence of contextual contrast on vowel identification.

Perceptual categorization of synthesized /R‐W/ continua in normal preschool children

Perceptual compensation for transmission channel and speaker effects on vowel quality

Vowel quality changes produced by surrounding tone sequences.

The Influence of Pre- and Postplosive Fundamental Frequency on /t/–/d/ Perception in German

Effects of phase changes in low‐numbered harmonics on formant frequency matches

Speech perception in children with histories of recurrent otitis media.

Auditory compensation effects in symmetrical three‐vowel sequences and a model of perceptual phoneme boundary

Voco-auditory functions of the chimpanzee

Speech perception and frequency discrimination in good and poor readers

Labial Articulation Patterns Associated with Segmental Features and Syllable Structure in English

Imitation of a VOT continuum by native speakers of English and Spanish: evidence for phonetic category formation.

Cue salience in the perception of a stop voicing contrast by hearing and hearing‐impaired children

Individual differences in the perception of cues for initial stop place and voicing contrasts
