Abstract

Fundamental frequency (F0) and unvoiced/voiced (U/V) segment estimation of infant utterances are important for investigating human perception of prosodic information at an early stage of speech communication. However, this estimation is difficult because infant utterances have several features that differ from those of adults: (1) F0 has a wide range of values (200 to 2000 Hz); (2) F0 is unstable, for example, it changes discontinuously to double or half its value; and (3) voiced segments may have high energy in the higher-frequency regions, which degrades the U/V decisions of existing methods. Additionally, infant utterance data are often collected in daily child-care settings, which lowers the signal-to-noise ratio (SNR). To cope with these problems, a robust F0 estimation method based on instantaneous frequency [Nakatani and Irino, ICSLP2002] is introduced, and a new U/V detection method is proposed. The former has a mechanism that extracts accurate F0 while avoiding double- and half-pitch errors in low-SNR environments. Once accurate F0 is obtained, the latter can reliably detect U/V simply by examining the harmonic structure corresponding to that F0. The effectiveness of these methods is examined using a database constructed from infant utterances recorded in daycare settings [Amano, Kato, and Kondo, ICSLP2002].
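The abstract says that, once F0 is known, U/V can be decided by examining the harmonic structure at that F0. The following is a minimal sketch of one such harmonicity test, not the authors' method. It assumes a single analysis frame, a Hann window, and a hypothetical decision rule that compares the power at the harmonics k*F0 with the power midway between them at (k + 0.5)*F0. The function names (hann, power_at, harmonicity_db, is_voiced), the 10 dB threshold, and the default harmonic count are illustrative assumptions.

```python
import math
import random


def hann(n):
    """Symmetric Hann window of length n (reduces spectral leakage)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def power_at(frame, sample_rate, freq):
    """Power of the frame's DTFT evaluated at a single frequency in Hz."""
    omega = 2.0 * math.pi * freq / sample_rate
    re = sum(x * math.cos(omega * n) for n, x in enumerate(frame))
    im = -sum(x * math.sin(omega * n) for n, x in enumerate(frame))
    return (re * re + im * im) / len(frame)


def harmonicity_db(frame, sample_rate, f0, n_harmonics=8):
    """Summed power at k*f0 versus summed power at (k + 0.5)*f0, in dB."""
    win = [x * w for x, w in zip(frame, hann(len(frame)))]
    nyquist = sample_rate / 2.0
    on = off = 1e-12  # small floor avoids log(0) on silent frames
    for k in range(1, n_harmonics + 1):
        if (k + 0.5) * f0 >= nyquist:
            break  # stop before the "gap" frequency would pass Nyquist
        on += power_at(win, sample_rate, k * f0)
        off += power_at(win, sample_rate, (k + 0.5) * f0)
    return 10.0 * math.log10(on / off)


def is_voiced(frame, sample_rate, f0, threshold_db=10.0, n_harmonics=8):
    """Hypothetical U/V rule: voiced when harmonic peaks stand out from the gaps."""
    if f0 is None or f0 <= 0:
        return False  # no F0 estimate for this frame: treat as unvoiced
    return harmonicity_db(frame, sample_rate, f0, n_harmonics) >= threshold_db


if __name__ == "__main__":
    sr, f0 = 16000, 400.0  # 400 Hz lies inside the 200 to 2000 Hz infant range
    # Test tone: five harmonics of f0 with amplitudes 1/k.
    tone = [sum(math.sin(2 * math.pi * k * f0 * n / sr) / k for k in range(1, 6))
            for n in range(1024)]
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
    print("harmonic frame voiced:", is_voiced(tone, sr, f0))
    print("noise frame voiced:", is_voiced(noise, sr, f0, n_harmonics=10))
```

Because the rule is keyed to the supplied F0, its decision is only as good as that estimate. For the five-harmonic test tone above, passing a doubled F0 (800 Hz) places the odd harmonics in the "gap" frequencies, and the ratio falls well below the 10 dB threshold. This is one reason an F0 tracker that avoids double-pitch errors matters for this kind of U/V decision.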
