Abstract

Recent research on speaker identification suggests that such systems can operate in co-channel environments, provided they are given only the less-corrupted segments of speech. To identify these uncorrupted segments as accurately as possible, the statistics of the random processes that generate them must be fully characterized. In a co-channel environment, uncorrupted speech segments arise when one speaker’s voiced speech overlaps with the other speaker’s silence or unvoiced speech. Hence, a statistical model of voiced, unvoiced, and silence segments can be used to derive a model of the uncorrupted speech segments. To this end, statistical models that account for the observed lengths of voiced, unvoiced, and silence segments are first developed, and Markov models are used to capture the dependencies between these segment types. In addition, a model of the sampling distribution of the segmental target-to-interferer ratio (TIR) is developed, and the short- and long-term correlation present in the segmental TIR signal is explored.
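
As a rough illustration of the idea described above, the following sketch simulates two speakers as independent three-state Markov chains (voiced, unvoiced, silence) and marks the frames in which exactly one speaker is voiced. It is not the model developed in the paper: the transition probabilities, the frame-level formulation, and the helper names `simulate_chain`, `usable_mask`, and `run_lengths` are all assumptions made here for illustration only.

```python
import numpy as np

# State indices for the three segment types named in the abstract.
VOICED, UNVOICED, SILENCE = 0, 1, 2

# Hypothetical frame-level transition matrix (each row sums to 1).
# These values are illustrative assumptions, not estimates from the paper.
P = np.array([
    [0.90, 0.06, 0.04],   # from voiced
    [0.30, 0.60, 0.10],   # from unvoiced
    [0.15, 0.05, 0.80],   # from silence
])


def simulate_chain(P, n_frames, rng, start=SILENCE):
    """Draw a state sequence of length n_frames from a first-order Markov chain."""
    states = np.empty(n_frames, dtype=int)
    states[0] = start
    for t in range(1, n_frames):
        states[t] = rng.choice(len(P), p=P[states[t - 1]])
    return states


def usable_mask(a, b):
    """True where exactly one speaker is voiced (the other is unvoiced or silent)."""
    return (a == VOICED) ^ (b == VOICED)


def run_lengths(mask):
    """Lengths, in frames, of each consecutive run of True values in mask."""
    padded = np.concatenate(([False], mask, [False])).astype(int)
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)
    ends = np.flatnonzero(diff == -1)
    return ends - starts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speaker_a = simulate_chain(P, 2000, rng)
    speaker_b = simulate_chain(P, 2000, rng)
    mask = usable_mask(speaker_a, speaker_b)
    lengths = run_lengths(mask)
    print("fraction of uncorrupted frames:", mask.mean())
    print("mean uncorrupted segment length (frames):", lengths.mean())
```

Under these assumptions, the empirical distribution of `lengths` plays the role of the uncorrupted-segment length statistics that the abstract says are derived from the voiced, unvoiced, and silence models.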
