Abstract

Linguistic theory views a phoneme as a shorthand notation for a bundle of binary features related to the operation of the speaker's articulators. A representation of the speech waveform in terms of these underlying distinctive features is described here. The estimation of the probability of each of 14 linguistic features being encoded locally in the waveform is performed on a frame-by-frame basis. In going from the abstract to the physical level, it is recognized that the features are encoded in the waveform hierarchically and that time-varying manifestations of a feature within a phonemic segment are possible. These issues are addressed simultaneously through a two-stage procedure. In the first pass, the time portion and broad class of sound being represented by each frame are estimated. On the second pass, for each distinctive linguistic feature, models built explicitly for the estimated broad class portion are evaluated to arrive at the probability that each frame is part of a realization of a phoneme in which the feature is present. The distinctive feature representation is applied to the tasks of phoneme recognition and secondary classification in keyword spotting.
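
The two-stage procedure can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which model types are used, so the nearest-template classifier in the first pass and the logistic models in the second pass are stand-ins chosen only for brevity. All names (`first_pass`, `second_pass`, `class_templates`, `feature_models`), the two-dimensional toy frames, and the class labels are hypothetical. Only the overall flow and the count of 14 features come from the abstract.

```python
import math

N_FEATURES = 14  # number of distinctive features, as stated in the abstract


def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def first_pass(frame, class_templates):
    """Estimate the (broad class, time portion) label of one frame.

    Assumption: nearest template by squared Euclidean distance.
    """
    best_label, best_dist = None, float("inf")
    for label, template in class_templates.items():
        dist = sum((a - b) ** 2 for a, b in zip(frame, template))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def second_pass(frame, label, feature_models):
    """Evaluate the models built for the estimated label, one per feature.

    Assumption: each model is a (weights, bias) logistic unit.
    """
    probs = []
    for weights, bias in feature_models[label]:
        score = sum(w * x for w, x in zip(weights, frame)) + bias
        probs.append(sigmoid(score))
    return probs


def feature_posteriors(frames, class_templates, feature_models):
    """Run both passes frame by frame; return (label, probabilities) pairs."""
    results = []
    for frame in frames:
        label = first_pass(frame, class_templates)
        results.append((label, second_pass(frame, label, feature_models)))
    return results


# Toy data (hypothetical): two broad-class portions, 2-dimensional frames.
class_templates = {
    ("vowel", "middle"): [1.0, 0.0],
    ("stop", "onset"): [0.0, 1.0],
}
feature_models = {
    label: [([0.5 * (k + 1), -0.5], 0.0) for k in range(N_FEATURES)]
    for label in class_templates
}
frames = [[0.9, 0.1], [0.2, 0.8]]

result = feature_posteriors(frames, class_templates, feature_models)
```

Each entry of `result` pairs the estimated broad class portion of a frame with 14 probabilities, one per feature. Conditioning the second pass on the first-pass label is what lets a separate model handle each broad class and each portion of a segment.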
