Abstract

Conventional speech applications have been developed by focusing on the sound substances of speech, a strategy derived from acoustic phonetics. However, the speech representations used in this framework, e.g., the spectrogram, inevitably carry nonlinguistic factors such as speaker and microphone characteristics. As a result, the developed systems end up selecting their users: they may work well for the majority of users but still work poorly for the rest. The author believes that this situation should be avoided in educational applications because users do not know whether they are outliers or not. Recently, the author proposed another framework for developing speech applications that looks only at sound contrasts, a strategy derived from a physical and mathematical interpretation of structural phonology. In this new framework, nonlinguistic factors such as speakers and microphones are mathematically removed from speech, just as pitch information can be removed by smoothing the spectrogram. This talk shows what is possible with the new framework, which completely discards absolute acoustic properties such as formants and spectral envelopes. In other words, the new framework models an utterance as an organized pattern of sounds, whereas the conventional framework models it merely as a linear string of sounds.

