Abstract

Although prosodic information has long been thought important for speech recognition, few demonstrations exist of its effective use in recognition systems. Lexical stress information has been shown to improve recognition performance by allowing the differentiation of confusable words (e.g., Rudnicky and Li, DARPA Workshop on Speech Recogn., June 1988). In this study, lexical stress modeling is examined for a spreadsheet system with a significant number of confusable words (e.g., EIGHTY and EIGHTEEN). The models used here have been evaluated on both read and spontaneous speech. A database of over 400 spreadsheet and numeric utterances was available for training a speaker-independent, HMM-based continuous-speech system with a 273-word vocabulary and a language perplexity of about 51. Test data came from two sources: read utterances, and data generated in a separate study examining the use of a spoken-language spreadsheet. This latter set includes (a) a “spontaneous” set, composed of parsable utterances from a spreadsheet task, and (b) a “read” set, consisting of the spontaneous sentences read aloud by their original speakers. The use of lexical stress models reduced the error rate for read speech by approximately 10%. A comparison with the spontaneous data will provide insight into the nature of the improvement.
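The abstract does not say how the stress models were built. One common approach, assumed here purely for illustration, is to give stressed and unstressed vowels separate acoustic models, so that EIGHTY and EIGHTEEN, which place primary stress on different syllables, stop sharing vowel models. The sketch uses ARPAbet-style phones with stress digits (1 = primary stress, 0 = unstressed) in the style of the CMU Pronouncing Dictionary; the transcriptions and the names `LEXICON`, `model_units`, and `shared_units` are hypothetical.

```python
# Hypothetical pronunciation lexicon in ARPAbet style. Stress digits:
# 1 = primary stress, 0 = unstressed. Illustrative transcriptions only.
LEXICON = {
    "EIGHTY": ["EY1", "T", "IY0"],
    "EIGHTEEN": ["EY0", "T", "IY1", "N"],
}


def model_units(phones, use_stress):
    """Map a phone sequence to acoustic-model unit names.

    With use_stress=False the stress digits are stripped, so stressed and
    unstressed variants of a vowel share one model. With use_stress=True
    each stress variant keeps its own model.
    """
    if use_stress:
        return list(phones)
    return [p.rstrip("012") for p in phones]


def shared_units(word_a, word_b, use_stress):
    """Set of model units that two lexicon entries have in common."""
    units_a = set(model_units(LEXICON[word_a], use_stress))
    units_b = set(model_units(LEXICON[word_b], use_stress))
    return units_a & units_b
```

With stress stripped, EIGHTY and EIGHTEEN share the units EY, T, and IY; with stress-specific vowels they share only T, which leaves the recognizer more acoustic evidence for telling them apart.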
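A perplexity of about 51 can be read as the language model's effective branching factor: on average the recognizer chooses among roughly 51 equally likely next words. A minimal sketch of the standard definition (2 raised to the average negative log2 probability per word) follows; the function name `perplexity` is illustrative, and the abstract does not say how its figure was computed.

```python
import math


def perplexity(word_probs):
    """Perplexity of a test sequence, given the probability the language
    model assigned to each word: 2 ** (average negative log2 probability)."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    cross_entropy = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** cross_entropy
```

For example, a grammar that always offers 51 equally likely continuations gives `perplexity([1/51] * 10)` equal to 51 up to floating-point rounding, and `perplexity([0.5, 0.125])` returns 4.0.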
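The abstract reports an error-rate reduction of approximately 10% without stating whether the error rate is word-level or whether the reduction is relative or absolute. The sketch below assumes word error rate (substitutions + deletions + insertions, divided by the number of reference words) and a relative reduction; the function names, the sample spreadsheet command, and the error rates mentioned afterwards are hypothetical, not figures from the study.

```python
def word_errors(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions needed to
    turn the reference word list into the hypothesis (Levenshtein distance)."""
    prev = list(range(len(hyp) + 1))  # cost of aligning an empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference word r
                cur[j - 1] + 1,          # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (or match at no cost)
            )
        prev = cur
    return prev[-1]


def word_error_rate(refs, hyps):
    """Total word errors divided by total reference words."""
    errors = sum(word_errors(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def relative_reduction(baseline, improved):
    """Fractional reduction of an error rate relative to the baseline."""
    return (baseline - improved) / baseline
```

Misrecognizing EIGHTY as EIGHTEEN in a command such as `SET A ONE TO EIGHTY` costs one substitution, so `word_errors` returns 1 for that pair. Under the relative-reduction assumption, a drop from a hypothetical 20% to 18% WER would be a 10% reduction.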

Highlights

  • In 11067 two-syllable words and 9640 three-syllable words of the MRC Psycholinguistic Database, 5% of the former class and 15% of the latter have the same vowel quality in two consecutive syllables. Of the 20 allowed vocalic nuclei, 95% occur homovocalically in two-syllable words.

  • The paper describes the methods and results of a study of the feasibility of automatically grading the performance of Japanese students when reading English aloud. SRI recorded 31 adult Japanese speakers: 22 men and 9 women. Each Japanese speaker read six sentences aloud. All 186 recorded utterances were presented in a random order for rating by three expert listeners, who rated the utterances on two occasions. Speech-grading software was developed from an adaptive hidden-Markov-model (HMM) speech-recognition system. The grading procedure is a two-step process: first, the speech to be graded is aligned; then, the segments of the speech signal that are located are compared with models of those segments that have been developed from a database of speech from native speakers of English. Important points in the results are: (1) ratings of speech quality by expert listeners are extremely reliable, and (2) automatic grades from the system correlate well (> 0.8) with those ratings.

  • G. Wilpon, and Roberto Pieraccini (Speech Research Department, AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974)
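The first highlight counts words whose vowel repeats in two consecutive syllables. A minimal sketch of that count follows, assuming each word has already been reduced to its per-syllable vowel sequence; the function names and the toy transcriptions are hypothetical and are not drawn from the MRC Psycholinguistic Database.

```python
def is_homovocalic(vowels):
    """True if any two consecutive syllables share the same vowel."""
    return any(a == b for a, b in zip(vowels, vowels[1:]))


def homovocalic_share(words):
    """Fraction of words with the same vowel in two consecutive syllables.

    `words` maps each word to its per-syllable vowel sequence.
    """
    if not words:
        return 0.0
    return sum(is_homovocalic(v) for v in words.values()) / len(words)


def homovocalic_vowels(words):
    """Set of vowels that occur in at least one consecutive-syllable repeat."""
    return {a for v in words.values() for a, b in zip(v, v[1:]) if a == b}


# Hypothetical toy data: word -> vowel of each syllable.
toy = {
    "w1": ["i", "i"],
    "w2": ["a", "i"],
    "w3": ["o", "a", "a"],
    "w4": ["e", "u", "e"],
}
```

On the toy data, 2 of 4 words qualify (`w4` does not, because its repeated vowel is not in consecutive syllables), and the vowels occurring homovocalically are `i` and `a`.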
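The second highlight describes grading as a two-step process: align the speech, then compare the located segments with models trained on native speakers. The toy sketch below illustrates that idea under stated assumptions: per-frame log-likelihoods under each native-speaker segment model are already available, alignment is a left-to-right Viterbi (dynamic-programming) forced alignment, and each segment is scored by its mean frame log-likelihood. The source does not describe SRI's actual scoring; `force_align` and `segment_scores` are hypothetical names.

```python
NEG_INF = float("-inf")


def force_align(loglik):
    """Viterbi forced alignment of T frames to K ordered segments.

    loglik[t][k] is the log-likelihood of frame t under the model for
    segment k. Segments appear in order (left-to-right, no skips) and each
    gets at least one frame. Returns the segment index for every frame.
    """
    T, K = len(loglik), len(loglik[0])
    if T < K:
        raise ValueError("need at least one frame per segment")
    score = [[NEG_INF] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]
    score[0][0] = loglik[0][0]  # the first frame must start segment 0
    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1][k]
            advance = score[t - 1][k - 1] if k > 0 else NEG_INF
            if stay >= advance:
                score[t][k], back[t][k] = stay + loglik[t][k], k
            else:
                score[t][k], back[t][k] = advance + loglik[t][k], k - 1
    # Trace back from the last frame, which must end in the last segment.
    path = [K - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def segment_scores(loglik, path):
    """Mean frame log-likelihood of each aligned segment (higher means the
    speech is closer to the native-speaker model). Assumes every segment
    received at least one frame, as force_align guarantees."""
    K = len(loglik[0])
    totals, counts = [0.0] * K, [0] * K
    for t, k in enumerate(path):
        totals[k] += loglik[t][k]
        counts[k] += 1
    return [totals[k] / counts[k] for k in range(K)]
```

For instance, `force_align([[-1, -5], [-1, -5], [-5, -1], [-5, -1]])` assigns the first two frames to segment 0 and the last two to segment 1, and both segments then score -1.0.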
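The reported agreement between automatic grades and expert ratings (> 0.8) is a correlation coefficient; the highlight does not say which one, so Pearson's r is assumed in this sketch. The function name `pearson_r` is illustrative. The same computation could plausibly also serve for the test-retest check implied by listeners rating the utterances on two occasions, though the source does not say how reliability was measured.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences,
    e.g. automatic grades and mean expert ratings for the same utterances."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Perfectly linear agreement gives 1.0 (for example, `pearson_r([1, 2, 3], [2, 4, 6])`), and perfectly reversed ordering gives -1.0.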


Summary

Introduction

…naturally occurring variations (i.e., normal, slow, fast, soft, loud, angry). Analyses show that the acoustic characteristics of individual words vary considerably across talkers, and across styles within talkers. Performance of human listeners and the two machine-based recognition systems was tested in a single-talker, multistyle condition and in a multitalker, multistyle condition. All tests were conducted under two listening conditions: normal, and in the presence of masking noise. The data to be presented are the error patterns of human listeners versus the machine-recognition systems, exhibited across talkers, across speaking styles, and across training conditions (multitalker, multistyle training versus single-talker, single-style training). [Work supported by Boeing Aerospace and Electronics.]

View Table of Contents: https://asa.scitation.org/toc/jas/86/S1
Published by the Acoustical Society of America

…noise, and LPC coding.

Modeling lexical stress in read and spontaneous speech. Joseph H. …

…Lexical stress information has been shown to improve recognition performance by allowing the differentiation of confusable words (e.g., Rudnicky and Li, DARPA Workshop on Speech Recogn., June 1988). In this study, lexical stress modeling for a spreadsheet system with a significant number of confusable words (e.g., EIGHTY and EIGHTEEN) is examined. The models used here have been evaluated on both read and spontaneous speech.

