Abstract

This paper presents a qualitative analysis of spoken Turkish based on approximately 77 minutes of transcribed television news and debates, with special focus on the acoustic and morphological aspects relevant to respeaking. In respeaking, an edited form of the original verbal content is dictated to a speaker-dependent automatic speech recognition (ASR) engine, and the engine's output is further edited to produce broadcast-quality subtitles. The data suggest that respeaking can solve only some of the problems unscripted speech poses for ASR. On the acoustic level, a respeaker can overcome segmental and suprasegmental variation as well as degraded acoustic conditions, and can partially resolve overlapping speech. On the morphological level, disfluency and deviant morphology can be handled. However, when paraphrasing the text and dictating punctuation marks under time pressure, a respeaker can hardly control her own pronunciation; deletions, reduced morphemes, and similar phenomena may lead to misrecognition. Since subtitles require standard orthography, capitalization, and punctuation, named entities, figures, morphophonemics, and the use of the apostrophe will require satisfactory solutions during ASR and/or post-editing.
