Abstract

Oral text is certainly discrete. It is built of “small bricks”, units not only of the lexical level but also of the higher, syntactic level. Pauses of various kinds mark this discreteness: ordinary syntagmatic pauses and hesitation pauses, which may be physical (unfilled pauses, including breaks within clauses), sound pauses (e-e, m-m), or verbal (vot, kak eto, nu, znachit, etc.). However, these markers reveal neither the syntagma nor the sentence as a unit for describing the syntactic structure of an oral text, because a pause of any type may occur at any point in the audio stream. Finding sentences in spontaneous speech is therefore quite complicated. To obtain such units, we propose a method of coercive punctuation, which was used to mark up the spontaneous monologues in the collection of oral texts called the «Balanced Annotated Textotec». The participants (experts in philology) were asked to mark the ends of sentences by inserting a period into transcripts in which neither pauses nor punctuation had been marked. They could rely only on the syntactic structure of the text and on the connections between words and predicate centers. Involving more than twenty experts in the experiment makes the results more statistically reliable. In this work we describe the results of our experiment and discuss how they could be used for the automatic detection of sentence boundaries in spontaneous speech.
