Abstract

The aim of the research presented here is to report on a corpus-based method for discourse analysis that rests on the notion of segmentation, or the division of texts into cohesive portions. For the purposes of this investigation, a segment is defined as a contiguous portion of written text consisting of at least two sentences. The segmentation procedure developed for the study, LSM (link set median), is based on the identification of lexical repetition in text. The data analysed were three corpora of 100 texts each, with each corpus representing one genre: research articles, annual business reports, and encyclopaedia entries. The three corpora totalled 1,262,710 words. The segments inserted in the texts by the LSM procedure were compared to the internal section divisions in the texts. The results obtained through the LSM procedure were then compared to segmentation carried out at random. The results indicated that the LSM procedure performed better than random segmentation, suggesting that lexical repetition accounts in part for the way texts are segmented into sections.

Highlights

  • The study reports on a corpus-based method for the analysis of discourse organization that rests on the notion of segmentation, or the division of texts into cohesive portions

  • The results indicate that the LSM segmentation procedure works better than random segmentation, which suggests that lexical repetition accounts in part for how certain texts are segmented into sections

  • The study reported here suggests that this kind of discourse analysis can be carried out across a large number of texts


Summary

Introduction

The aim of the research presented here is to report on a corpus-based method for the analysis of discourse organization that rests on the notion of segmentation, or the division of texts into cohesive portions. For the purposes of this investigation, a segment is defined as a contiguous portion of written text consisting of at least two sentences (a sentence being the stretch of text between two full stops), held together by lexical cohesive links. This follows Kukharenko (1979) and Scinto (1986), who point out that texts are constituted by sentence clusters, or ‘semantic topical and lexico-grammatical unities of two or more sentences’ (Kukharenko, 1979, p. 235). The study presented here is an attempt to bridge this gap.
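To make the idea of lexical cohesive links between sentences concrete, the following is a minimal sketch and not the LSM procedure itself, whose details are not given here. It assumes that two sentences are linked when they share at least one word form, that sentences are delimited by full stops, and that a small hypothetical stop list is ignored. The function names, the stop list, and the use of a per-sentence median of linked sentence positions are illustrative assumptions suggested only by the name "link set median".

```python
import re
from statistics import median

# Hypothetical stop list: the source does not say which words, if any, were excluded.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "it", "on"}


def split_sentences(text):
    """Split on full stops, treating a sentence as the text between two full stops."""
    return [s.strip() for s in text.split(".") if s.strip()]


def content_words(sentence):
    """Return lowercased word forms minus the stop list (no lemmatization)."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS}


def link_sets(sentences):
    """For each sentence, list the indices of other sentences sharing at least one word."""
    words = [content_words(s) for s in sentences]
    return [
        [j for j in range(len(words)) if j != i and words[i] & words[j]]
        for i in range(len(words))
    ]


def link_set_medians(sentences):
    """Median index of linked sentences per sentence; None if a sentence has no links."""
    return [median(links) if links else None for links in link_sets(sentences)]


# Toy example (invented text, not taken from the corpora).
text = "Cats chase mice. Mice fear cats. Stocks rose sharply. Investors bought stocks."
sentences = split_sentences(text)
print(link_sets(sentences))
print(link_set_medians(sentences))
```

In this toy text the first two sentences link only to each other, as do the last two, so the medians stay within each pair. A marked change in the medians between adjacent sentences is the kind of signal a repetition-based segmenter could use to propose a boundary, although how LSM actually places boundaries is not specified in this excerpt.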
