Abstract

Corpus-based researchers and traditional qualitative researchers, such as those working in critical discourse analysis, often need to select prototypical texts for close reading: texts that contain the language features of interest found across a much larger corpus. Traditional approaches to this selection have been largely ad hoc. In this paper, we offer a more principled way of selecting texts for close reading, based on ranking texts by the number of keywords they contain. To facilitate this analysis, we have developed a multiplatform, freeware software tool called ProtAnt that analyses the texts, generates a ranked list of keywords based on statistical significance and effect size, and then orders the texts by the number of keywords in them. We describe various experiments demonstrating that the ProtAnt analysis is effective not only at identifying prototypical texts but also at identifying outlier texts that may need to be removed from a target corpus.

Highlights

  • Corpus-based researchers and traditional qualitative researchers are often required to select texts for close reading that include the language features of interest present in a much larger corpus

  • We propose a more principled way of selecting texts for close reading based on a ranking of texts in terms of the number of keywords they contain

  • Our experiments with ProtAnt confirm the utility of identifying prototypical texts using the concept of keywords, with results largely matching predicted outcomes


Summary

Introduction

Corpus-based researchers and traditional qualitative researchers are often required to select texts for close reading that include the language features of interest present in a much larger corpus. We propose a more principled way of selecting texts for close reading based on a ranking of texts in terms of the number of keywords (unusually frequent words in the target corpus compared with a reference corpus) they contain. To facilitate this analysis, we have developed a multiplatform, freeware software tool called ProtAnt (Anthony & Baker 2015) that analyses the texts, generates a ranked list of keywords based on statistical significance and effect size, and orders the texts by the number of keywords in them.
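The ranking idea described above can be illustrated with a minimal sketch. It is not ProtAnt's implementation: the summary does not say which statistic, threshold, or effect-size measure the tool uses, so the sketch assumes Dunning's log-likelihood with a cut-off of 3.84 (roughly p < 0.05), omits the effect-size step, and counts distinct keyword types per text. The function names `tokenize`, `log_likelihood`, `find_keywords`, and `rank_texts` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, total_a, total_b):
    """Dunning's log-likelihood for a word seen `a` times in the target
    corpus (size total_a) and `b` times in the reference (size total_b)."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_a)
    if b > 0:
        score += b * math.log(b / expected_b)
    return 2 * score


def find_keywords(target_texts, reference_texts, threshold=3.84):
    """Return {word: score} for words unusually frequent in the target.

    Both arguments map a text name to its raw text. The threshold is an
    assumed value, not one taken from the paper.
    """
    target = Counter(t for text in target_texts.values() for t in tokenize(text))
    reference = Counter(t for text in reference_texts.values() for t in tokenize(text))
    n_target, n_reference = sum(target.values()), sum(reference.values())
    if n_target == 0 or n_reference == 0:
        raise ValueError("both corpora must contain at least one token")

    keywords = {}
    for word, a in target.items():
        b = reference.get(word, 0)
        # Skip words that are not relatively more frequent in the target.
        if a / n_target <= b / n_reference:
            continue
        score = log_likelihood(a, b, n_target, n_reference)
        if score >= threshold:
            keywords[word] = score
    return keywords


def rank_texts(target_texts, keywords):
    """Order target texts by the number of distinct keywords they contain,
    most keyword-rich first."""
    keyword_set = set(keywords)
    counts = {
        name: len(set(tokenize(text)) & keyword_set)
        for name, text in target_texts.items()
    }
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

Under these assumptions, texts near the top of the list returned by `rank_texts` are candidates for prototypical texts, and texts near the bottom are candidates for outliers. Counting keyword tokens, or normalising by text length, would be equally plausible design choices and could change the ordering.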

Text prototype selection methods
ProtAnt analytic tool
Experiment 1
Experiment 2
Experiment 3
Experiment 4
Experiment 5
Discussion and conclusion
