Abstract

In lexical bundle research, it is common practice to extract and compare lexical bundles across different corpora based on certain identification thresholds. Such studies adopt varying frequency and dispersion thresholds because the corpora compared differ in size and/or number of texts. However, few studies have considered the consequences of these methodological differences. To address this gap, a series of experiments was conducted to explore the impact of identification thresholds and corpus composition on bundle extraction and on the results of cross-corpora comparison. The first set of experiments demonstrated that different identification thresholds applied to the same pair of corpora may yield conflicting results, indicating that methodological differences could be one source of the mixed results in the literature. Further, after the influence of differences in size and/or number of texts was removed, the second set of experiments revealed that increasing the dispersion threshold proportionally to offset differences in the number of texts actually favours the corpus with the smaller number of texts. The study highlights the interactive relationship between frequency thresholds and dispersion thresholds and the key role of dispersion thresholds in filtering bundles. The article also discusses the methodological implications for future contrastive lexical bundle research.
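
To make the two identification thresholds concrete, the following is a minimal illustrative sketch, not the procedure used in the study. It assumes whitespace tokenisation, a frequency threshold expressed per million words, and a dispersion threshold expressed as a minimum number of texts; the function name `extract_bundles` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, min_per_million=20.0, min_texts=3):
    """Return n-grams that meet both a frequency and a dispersion threshold.

    Hypothetical helper for illustration only.
    texts: list of strings, one per text in the corpus.
    min_per_million: normalised frequency threshold (occurrences per million words).
    min_texts: dispersion threshold (minimum number of texts containing the n-gram).
    """
    freq = Counter()                # raw frequency of each n-gram
    text_range = defaultdict(set)   # ids of the texts in which each n-gram occurs
    total_tokens = 0

    for text_id, text in enumerate(texts):
        tokens = text.lower().split()  # assumption: naive whitespace tokenisation
        total_tokens += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            text_range[gram].add(text_id)

    # Convert the normalised threshold into a raw count for this corpus size.
    min_raw = min_per_million * total_tokens / 1_000_000

    # Keep only n-grams passing both filters; value is (frequency, number of texts).
    return {
        " ".join(gram): (count, len(text_range[gram]))
        for gram, count in freq.items()
        if count >= min_raw and len(text_range[gram]) >= min_texts
    }


toy_corpus = [
    "on the other hand we see",
    "on the other hand it is",
    "on the other hand they go",
    "as a result of the the",
]
bundles = extract_bundles(toy_corpus, n=4, min_per_million=20.0, min_texts=3)
```

In this toy corpus the normalised frequency threshold converts to a raw count below one, so the dispersion threshold alone decides which sequences survive. This loosely echoes the filtering role of dispersion thresholds that the abstract describes, although a four-text example cannot stand in for the reported experiments.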
