Abstract

Phraseological units in academic English texts have been a central focus of recent corpus linguistic research. This paper describes a special category of clause-level phraseological units, Characteristic Sentence Stems (CSSs), with a view to describing their identifying criteria and their extraction method. CSSs are contiguous lexico-grammatical sequences that contain a subject-predicate structure and that serve as frame expressions characteristic of academic writing. The CSS extraction method consists of six steps: POS tagging, n-gram segmentation, structure identification, calculation of occurrence significance, calculation of text range, and reduction of overlapping sequences. The occurrence-significance calculation is the crux of this method: it computes both the internal association and the boundary independence of a candidate sequence, and so tests the significance of a CSS's occurrence from both the inside and the outside. Our methods and results suggest that CSSs can be statistically defined and extracted from corpora, and that they can be employed in large-scale studies to account more fully for the phraseological features of non-native English academic writing.
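The abstract names the two sides of the significance test (internal association and boundary independence) but not the statistics behind them. As a rough illustration only, the sketch below assumes minimum pointwise mutual information (PMI) over all binary splits of an n-gram for internal association, and left/right branching entropy (how varied the words on each side of the sequence are) for boundary independence; it also counts text range as the number of texts containing the sequence. These measures and every function name (`count_ngrams`, `internal_association`, `branching_entropy`, `text_range`) are hypothetical stand-ins, not the paper's own formulas. POS tagging, structure identification, and overlap reduction are omitted.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams of a token list (the n-gram segmentation step)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, max_n):
    """Count every contiguous n-gram of length 1..max_n across all texts."""
    counts = Counter()
    for text in texts:
        for n in range(1, max_n + 1):
            counts.update(ngrams(text, n))
    return counts


def internal_association(gram, counts, total_tokens):
    """Assumed measure: minimum PMI (bits) over every binary split of gram.

    Requires len(gram) >= 2 and counts[gram] > 0. Probabilities are
    approximated as count / total_tokens for n-grams of every length.
    """
    scores = []
    for k in range(1, len(gram)):
        left, right = gram[:k], gram[k:]
        scores.append(math.log2(counts[gram] * total_tokens / (counts[left] * counts[right])))
    return min(scores)


def entropy(counter):
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(counter.values())
    return sum(-(c / total) * math.log2(c / total) for c in counter.values()) if total else 0.0


def branching_entropy(gram, texts):
    """Assumed boundary measure: entropy of left and right neighbours of gram."""
    n = len(gram)
    left, right = Counter(), Counter()
    for text in texts:
        for i in range(len(text) - n + 1):
            if tuple(text[i:i + n]) == gram:
                left[text[i - 1] if i > 0 else "<s>"] += 1          # sentence start marker
                right[text[i + n] if i + n < len(text) else "</s>"] += 1  # sentence end marker
    return entropy(left), entropy(right)


def text_range(gram, texts):
    """Number of texts in which gram occurs at least once."""
    return sum(1 for text in texts if gram in set(ngrams(text, len(gram))))


# Toy corpus: each "text" is one tokenized sentence (illustrative data only).
texts = [
    "this paper describes a new method".split(),
    "this paper describes the corpus".split(),
    "this paper argues that".split(),
]
gram = ("this", "paper", "describes")
counts = count_ngrams(texts, max_n=3)
total_tokens = sum(len(t) for t in texts)

print(internal_association(gram, counts, total_tokens))  # min PMI in bits
print(branching_entropy(gram, texts))                    # (left, right) entropy
print(text_range(gram, texts))                           # texts containing gram
```

In this toy corpus the candidate always opens its sentence (left entropy 0) but is followed by two different words (right entropy 1 bit), which is the kind of asymmetry a frame expression such as a sentence stem would be expected to show. A real implementation would apply thresholds to these scores, which the abstract does not state.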
