Abstract

This study focuses on the design of a real corpus extracted from a business environment. A real corpus must be compiled objectively, so that it captures samples of language used in an everyday context with minimal interference. Building a corpus from business texts is problematic for two reasons: companies are not used to providing information for linguistic research, and tagging a real corpus appropriately is not an easy task. Tagging involves filtering the selected language, as it may contain mistakes or variation. The objective of this paper is to describe the methodology followed to compile a real corpus and to propose a system for tagging the syntactic variations that arise from the use of English as a second language. The proposal is based on a corpus of one hundred and twenty e-mails written by Indian and Chinese employees of an international company. The corpus was tagged manually and several aspects were taken into account, although this paper focuses on the tagging of variation.
