Abstract

This article describes a project aimed at building a specialized corpus of Italian newspaper texts and at developing a computational technique to retrieve new false anglicisms from it. Texts were collected over a ten-month span from three Italian newspapers: La Stampa, La Repubblica, and Il Corriere della Sera. The corpus contains about 20 million tokens and approximately 230,000 types. The system was automatically updated on a daily basis, and a word list was obtained at the end of the collection period. This procedure produced a refined word list, which was then searched for false anglicisms. Alongside the computational techniques, careful manual scanning proved indispensable for extracting new false anglicisms. The corpus is available for future work and may be exploited not only to find false anglicisms but also to retrieve anglicisms and neologisms and to analyse lexical features of Italian newspaper language.
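
The abstract does not say how the word list was built or refined, so the following is only a minimal sketch of one plausible reading: token counts are accumulated article by article (mimicking the daily update), and types found in a reference lexicon are filtered out, leaving candidates for manual scanning. The tokeniser, the function names, the toy articles, and the tiny `known_italian` lexicon are all hypothetical and are not taken from the project.

```python
import re
from collections import Counter

# Assumed tokeniser: runs of Latin letters, including accented ones (e.g. "è").
TOKEN_RE = re.compile(r"[a-zà-öø-ÿ]+", re.IGNORECASE)


def tokenize(text):
    """Return the lowercase word tokens found in text."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def update_counts(counts, article_text):
    """Add one article's tokens to the running frequency list."""
    counts.update(tokenize(article_text))
    return counts


def candidate_list(counts, known_words, min_freq=1):
    """Types absent from the reference lexicon, most frequent first."""
    return sorted(
        (w for w, n in counts.items() if n >= min_freq and w not in known_words),
        key=lambda w: (-counts[w], w),
    )


# Toy data standing in for the daily newspaper downloads (hypothetical).
articles = [
    "Il footing al parco è uno sport popolare.",
    "Dopo il footing, il parco era vuoto.",
]
# Toy reference lexicon (hypothetical); a real one would be a full dictionary.
known_italian = {"il", "al", "parco", "è", "uno", "sport",
                 "popolare", "dopo", "era", "vuoto"}

counts = Counter()
for text in articles:
    update_counts(counts, text)

n_tokens = sum(counts.values())   # running corpus size in tokens
n_types = len(counts)             # number of distinct word forms
candidates = candidate_list(counts, known_italian)
print(n_tokens, n_types, candidates)
```

A filter of this kind only narrows the list. Deciding whether a surviving candidate such as "footing" is a false anglicism, a genuine anglicism, or a typo still requires the manual scanning the abstract describes.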
