Abstract

The American National Corpus (ANC) will be a carefully designed corpus of 100 million words of written and spoken American English that generally follows the framework of the British National Corpus. The ANC project will provide a standard format for text encoding as well as a format for different types of corpus annotation (e.g., parts of speech and rhetorical features), including different versions of the same type of annotation (e.g., multiple part-of-speech taggings). As the only widely available large corpus of spoken and written American English, the ANC will represent a synchronic slice of the language across many registers. The First Release of the ANC, described in this article, is a preview of the corpus: it gives researchers access to data without waiting for the entire corpus to be completed, and a chance to contribute feedback on format and related issues.
