Abstract

The diachronic part of the Czech National Corpus (CNC) has been organized as a general basis for the study of the entire history of Czech, from the second half of the 13th century to 1990. It has been built on four principles: representativeness, authenticity, transcription, and preservation of the maximum amount of information contained in the text. It comprises the corpus itself, a bank of transcribed texts, a bank of transliterated texts, a text archive, a language database, a dictionary database, and a control database that stores information about the texts. The diachronic part of the CNC currently contains about 1.5 million tokens.
