Abstract

The Linguistic DNA project seeks to understand the evolution of philosophy, society and language during the Modern English period. Corpora like Early English Books Online (EEBO), the Corpus of Late Modern English Texts (CLMET) and the Corpus of Historical American English (COHA) allow us to apply statistical, data‐driven models that extract patterns against which we can check our expectations. Just as systems biology has revolutionised biology by systematically searching for all patterns, we detect patterns in our data systematically with contextual and distributional semantic approaches, an approach that could be called systems history. We uncover semantic patterns with methods from text mining, computational linguistics and digital humanities. We automatically normalise spelling to present‐day variants and use bottom‐up analyses to step from words to concepts: collocations, topic modelling and distributional semantics. We illustrate the approaches with two case studies: how the associations of poverty change across time, and Charles Dickens's social criticism, his vision of helping to improve the situation of the poor. As no gold standard for our task exists, our approaches are exploratory, which entails considerable manual intervention, e.g. sifting candidate lists, reading excerpts and interpreting topic models. A fully automatic approach is currently neither feasible nor envisaged: semi‐automatic approaches give researchers the inspiring opportunity to interact with the texts in a constant move between distant and close reading. The different characteristics of the various statistical methods offer complementary perspectives.
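To make the first bottom-up step concrete, the following is a minimal sketch of collocation ranking by pointwise mutual information (PMI), one common association measure. It is an illustration only and not necessarily the measure used in the project. The function name `pmi_collocates`, the window size, the simplified probability estimate and the toy sentence are all assumptions introduced here.

```python
import math
from collections import Counter


def pmi_collocates(tokens, node, window=2, min_count=1):
    """Rank words that co-occur with `node` within +/- `window` tokens by PMI.

    Hypothetical helper for illustration. Probabilities are estimated
    very simply as counts divided by the number of tokens.
    """
    total = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()

    # Count every token that falls inside the window around each node hit.
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(total, i + window + 1)
        for j in range(start, end):
            if j != i:
                pair_freq[tokens[j]] += 1

    scores = {}
    for word, joint in pair_freq.items():
        if joint < min_count:
            continue
        # PMI = log2( P(node, word) / (P(node) * P(word)) )
        scores[word] = math.log2(
            joint * total / (word_freq[node] * word_freq[word])
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy input (invented for this sketch, not corpus data).
toy_tokens = (
    "the poor need relief and the poor need work "
    "while the rich need nothing"
).split()

for word, score in pmi_collocates(toy_tokens, "poor", window=1):
    print(f"{word}\t{score:.3f}")
```

On real corpora, a candidate list produced this way would still need the manual sifting described above, since PMI tends to favour rare words unless a frequency threshold such as `min_count` is raised.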
