Abstract

There is a lack of concrete knowledge about floristic change in Britain before the mid-20th century. Relevant evidence is available, but it is principally contained in disparate historical sources. In this article, we demonstrate how such sources can be efficiently collated and analysed through the implementation of state-of-the-art computational-linguistic and historical-geographic information systems (GIS) techniques. We do so through a case study that focuses on the floristic history of the English Lake District. This region has been selected because of its outstanding cultural and environmental value and because it has been extensively and continuously documented since the late 17th century. We outline how natural language processing (NLP) techniques can be integrated with Kew’s Plants of the World Online database to enable temporal shifts in plant-naming conventions to be more accurately traced across a heterogeneous corpus of texts published between 1682 and 1904. Through collocate analysis and automated geoparsing techniques, the geographies associated with these plant names are then identified and extracted. Finally, we use GIS to demonstrate the potential of this dataset for geo-temporal analysis and for revealing the historical distribution of Lake District flora. In outlining our methodology, this article indicates how the spatial and digital humanities can benefit research both in environmental history and in the environmental sciences more widely.

Highlights

  • The English Lake District has long been regarded as a place of outstanding cultural and environmental importance

  • These questions are of interest to historians and environmental scientists, but they are also important for heritage and conservation organisations in the region, including the National Trust, Natural England and the Lake District National Park Authority, who are under mounting pressure to preserve, to protect and to restore the Lake District’s historical environmental character

  • The method we present combines techniques from the digital and spatial humanities, including Natural Language Processing (NLP), Named Entity Recognition (NER), corpus linguistics (CL) and Geographic Information Systems (GIS)


Summary

Introduction

The English Lake District has long been regarded as a place of outstanding cultural and environmental importance. The method we present combines techniques from the digital and spatial humanities, including Natural Language Processing (NLP), Named Entity Recognition (NER), corpus linguistics (CL) and Geographic Information Systems (GIS). These techniques have been shown to be effective in guiding the investigation and interrogation of geospatial themes across historical textual corpora. Using the methodology we outline, we derived a dataset from the corpus that contains information about 802 plant species, 510 (63.5%) of which are linked to locations within the boundary of the Lake District National Park. We have increased and enriched the historical knowledge available to organisations, including members of the LDWHSP, who are directly involved in landscape management and policy decisions in the region.
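The combination of synonym resolution and collocation-based place extraction described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the synonym table stands in for a lookup against Plants of the World Online, the gazetteer stands in for an automated geoparser, and the names `SYNONYMS`, `GAZETTEER`, `find_plant_place_pairs` and the `window` parameter are all assumptions introduced for illustration. The sample sentence in the usage note is invented, not a quotation from the corpus.

```python
import re

# Illustrative synonym table (assumption): historical binomial -> accepted name.
# A real workflow would draw these mappings from a taxonomic database.
SYNONYMS = {
    "scilla nutans": "Hyacinthoides non-scripta",
    "hyacinthoides non-scripta": "Hyacinthoides non-scripta",
}

# Illustrative gazetteer (assumption): single-word place names only.
GAZETTEER = ["Ambleside", "Keswick", "Borrowdale"]


def tokenize(text):
    """Split text into word tokens, keeping internal hyphens."""
    return re.findall(r"[A-Za-z][A-Za-z-]*", text)


def find_plant_place_pairs(text, synonyms, gazetteer, window=10):
    """Return (accepted plant name, place) pairs where a known place name
    occurs within `window` tokens either side of a two-word plant name."""
    lowered = [t.lower() for t in tokenize(text)]
    places = {p.lower(): p for p in gazetteer}
    pairs = []
    for i in range(len(lowered) - 1):
        # Plant names are matched as two-token (genus + species) sequences.
        accepted = synonyms.get(f"{lowered[i]} {lowered[i + 1]}")
        if accepted is None:
            continue
        start = max(0, i - window)
        end = min(len(lowered), i + 2 + window)
        for j in range(start, end):
            if lowered[j] in places:
                pairs.append((accepted, places[lowered[j]]))
    return pairs
```

Calling `find_plant_place_pairs("Scilla nutans grows plentifully in the woods near Ambleside.", SYNONYMS, GAZETTEER)` links the historical name to its accepted equivalent and to the nearby place name. A fixed token window is the simplest form of collocation; multi-word place names, spelling variation and ambiguous toponyms would all need further handling.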

The natural environment as recorded in historical accounts
Correlating empirical evidence across historical sources
Using digitised material and source selection
Historic name variations: plant species synonyms
Forming the historical plant list
Forming and formatting the corpus
Extracting species and geographical locations from the corpus
The impact of historical synonyms on match instances
The impact of plant recording practices on match rate
Potential limitations of the computational methodology
Mapping extracted information
A Botanical Arrangement of British Plants
Literature and Science
