Abstract

British philosopher and reformer Jeremy Bentham (1748-1832) left over 60,000 folios of unpublished manuscripts. The Bentham Project, at University College London, is creating a TEI version of the manuscripts through crowdsourced transcription verified by experts. We present an interface for navigating these largely unedited manuscripts, together with the language technologies used to enrich the corpus and facilitate navigation, i.e., entity linking against the DBpedia knowledge base and keyphrase extraction. The challenges of tagging a historical, domain-specific corpus with a contemporary knowledge base are discussed. The extracted concepts were used to create interactive co-occurrence networks that serve as a map of the corpus and, along with a search index, help navigate it. These corpus representations were integrated into a user interface. The interface was evaluated by domain experts with satisfactory results; e.g., they found the distributional semantics methods exploited here applicable to retrieving related passages for scholarly editing of the corpus.

Highlights

  • With the development of digital technologies, communication networks and large storage capacities, a large effort has been made to digitize all kinds of content, especially cultural heritage content

  • Our bet is that these tools, although not initially intended to process eighteenth-century texts, are robust enough to handle non-standard texts, including philosophical ones, even if such material is expected to pose difficulties for the technology. This is the purpose of the preliminary experiments we have carried out on the Transcribe Bentham manuscript collection

  • An application was presented to navigate the manuscripts of Jeremy Bentham, an 18th–19th-century corpus in political philosophy, ethics and related topics


Summary

INTRODUCTION

With the development of digital technologies, communication networks and large storage capacities, a large effort has been made to digitize all kinds of content, especially cultural heritage content. As always with this kind of project, one needs to attract a lot of interest (the website has received nearly 100,000 visits since its beginning) to be able to recruit only a handful of very active participants. These are highly motivated people who generally produce high-quality work: more than 94% of the transcribed texts have been added to the database after being checked and corrected (which means that less than 6% of the transcribed texts are rejected, mainly because they have been only partially transcribed). Our bet is that these tools, although not initially intended to process eighteenth-century texts, are robust enough to handle non-standard texts, including philosophical ones, even if such material is expected to pose difficulties for the technology. This is the purpose of the preliminary experiments we have carried out on the Transcribe Bentham manuscript collection.

THE CORPUS
Corpus sample in this study
OUR APPROACH
USER INTERFACE
USER INTERFACE EVALUATION WITH EXPERTS
CONCLUSIONS AND OUTLOOK
