Abstract

This article illustrates how mathematical and statistical tools designed to handle relational data can help decipher the most important features and defects of a large historical database and gain knowledge about a corpus of several thousand documents. Such a relational model, referred to in mathematics as a network or a graph, is general enough to address a wide variety of problems, including most databases made of relational tables. The article's purpose is to emphasise how a relevant relational model of a historical corpus can serve as a theoretical framework that makes automatic data mining methods designed for graphs available. For one thing, such methods can perform consistency checking, automatically extracting possible errors made while transcribing or interpreting the documents. Moreover, when the database is so large that even an exhaustive manual exploration yields little knowledge, relational data mining can help elucidate its main features. First, the macroscopic structure of the relations between entities can be brought out with network summaries produced automatically by classification methods. A complementary point of view comes from local summaries of the relation structure: a set of network-related indicators can be calculated for each entity, singling out, for instance, highly connected entities. Finally, visualisation methods dedicated to graphs can give the user an intuitive understanding of the database. Additional information can be superimposed on such network visualisations, making it possible to relate the relations between entities intuitively to the attributes that describe each entity. This overall approach is illustrated here with a large corpus of medieval notarial acts, containing several thousand transactions and involving a comparable number of persons.
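To make the relational model concrete, the sketch below represents acts and persons as a bipartite graph (each person linked to the acts that name them) and computes two simple local indicators per person: the number of acts they appear in, and the number of distinct co-participants. It also runs one illustrative consistency check, flagging persons whose recorded activity spans an implausibly long period, which might point to a mistranscribed date or to two homonyms merged into one entity. This is a minimal sketch under stated assumptions, not the article's method: the records, the person names, the functions, and the 50-year threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records standing in for transcribed notarial acts:
# (act identifier, year of the act, persons named in the act).
ACTS = [
    ("A1", 1260, ["Bernard", "Guilhem", "Peire"]),
    ("A2", 1262, ["Bernard", "Raimon"]),
    ("A3", 1265, ["Guilhem", "Raimon", "Arnaut"]),
    ("A4", 1340, ["Bernard", "Arnaut"]),  # deliberately suspicious date
    ("A5", 1266, ["Peire", "Arnaut"]),
]


def build_bipartite(acts):
    """Build the person-act bipartite graph as adjacency sets.

    Returns (acts_of, year_of): acts_of maps each person to the set of
    act ids they appear in; year_of maps each act id to its year.
    """
    acts_of = defaultdict(set)
    year_of = {}
    for act_id, year, persons in acts:
        year_of[act_id] = year
        for person in persons:
            acts_of[person].add(act_id)
    return dict(acts_of), year_of


def degrees(acts_of):
    """Local indicator: number of acts each person takes part in."""
    return {person: len(ids) for person, ids in acts_of.items()}


def partner_counts(acts):
    """Local indicator: number of distinct persons each person co-appears with
    (the degree in the person-person projection of the bipartite graph)."""
    partners = defaultdict(set)
    for _, _, persons in acts:
        for p in persons:
            partners[p].update(q for q in persons if q != p)
    return {p: len(qs) for p, qs in partners.items()}


def suspicious_spans(acts_of, year_of, max_span=50):
    """Consistency check (hypothetical threshold): flag persons whose acts
    span more than max_span years, returning person -> span in years."""
    flagged = {}
    for person, ids in acts_of.items():
        years = [year_of[i] for i in ids]
        span = max(years) - min(years)
        if span > max_span:
            flagged[person] = span
    return flagged


if __name__ == "__main__":
    acts_of, year_of = build_bipartite(ACTS)
    # Highly connected persons first.
    print(sorted(degrees(acts_of).items(), key=lambda kv: -kv[1]))
    print(partner_counts(ACTS))
    print(suspicious_spans(acts_of, year_of))
```

On this toy data, Bernard and Arnaut are flagged because the act dated 1340 lies far from their other acts; a real check would then send such cases back to the sources rather than correct them automatically.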

Highlights

  • The main objective of this article is to illustrate how mathematical and statistical tools designed to handle relational data can help decipher the most important features and defects of a large historical database and gain knowledge about a corpus of several thousand documents.

  • The term ‘graph’ should not be understood here as a graphical representation, but only as the mathematical object that models this relational data. Such a relational model is general enough to address a wide variety of problems.

  • This chapter has presented a network model and associated data mining tools for the exploration of a large database built from a corpus of medieval notarial acts.


Summary

Introduction

The main objective of this article is to illustrate how mathematical and statistical tools designed to handle relational data can help decipher the most important features and defects of a large historical database and gain knowledge about a corpus of several thousand documents. Networks are used increasingly often in History (see Rose (2011), or the numerous references on the research platform https://oeaw.academia.edu/TopographiesofEntanglements, for examples; see Bertrand et al. (2011) and Lemercier (2012) for a general discussion of this topic). However, most of these studies use the network as a convenient and intuitive way to represent a set of interactions, almost exclusively social interactions between people or countries, and they generally remain unaware of the mathematical tools available to gain a clearer understanding once the model is built. We end with a conclusion summarising the benefits of the methodology.

Data description and modelling
Global network analysis
Transaction dates
Local network analysis
Information propagation
Findings
Conclusion