Abstract
Historical archives provide invaluable insights into past societies, including their social networks. However, the amount of traditional archival work required means that historical network studies are usually small in scale. We consider the problem of processing a large corpus of unstructured text to extract network data. The corpus consists of almost 170,000 documents of administrative correspondence of the Portuguese Empire, dating from 1610 to 1833 and catalogued in the Portuguese Overseas Archives of Lisbon. Our contribution is twofold: the method and the result. First, building on a review of manual, semi-manual and automatic methods for extracting network data from natural language corpora, we propose and demonstrate an approach that uses modern natural language processing algorithms. The approach is designed to mimic traditional archivists' coding practices and applies to large corpora whose scale makes manual coding infeasible. We believe the approach is generic and can be adapted to other substantive contexts, languages, and types of historical archives. Second, the resulting dataset is rich in additional information, such as the occupation, administrative affiliation, and geographical location of senders and recipients. We provide a preliminary network analysis suggesting that the dataset is attractive material for historians and social network researchers addressing questions about the political and social evolution of the early modern Portuguese Empire, spanning the reigns of seven Portuguese monarchs.