Abstract

 
 
 Digitized historical newspapers are a treasure trove of information for our understanding of the past. As one popular application, the frequencies of query matches can be used to understand the prevalence of some discourse in a historical era. This requires the construction of good queries: broad enough to capture diverse contexts and narrow enough to exclude irrelevant ones. For historical research in digital humanities, targeted queries that emphasize precision have been advised. In this paper, we develop an alternative approach: we use broad queries to cast a wider net and then use topic models built on the match contexts to filter out irrelevant matches. Specifically, we look for contexts discussing environmental issues throughout the 20th century using a corpus of two Australian newspapers. We report on a comparison of iteratively constructed narrow and broad queries and their precision and recall, and find that our approach discovers roughly 7-10x more matches with a comparable level of accuracy. This combined approach can work well for focussed research projects where deliberate query construction and qualitative feedback on the results are feasible.
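To make the evaluation idea concrete, the following is a minimal sketch of filtering broad-query matches by topic and then measuring precision and recall. It is not the authors' implementation: the `Match` record, the topic ids, the relevance labels, and the function names are all hypothetical, and the topic assignment is assumed to come from a topic model fitted elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    doc_id: str
    topic: int       # dominant topic of the match context (assumed, from some topic model)
    relevant: bool   # manual relevance judgement (toy data, not from the paper)


def filter_by_topic(matches, relevant_topics):
    """Keep only matches whose dominant topic was judged relevant."""
    return [m for m in matches if m.topic in relevant_topics]


def precision_recall(retrieved, all_relevant_ids):
    """Precision and recall of the retrieved matches against the relevant ids."""
    retrieved_ids = {m.doc_id for m in retrieved}
    true_positives = len(retrieved_ids & all_relevant_ids)
    precision = true_positives / len(retrieved_ids) if retrieved_ids else 0.0
    recall = true_positives / len(all_relevant_ids) if all_relevant_ids else 0.0
    return precision, recall


# Toy matches returned by a broad query.
broad_matches = [
    Match("a1", topic=0, relevant=True),
    Match("a2", topic=0, relevant=True),
    Match("a3", topic=1, relevant=False),
    Match("a4", topic=2, relevant=False),
    Match("a5", topic=0, relevant=False),
    Match("a6", topic=3, relevant=True),
]
all_relevant = {m.doc_id for m in broad_matches if m.relevant}

# Topic 0 is assumed to be the one judged relevant after qualitative inspection.
kept = filter_by_topic(broad_matches, relevant_topics={0})

unfiltered_scores = precision_recall(broad_matches, all_relevant)
filtered_scores = precision_recall(kept, all_relevant)
```

In this toy data, filtering raises precision at the cost of some recall, which is the trade-off that the comparison of narrow and broad queries examines.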
 
 
More From: Digital Humanities in the Nordic and Baltic Countries Publications