Abstract

This paper attempts to replicate the findings of the recent work “The rise and fall of biodiversity in literature” by Langer et al. (2021). Using a large corpus from Project Gutenberg (N ≈ 15,000) and a dictionary-matching method covering over 240K biological taxa, Langer et al. find that the frequency and diversity of biological taxa have declined steadily since the first half of the nineteenth century, echoing prior work in cultural analytics. This paper applies the original paper’s three primary measures to the original dataset and to two additional datasets, and compares the dictionary-based method with an alternative supervised machine learning method. I find that the trajectory of biological tokens in fiction in the new datasets is directionally opposite to that shown by Langer et al., independent of the method used (i.e., taxa rise rather than fall after the first half of the nineteenth century), but that their breakpoint estimate appears largely robust, to within ±15 years. Based on this analysis, I suggest that the discrepancy between their results and mine is due to corpus construction rather than choice of method: conditioning the original dataset on fiction alone yields results more similar to those from the two alternative datasets used here. In addition to emphasizing the importance of corpus construction for cultural analytics, these findings raise larger questions about the difficulty of interpreting lexical items as indices of social attitudes, pointing to a need for future work.
