Abstract
One of the first decisions made in any research concerns the selection of an appropriate scale of analysis-are we looking out into the heavens, or down into atoms? To conceive a digital library as a collection of a million books may restrict analysis to only one level of granularity. In this article, we examine the consequences and opportunities resulting from a shift in scale, where the desired unit of interpretation is something smaller than a text: it is a keyword, a motif, or a metaphor. A million books distilled into a billion meaningful components become raw material for a history of language, literature, and thought that has never before been possible. While books herded into genres and organized by period remain irregular, idiosyncratic, and meaningful in only the most shifting and context-dependent ways, keywords or metaphors are lowest common denominators. At the semantic level-the level of words, images, and metaphors-long-term regularity and patterns emerge in collection, analysis, and taxonomy. This article follows the foregoing course of thought through three stages: first, the manual curation of a high quality database of metaphors; second, the expansion of this database through automated and human-assisted techniques; finally, the description of future experiments and opportunities for the application of machine learning, data mining, and natural language processing techniques to help find patterns and meaning concealed at this important level of granularity.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.