Abstract

This position paper is based on a keynote presentation at the COLING 2016 Workshop on Language Technology for Digital Humanities in Osaka, Japan. It departs from observations about working practices in Humanities disciplines following a hermeneutic tradition of text interpretation versus the method-oriented research strategies in Computational Linguistics (CL). The respective praxeological traditions are quite different. Yet more and more researchers are willing to open up towards truly transdisciplinary collaborations, trying to exploit advanced methods from CL within research that ultimately addresses questions from the traditional Humanities disciplines and the Social Sciences. The article identifies two central workflow-related issues for this type of collaborative project in the Digital Humanities (DH) and Computational Social Science: (1) a scheduling dilemma, which affects the point in the course of the project when specifications of the core analysis task are fixed (as early as possible from the computational perspective, but as late as possible from the Humanities perspective) and (2) the subjectivity problem, which concerns the degree of intersubjective stability of the target categories of analysis. CL methodology demands high inter-annotator agreement and theory-independent categories, while the categories in hermeneutic reasoning are often tied to a particular interpretive approach (viz. a theory of literary interpretation) and may bear a non-trivial relation to a reader’s pre-understanding. Building a comprehensive methodological framework that helps overcome these issues requires considerable time and patience. The established computational methodology has to be gradually opened up to more hermeneutically oriented research questions; resources and tools for the relevant categories of analysis have to be constructed. This article does not call into question that well-targeted efforts along this path are worthwhile. 
Yet, it makes the following additional programmatic point regarding directions for future research: It might be fruitful to explore, in parallel, the potential lying in DH-specific variants of the concept of rapid prototyping from Software Engineering. To get an idea of how computational analysis of some aspect of text might contribute to a hermeneutic research question, a prototypical analysis model is constructed, e.g., from related data collections and analysis categories, using transfer techniques. While the initial quality of analysis may be limited, the idea of rapid probing allows scholars to explore how the analysis fits in an actual workflow on the target text data, and it can thus provide early feedback for the process of refining the modeling. If the rapid probing method can indeed be incorporated in a hermeneutic framework to the satisfaction of well-disposed Humanities scholars, a swifter exploration of alternative paths of analysis would become possible. This may generate considerable additional momentum for transdisciplinary integration. It is as yet too early to point to truly Humanities-oriented examples of the proposed rapid probing technique. To nevertheless make the programmatic idea more concrete, the article uses two experimental scenarios to argue how rapid probing might help address the scheduling dilemma and the subjectivity problem, respectively. The first scenario illustrates the transfer of complex analysis pipelines across corpora; the second one addresses rapid annotation experiments targeting character mentions in literary text.

Highlights

  • Many years of research and tool development in the fields of Natural Language Processing (NLP) and Computational Linguistics (CL) have led to the availability of numerous mature tools for text analysis in the major languages, such as lemmatizers, part-of-speech taggers, and parsers, as well as tools for specific tasks beyond linguistic annotation, such as sentiment analysis, translation, and purpose-specific information extraction.

  • Computational modeling components of this kind are still rarely used within the core areas of the classical Humanities disciplines like Literary Studies or History, which generally take a hermeneutic approach to text interpretation and textual criticism. This approach is aimed at the significance of a text, following Hirsch’s (1967) separation of text meaning and significance, where the latter comprises the “relationship between [the text] meaning and a person, or a conception, or a situation, or anything imaginable” (Hirsch 1967: 8).

  • Collaborative project experience regarding links between CL and Linguistics includes, for instance, involvement 2007–2010 in SFB 632 “Information structure: The linguistic means for structuring utterances, sentences and texts” (University of Potsdam and Humboldt University Berlin, funded by Deutsche Forschungsgemeinschaft, DFG) and 2010–2018 in the DFG-funded SFB 732 “Incremental Specification in Context” (University of Stuttgart), in which the author led several subprojects and was deputy director 2012–2015 and director 2015–2018.


Summary

Preliminaries

Many years of research and tool development in the fields of Natural Language Processing (NLP) and Computational Linguistics (CL) have led to the availability of numerous mature tools for text analysis in the major languages, such as lemmatizers, part-of-speech taggers, and parsers, as well as tools for specific tasks beyond linguistic annotation, such as sentiment analysis, translation, and purpose-specific information extraction. Computational tools for a scholar's specific research questions will rarely be readily available, since few questions are directly correlated with the linguistic form of the text. It is not surprising when a specialist scholar keeps relying on their erudition and manual analysis rather than investing time into the development or refinement of analytical tools that may have just a one-time application. Given this understandably conservative tendency in the core Humanities disciplines, emerging DH fields such as Digital Literary Studies tend to focus on questions that have not been at the center of traditional research.

What this article aims to achieve
Background
The reference data-based methodology in Computational Linguistics
System adaptation driven by reference data
Working practices in the Humanities versus Computational Linguistics
The scheduling dilemma
Approaching the dilemma with systematic bottom-up resource building
An alternative strategy: rapid probing of analysis models
Illustrating rapid probing with the ‘‘Textual Emigration Analysis’’ system
The subjectivity problem
Base for illustration: point of view in narrative text
Modeling subjective categorizations
Conclusion