Abstract

Political communication has become one of the central arenas of innovation in the application of automated analysis approaches to ever-growing quantities of digitized text. However, although researchers routinely resort to certain forms of human coding to validate the results derived from automated procedures, in practice the actual “quality assurance” of such a “gold standard” often goes unchecked. Contemporary practices of validation via manual annotations are far from acknowledged best practices in the literature, and the reporting and interpretation of validation procedures differ greatly. Relying on large-scale Monte Carlo simulations, we systematically assess the connection between the quality of human judgment in manual annotations and the relative performance evaluations of automated procedures against true standards. The results confirm that the risk of a researcher reaching an incorrect conclusion about the performance of an automated procedure is substantially greater when the quality of the manual annotations used for validation is not properly ensured. Our contribution should therefore be regarded as a call for the systematic application of high-quality manual validation materials in any political communication study drawing on automated text analysis procedures.
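To make the logic of these simulations concrete, below is a minimal illustrative sketch in Python. It is not the authors' simulation code, and every parameter (classifier error rate, annotator error rates, corpus size, number of runs) is an assumption chosen purely for demonstration. The sketch compares a classifier's true accuracy with the accuracy that would be reported when it is validated against manual annotations of varying quality:

```python
# Illustrative Monte Carlo sketch (not the authors' code): how noisy
# "gold standard" annotations distort the apparent accuracy of an
# automated classifier. All parameter values are demonstration choices.
import numpy as np

rng = np.random.default_rng(42)

def flip(labels, error_rate, rng):
    """Randomly flip binary labels with the given error probability."""
    noise = rng.random(labels.size) < error_rate
    return np.where(noise, 1 - labels, labels)

n_docs, n_runs = 1_000, 500
classifier_error = 0.10  # the classifier truly misclassifies 10% of documents

for annotator_error in (0.00, 0.10, 0.20):  # quality of the manual "gold standard"
    apparent = []
    for _ in range(n_runs):
        truth = rng.integers(0, 2, n_docs)              # true document classes
        predicted = flip(truth, classifier_error, rng)  # automated coding
        annotated = flip(truth, annotator_error, rng)   # human "validation" labels
        apparent.append(np.mean(predicted == annotated))
    print(f"annotator error {annotator_error:.2f}: "
          f"apparent accuracy {np.mean(apparent):.3f} "
          f"(true accuracy {1 - classifier_error:.2f})")
```

With independent errors, the apparent accuracy is approximately (1 − e_c)(1 − e_a) + e_c · e_a, where e_c and e_a are the classifier's and the annotators' error rates; a 20% annotator error rate therefore makes a 90%-accurate classifier appear only about 74% accurate, which is the kind of evaluation distortion the simulations quantify.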

Highlights

  • With the growing popularity of “text-as-data” approaches within the field of political communication, the issue of ensuring the validity of the results has become crucial

  • Our contribution should be regarded as a call for the systematic application of high-quality human coding for validation procedures in automated content analysis

  • Following standard techniques often employed in political communication research, we define “automated content analysis” as a collection of content-analytic approaches that utilize automated methods to code large amounts of textual data, such that the coding itself is not performed manually but rather by computational means



Introduction

With the growing popularity of “text-as-data” approaches within the field of political communication, the issue of ensuring the validity of the results has become crucial. To arrive at valid results, text-as-data approaches require proper triangulation of the applied techniques against some “gold standard” or “ground truth,” that is, some form of “objective,” or intersubjectively valid, measurement that serves as a reference (Grimmer & Stewart, 2013). This is typically achieved by using human inputs (“human coding” or “manual annotations”) as a benchmark.

Following standard techniques often employed in political communication research, we define “automated content analysis” (or automated text analysis) as a collection of content-analytic approaches that utilize automated methods to code large amounts of textual data, such that the coding itself (e.g., the text classification) is not performed manually but rather by computational procedures. As exemplified in Scharkow (2013) and in Burscher, Odijk, Vliegenthart, De Rijke, and De Vreese (2014), a similar approach can be taken for supervised machine learning (SML) methods when evaluating the performance of an algorithm. Although convergent validity against external standards is not the only criterion for establishing the validity of content-analytic methods (Krippendorff, 2013), the practice of utilizing human coding in validation owes primarily to the general motivation behind automated approaches (i.e., automating “human coding”; Grimmer & Stewart, 2013), which implies a clear standard for evaluation (i.e., comparison against human coding). Yet due to inherent resource constraints, it is rare to see validation occur after such classification tasks in practice (for notable exceptions, see the aforementioned studies).
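Because the argument turns on the quality of the manual gold standard itself, a natural first check before using human annotations for validation is intercoder reliability. As a hedged illustration (the paper does not prescribe this exact procedure, and the function name and example labels below are hypothetical), the following Python sketch computes Cohen's kappa for two coders annotating the same documents:

```python
# Illustrative sketch: checking intercoder reliability (Cohen's kappa)
# before treating manual annotations as a "gold standard". Example
# labels are hypothetical.
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' nominal annotations of the same items."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    marginals_a, marginals_b = Counter(coder_a), Counter(coder_b)
    categories = set(coder_a) | set(coder_b)
    expected = sum((marginals_a[c] / n) * (marginals_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

coder_a = ["pos", "neg", "pos", "neu", "pos", "neg", "neu", "pos"]
coder_b = ["pos", "neg", "neu", "neu", "pos", "pos", "neu", "pos"]
print(f"kappa = {cohens_kappa(coder_a, coder_b):.2f}")  # 0.60 for this example
```

Only annotations with acceptably high chance-corrected agreement should serve as the benchmark against which automated procedures are evaluated; in practice, researchers often use Krippendorff's alpha instead, which generalizes to more coders, missing data, and other measurement levels.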
