Abstract

Aim: This research note examines some of the consequences of big data as an emerging methodology. Its purpose is to provide a brief review of the literature on the method's development and of some critical questions researchers should consider as they move forward. Salvo (2012) contends that big data, as a form of the design of communication, "is necessarily a rhetorically-based field" (p. 38). With big data as an up-and-coming methodology (McNely, 2012; Salvo, 2012), caution in its application is a necessity for scholars. Researchers should not only seek out the unseen and untapped applications of big data but also learn its limitations (Spinuzzi, 2009). You adopt a methodology, you adopt its flaws.

Problem Formation: This section identifies a gap in the field concerning the consequences of applying big data as a methodology and of treating it as a rhetorical tool. As big data gains steam in the humanities, some are sure to question what they see as a flaw: the act of quantifying language. Neither this argument nor its rebuttal is new. Harris (1954) discusses the distributional structure of language, with each part of a sentence acting as a co-occurrent, each in a particular position, and each in relationship to the other co-occurrents (p. 146). Salvo (2012) argues that the combination of these new methodologies and technologies "knits together invention, arrangement, style, memory, and delivery in ways that challenge conceptions of print based literacy and textuality" (p. 39). While big data itself has several rhetorical methodologies embedded within it, deciding which one to use depends on the amount of data and on how it is aggregated.

Information Collection: As described above, this research note functions primarily as a brief review of the literature. This section focuses on how writing analytics developed from content analysis in mass communication and shifted into latent semantic analysis assisted by computer technology.
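The shift toward computer-assisted latent semantic analysis can be made concrete with a minimal sketch (not drawn from the research note itself): documents are represented as columns of a term-document count matrix, and a truncated singular value decomposition projects them into a low-dimensional "latent" space in which documents that share vocabulary patterns land close together. The matrix values and term labels below are invented purely for illustration.

```python
import numpy as np

# Hypothetical term-document count matrix: rows are terms, columns are
# documents (all counts invented for illustration).
#              d1  d2  d3  d4
X = np.array([[2,  1,  0,  0],   # "data"
              [1,  2,  0,  0],   # "method"
              [0,  0,  2,  1],   # "rhetoric"
              [0,  0,  1,  2]])  # "style"

# LSA core step: truncated singular value decomposition.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # keep only the two strongest latent dimensions
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # documents in latent space

def cos(u, v):
    """Cosine similarity between two latent document vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Documents that share vocabulary (d1, d2) are near-identical in the
# latent space; documents with disjoint vocabulary (d1, d3) are not.
print(cos(doc_vectors[0], doc_vectors[1]))  # high similarity
print(cos(doc_vectors[0], doc_vectors[2]))  # low similarity
```

This is of course a toy: real LSA pipelines weight the counts (e.g. tf-idf) and work with thousands of terms, but the rank reduction shown here is the move that distinguishes latent semantic analysis from plain word counting.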
Riffe, Lacy, and Fico (1995) offer a clear explanation of content analysis, which was developed with comparatively small data sets in mind: "Usually, but not always, content analysis involves drawing representative samples of content, training coders to use the category rules developed to measure or reflect differences in content, and measuring reliability (agreement or stability over time) of coders applying the rules" (p. 2). Drawing a representative sample of content was once feasible, but in the digital age the volume of content increases exponentially every day.

Conclusions: Since latent semantic analysis is an extension of quantitative content analysis, and since an adopted methodology carries adopted flaws, it makes sense to turn to the concerns voiced by mass communication scholars in order to understand its limitations. As quantitative content analysis grew in popularity in mass communication, its methods were steadily refined. Reporting the reliability of a study adds credibility to the study itself, and when human coders are involved, reporting intercoder reliability becomes imperative (Hayes & Krippendorff, 2007; Krippendorff, 2008, 2011). While intercoder reliability measures the degree to which coders agree, researchers should also be keenly aware of the theory and valence informing their study, which shape their coders and, ultimately, the results of the study itself.

Directions for Further Research: As the field of writing studies begins to adopt big data methodologies, researchers must continue to challenge and question their applications, implementations, and implications, turning to familiar questions from our own fields. Big data is exciting and new, but it is not the methodology to explain it all. It is just as rhetorical as every other methodology; it is simply better at hiding it.
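The intercoder reliability discussed above can be illustrated with a small, hypothetical computation. Cohen's kappa is used here as one common chance-corrected agreement statistic for two coders (Hayes and Krippendorff in fact advocate Krippendorff's alpha, which generalizes to multiple coders and missing data); the category labels and code assignments below are invented for illustration.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders (Cohen's kappa)."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: proportion of items coded identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement, from each coder's marginal category use.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical nominal codes assigned by two coders to ten text units.
a = ["pos", "pos", "neg", "neu", "pos", "neg", "neg", "neu", "pos", "neg"]
b = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "neu", "pos", "neg"]
print(cohens_kappa(a, b))
```

Here the coders agree on 8 of 10 units (observed agreement 0.80), but kappa discounts the agreement expected by chance from each coder's category frequencies, so the reported reliability is lower than the raw percentage; reporting the chance-corrected figure is what lends a coding study credibility.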
