Querying Google Books Ngram Viewer's Big Data Text Corpuses to Complement Research

Shalin Hai-Jew

doi:10.4018/978-1-4666-6493-7.ch020

Abstract

Qualitative and mixed methods researchers, who have a tradition of gleaning information from all possible sources, may well find the Google Books Ngram Viewer and its repository of tens of millions of digitized books to be yet another promising data stream. This free cloud service gives easy access to big data: users can query the word frequency counts of a range of terms and numerical sequences, in multiple languages, across books published from 1500 to 2000, a 500-year span of book publishing, with new books being added continually. The questions that this tool can address would be virtually unanswerable otherwise. The word frequency counts provide a lagging indicator of both instances and trends related to language usage, cultural phenomena, popularity, technological innovations, and a wide range of other topics. The text corpuses contain de-contextualized words used by the educated literati of the day as they shared their knowledge in formalized texts. The capabilities of the Google Books Ngram Viewer provide complementary information sourcing for designed research questions as well as for free-form discovery. The tool also allows the “shadowed” (masked or de-identified) extracted data to be downloaded for further analyses and visualizations. This chapter provides both a basic and an advanced look at how to extract information from the Google Books Ngram Viewer for light research.

Full Text