Abstract

Large-scale digitization efforts by third-party firms are the subject of no small amount of controversy and criticism, as is especially the case with Google Books. This article reports some of the findings and important implications of a rigorous multi-year quantitative and qualitative assessment of the images representing a sizable proportion of the digital surrogates created by Google and deposited in the HathiTrust, which is one of the most important large-scale preservation initiatives to emerge in higher education in the past fifty years. The population of study described here consists of English-language books and serials published before 1923 that were scanned and processed by Google between 2004 and 2010. At the time the data for the study were gathered (2011), this population consisted of approximately 1.25 million volumes or roughly 12 percent of the HathiTrust corpus. The findings suggest that the imperfection of digital surrogates is an obvious and nearly ubiquitous feature of Google Books and that such imperfection has become and will remain firmly ensconced in collaborative preservation repositories.

