From Content to Context

G Shankaranarayanan,Roger Blake

doi:10.1145/2996198

Abstract

Research in data and information quality has made significant strides over the last 20 years. It has become a unified body of knowledge incorporating techniques, methods, and applications from a variety of disciplines including information systems, computer science, operations management, organizational behavior, psychology, and statistics. With organizations viewing “Big Data”, social media data, data-driven decision-making, and analytics as critical, data quality has never been more important. We believe that data quality research is reaching the threshold of significant growth and a metamorphosis from focusing on measuring and assessing data quality—content—toward a focus on usage and context. At this stage, it is vital to understand the identity of this research area in order to recognize its current state and to effectively identify an increasing number of research opportunities within. Using Latent Semantic Analysis (LSA) to analyze the abstracts of 972 peer-reviewed journal and conference articles published over the past 20 years, this article contributes by identifying the core topics and themes that define the identity of data quality research. It further explores their trends over time, pointing to the data quality dimensions that have—and have not—been well-studied, and offering insights into topics that may provide significant opportunities in this area.

Full Text