The problem of managing data became widely recognized with the advent of digital computing in the 1950s. In the early days, data management was little more than the physical storage of tapes and punch cards. Not long afterward, the organization of the data became the focus of data management, and databases were born. The concept of “data management” is credited to the Association of Data Processing Service Organizations, which was concerned with training and quality assurance metrics (Foote 2022). Over the past six decades, data management has become an umbrella term that covers data governance (the enforcement of policies around data), data storage, database management, and data cataloging. In addition to dealing with the data themselves, data management involves the curation of metadata, such as the names of the data collectors, the dates and locations of data acquisition, and a persistent identifier such as a URL or DOI. As recently as a decade ago, data management for many scientists and engineers amounted to the production of reports and theses, often with appendices that presented tables of data, sometimes supported by computer files of those tables. However, that practice is not feasible for very large datasets, so-called “big data.” Moreover, old VisiCalc, Lotus 1-2-3, or even Excel files, not to mention the media on which they were stored, cannot be relied on as long-term storage platforms. Today, data management involves the use of cloud storage, data lakes, and data warehouses. In 2011, the National Science Foundation began to require data management plans in proposals, and other agencies, such as the U.S. EPA, the U.S. Geological Survey, the U.S. Department of Energy, and the U.S. Department of Defense, are similarly concerned with data archiving.
In addition to funding agencies, journals now require datasets to be made available on persistent platforms that are readily accessible to readers. With the advent of satellite-based hydrogeology, aquifer monitoring with passive sensors, and high-resolution chemical analyses, enormous datasets will become the norm for many practitioners and clients. Sophisticated tools that use artificial intelligence will be employed to query the data and, for some applications, to make operational decisions in real time. With massive and more diverse datasets, plus powerful data analysis tools such as machine learning, data visualization, and exploratory data analysis, some have postulated that we may be entering a new paradigm for scientific discovery: the so-called “Fourth Paradigm.” The concept is attributed to Jim Gray of Microsoft (d. 2007) (Microsoft Research 2022), who observed that the history of science has been characterized by three key paradigms: (1) empirical evidence (e.g., Mendel and genetics); (2) scientific theory (e.g., Einstein and physics); and, more recently, (3) computational science (e.g., groundwater fate and transport models). Gray further postulated that the exponential growth of “big data” and of powerful data analysis tools is now leading science into a “Fourth Paradigm,” in which scientific discovery will be accelerated by data-intensive approaches (Hey et al. 2009). For those of us who work with data and information, the world is changing quickly. In this issue, we present a snapshot of data management and analysis examples in several articles that reflect current practices in the emerging “Fourth Paradigm” of science. It may be interesting to return to this theme in 5 years to contrast how data management and analysis practices have changed and how those changes have influenced the progression of scientific knowledge in the groundwater field.

J. Blotevogel, Guest Editor, is at CSIRO, Adelaide, AU. C. Newell, Guest Editor, is at GSI Environmental Inc., Houston, TX. J. Meyer, Guest Editor, is at University of Iowa, Iowa City, IA. K. Karimi Askarani, Guest Editor, is at Colorado State University, Fort Collins, CO. J.F. Devlin, corresponding author, Editor-in-Chief, is at University of Kansas, Lawrence, KS; jfdgwmr@ku.edu