Abstract

As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions. The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in a way that is unprecedented. In parallel, research data repositories have been built to host research data in response to the requirements of sponsors that research data be publicly available. Libraries are re-inventing themselves to respond to a growing demand to manage, store, curate and preserve the data produced in the course of publicly funded research. As librarians and data managers are developing the tools and knowledge they need to meet these new expectations, they inevitably encounter conversations around Big Data. This paper explores definitions of Big Data that have coalesced in the last decade around four commonly mentioned characteristics: volume, variety, velocity, and veracity. We highlight the issues associated with each characteristic, particularly their impact on data management and curation. Using the methodological framework of the data life cycle model, we assess two models developed in the context of Big Data projects and find them lacking. We propose a Big Data life cycle model that includes activities focused on Big Data and more closely integrates curation with the research life cycle. These activities include planning, acquiring, preparing, analyzing, preserving, and discovering, with describing the data and assuring quality being an integral part of each activity. We discuss the relationship between institutional data curation repositories and new long-term data resources associated with high performance computing centers, as well as reproducibility in computational science. We apply this model by mapping the four characteristics of Big Data outlined above to each of the activities in the model. This mapping produces a set of questions that practitioners should be asking in a Big Data project.
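
To make the mapping concrete, the sketch below enumerates the cross-product of the model's life cycle activities and the four characteristics to generate a checklist of prompts. This is an illustrative sketch only, not the paper's method: the activity and characteristic names come from the abstract above, while the function name and the wording of the generated questions are assumptions made for illustration.

```python
# Illustrative sketch: pairing each life cycle activity with each Big Data
# characteristic to produce a checklist of prompts for a project.
# The activity and characteristic names follow the model described above;
# the question wording is an assumed placeholder, not the paper's question set.

ACTIVITIES = ["plan", "acquire", "prepare", "analyze", "preserve", "discover"]
CHARACTERISTICS = ["volume", "variety", "velocity", "veracity"]


def checklist(activities=ACTIVITIES, characteristics=CHARACTERISTICS):
    """Yield one prompt per (activity, characteristic) pair.

    Describing the data and assuring quality are cross-cutting concerns in the
    model, so each prompt reminds the practitioner of both.
    """
    for activity in activities:
        for characteristic in characteristics:
            yield (
                f"As you {activity} the data, how does its {characteristic} "
                f"affect how you describe it and assure its quality?"
            )


if __name__ == "__main__":
    for question in checklist():
        print(question)
```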

Highlights

  • As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions

  • Data life cycle models present a structure for organizing the tasks and activities related to the management of data within a project or an organization

  • We propose a Big Data Life Cycle Model (Figure 3) that combines the perspective of research with that of data curation, identifies the data management tasks that lead to analysis while preserving the curation aspect, and encompasses the steps necessary to handle Big Data

Introduction

As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions. In the mid-1990s, the phrase "Big Data" appears to have been used widely around Silicon Graphics, both in academic presentations and in sales pitches to scientists, customers, analysts, and the press. Around this time, an early academic definition appears in a paper in the ACM Digital Library, which ties Big Data to the demands of computational fluid dynamics and visualization and characterizes it as data too large to fit into local computer memory: “visualization provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk: we call this the problem of Big Data.” (Cox and Ellsworth, 1997). Data-driven decisions emphasize the need for traceability and provenance.
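
The Cox and Ellsworth definition centers on data that exceeds main memory. A minimal sketch of one common response, processing a large file in fixed-size chunks rather than loading it whole, is shown below. The file name, data type, and chunk size are assumptions chosen for illustration; the paper itself prescribes no particular technique.

```python
# Minimal sketch of working with data "too large for main memory":
# compute a summary statistic over a large binary file in fixed-size chunks
# instead of reading it all at once. File name, dtype, and chunk size are
# illustrative assumptions, not values from the paper.
import numpy as np

CHUNK_ELEMENTS = 10_000_000  # roughly 80 MB of float64 values per chunk


def running_mean(path="large_simulation_output.f64"):
    """Compute the mean of a float64 binary file without loading it whole."""
    total, count = 0.0, 0
    with open(path, "rb") as f:
        while True:
            chunk = np.fromfile(f, dtype=np.float64, count=CHUNK_ELEMENTS)
            if chunk.size == 0:
                break
            total += chunk.sum()
            count += chunk.size
    return total / count if count else float("nan")
```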
