Data trajectories: tracking reuse of published data for transitive credit attribution

Paolo Missier

doi:10.2218/ijdc.v11i1.425

Abstract

The ability to measure the use and impact of published datasets is key to the success of the open data/open science paradigm. A direct measure of impact would require tracking data (re)use in the wild, which is difficult to achieve. It is therefore commonly replaced by simpler metrics based on data download and citation counts. In this paper we describe a scenario where it is possible to track the trajectory of a dataset after its publication, and show how this enables the design of accurate models for ascribing credit to data originators. A Data Trajectory (DT) is a graph that encodes knowledge of how, by whom, and in which context data has been reused, possibly after several generations. We provide a theoretical model of DTs that is grounded in the W3C PROV data model for provenance, and we show how DTs can be used to automatically propagate a fraction of the credit associated with transitively derived datasets back to the original data contributors. We also show this model of transitive credit in action by means of a Data Reuse Simulator. In the longer term, we hope that credit models based on direct measures of data reuse will provide further incentives for data publication. We conclude by outlining a research agenda to address the hard questions of creating, collecting, and using DTs systematically across a large number of data reuse instances in the wild.
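To make the idea of transitive credit concrete, here is a minimal sketch in Python. It is not the paper's model or its Data Reuse Simulator: the function name `propagate_credit`, the fixed `fraction` parameter, the equal split among sources, and the rule that passed-on credit is deducted from the derived dataset are all assumptions chosen for illustration. The input mapping stands in for derivation edges of the kind PROV expresses with `wasDerivedFrom`.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def propagate_credit(derived_from, initial_credit, fraction=0.5):
    """Push a fixed fraction of each dataset's credit back to its sources.

    derived_from: maps a dataset to the list of datasets it was derived from
                  (must be acyclic).
    initial_credit: maps a dataset to the credit it earned directly.
    fraction: share of a dataset's credit handed back to its sources
              (hypothetical parameter; split equally among the sources).
    """
    credit = defaultdict(float, initial_credit)

    # Order datasets so that every derived dataset is handled before
    # the datasets it was derived from.
    sorter = TopologicalSorter()
    for dataset, sources in derived_from.items():
        sorter.add(dataset)
        for source in sources:
            sorter.add(source, dataset)  # source comes after dataset

    for dataset in sorter.static_order():
        sources = derived_from.get(dataset, [])
        if not sources:
            continue  # an original dataset keeps whatever reached it
        passed_back = fraction * credit[dataset]
        credit[dataset] -= passed_back
        for source in sources:
            credit[source] += passed_back / len(sources)

    return dict(credit)


# D3 was derived from D2, which was derived from D1; only D3 earned credit.
chain = {"D2": ["D1"], "D3": ["D2"]}
print(propagate_credit(chain, {"D3": 8.0}))
```

In this three-generation chain, half of D3's credit flows to D2, and half of what D2 then holds flows on to D1, so the original contributor receives credit without having been cited directly. Total credit is conserved under these assumptions; a different model could instead leave the derived dataset's credit intact or weight sources unequally.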

Journal: International Journal of Digital Curation
Publication Date: Sep 29, 2016
Citations: 8
License type: CC BY
