Abstract

Collaboration across institutional boundaries is widespread and increasing today. It depends on federations sharing data that often have governance rules or external regulations restricting their use. However, the handling of data governance rules (a.k.a. data-use policies) remains manual, time-consuming and error-prone, limiting the rate at which collaborations can form and respond to challenges and opportunities, inhibiting citizen science and reducing data providers' trust in compliance. Using an automated system to facilitate compliance handling substantially reduces the time needed for such non-mission work, thereby accelerating collaboration and improving productivity. We present a framework, Dr.Aid, that helps individuals, organisations and federations comply with data rules, using automation to track which rules are applicable as data is passed between processes and as derived data is generated. It encodes data-governance rules using a formal language and performs reasoning on multi-input-multi-output data-flow graphs in decentralised contexts. We test its power and utility by working with users performing cyclone tracking and earthquake modelling to support mitigation and emergency response. We query standard provenance traces to detach Dr.Aid from details of the tools and systems they are using, as these inevitably vary across members of a federation and through time. We evaluate the model in three aspects by encoding real-life data-use policies from diverse fields, showing its capability for real-world usage and its advantages compared with traditional frameworks. We argue that this approach will lead to more agile, more productive and more trustworthy collaborations and show that the approach can be adopted incrementally. This, in turn, will allow more appropriate data policies to emerge, opening up new forms of collaboration.

Highlights

  • Collaboration across institutional and discipline boundaries is an increasing practice in research today, whether through tight alliances or loosely coupled federations

  • This concurs with the so-called “biggest lie on the Internet” [59], i.e. the fact that most people explicitly accept website Terms of Service and Privacy Policies without reading or understanding them. This raises the same issues and modes of failure; whenever people try to enhance their control over data usage, problems arise due to information overload. Legislative approaches, such as the European General Data Protection Regulation (GDPR), bring some consistency and return control to the data subject [1], but they do not eliminate the complexity for people, leaving them facing the issues of finding, understanding and complying with data rules unaided

  • A workaround for the reporting requirement is to use the integrity set: assign the data the integrity tag I(D) = {acknowledge}; all users who agree to properly acknowledge the data assign their processes I(Proc) = {acknowledge}. But this requires external checks of the proper permission, and the semantics is not represented in the policy either; this is environment information outside the encoded policy
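
The integrity-set workaround in the last highlight can be sketched as follows. This is a minimal illustration, not the mechanism of any particular framework: the `Labelled` class, the `may_use` function and the rule that a process must carry every integrity tag attached to the data are all assumptions made for this example.

```python
from dataclasses import dataclass

ACKNOWLEDGE = "acknowledge"


@dataclass(frozen=True)
class Labelled:
    """Hypothetical data item or process carrying an integrity set I(.)."""
    name: str
    integrity: frozenset = frozenset()


def may_use(proc: Labelled, data: Labelled) -> bool:
    # Assumed rule: the process must hold every integrity tag on the data.
    return data.integrity <= proc.integrity


dataset = Labelled("D", frozenset({ACKNOWLEDGE}))        # I(D) = {acknowledge}
agreeing = Labelled("Proc", frozenset({ACKNOWLEDGE}))    # I(Proc) = {acknowledge}
other = Labelled("OtherProc")                            # no tags assigned

print(may_use(agreeing, dataset))
print(may_use(other, dataset))
```

The check only records that a user claimed the tag. Whether an acknowledgement is actually made, and what "acknowledge" means, stays outside the encoded labels, which is the limitation the highlight describes.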


Summary

Introduction

Collaboration across institutional and discipline boundaries is an increasing practice in research today, whether through tight alliances or loosely coupled federations. Current practice for such situations often requires data users to submit applications and undergo training on security, privacy, sensitivity and ethical data management before gaining access to the data, and their results may need to be screened before they are allowed to disclose them to a wider audience. Policing such systems is onerous for data providers, and compliance is tedious and time-consuming for researchers. The importance of data governance, data ethics and privacy has risen in recent years, driven by the widespread application of machine learning [50] and the Internet of Things (IoT) [51, 79], which generate and use massive amounts of data on a daily basis. This concurs with the so-called “biggest lie on the Internet” [59], i.e. the fact that most people explicitly accept website Terms of Service and Privacy Policies without reading or understanding them. Appropriate methods and practical frameworks are needed to facilitate every stakeholder's role relating to data ethics and governance.
