Abstract

Many studies in big data focus on the uses of data available to researchers, leaving untreated the data that sits on the servers but of which researchers are unaware. We call this dark data, and in this article we present and discuss it in the context of high-performance computing (HPC) facilities. To this end, we provide statistics from a major HPC facility in Europe, the High-Performance Computing Center Stuttgart (HLRS). We also propose a new position tailor-made for coping with dark data and general data management. We call it the scientific data officer (SDO), and we distinguish it from other standard positions in HPC facilities such as chief data officers, system administrators, and security officers. In order to understand the role of the SDO in HPC facilities, we discuss two kinds of responsibilities, namely technical responsibilities and ethical responsibilities. While the former are intended to characterize the position, the latter raise concerns about, and propose solutions to, the control and authority that the SDO would acquire.

Highlights

  • Studies in big data across disciplines are interested in the analysis and uses of actual data

  • Because the amount and uses of data in scientific and engineering research are rather different from those in social networking, business, and governmental institutions, we shall focus on dark data in high-performance computing (HPC) facilities

  • We characterize the scientific data officer (SDO) as part of the personnel within the HPC facility and, as such, knowledgeable about the projects, past and present, carried out within the institution, as well as the ethical and legal framework adopted for those projects

Summary

Introduction

Studies in big data across disciplines are interested in the analysis and uses of actual data. Because the amount and uses of data in scientific and engineering research are rather different from those in social networking, business, and governmental institutions, we shall focus on dark data in high-performance computing (HPC) facilities. Under ideal conditions of scientific practice, standard data management workflows in HPC facilities indicate that, in order to keep clean records of the data produced, such data must be “labeled” correctly. This means that specific metadata about the data is tagged onto the data for the purpose of identification and categorization. Metadata thus plays the fundamental role of structuring, informing, and identifying data by means of relevant information about them. When such conditions of the management workflow are not followed, and admittedly this is the case for many HPC facilities, data becomes dark, invisible, and undetectable by the researchers.
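The labeling idea described above can be illustrated with a minimal sketch. It is not taken from the article: the `DataRecord` type, the `REQUIRED_KEYS` list, the `is_dark` helper, and the file paths are all hypothetical, and the rule that a record is dark whenever a required metadata field is missing or empty is a deliberate simplification of the notion discussed in the text.

```python
from dataclasses import dataclass, field

# Hypothetical minimal metadata schema; a real facility would define its own.
REQUIRED_KEYS = ("owner", "project", "created")


@dataclass
class DataRecord:
    """A stored file together with the metadata tagged onto it."""
    path: str
    metadata: dict = field(default_factory=dict)


def is_dark(record, required=REQUIRED_KEYS):
    """Simplifying assumption: a record is 'dark' if any required
    metadata key is missing or has an empty value."""
    return any(not record.metadata.get(key) for key in required)


# Illustrative records only; the paths and values are made up.
records = [
    DataRecord("/scratch/run_001.h5",
               {"owner": "alice", "project": "cfd-wing", "created": "2019-03-01"}),
    DataRecord("/scratch/tmp_out.dat"),  # never labeled
    DataRecord("/scratch/run_002.h5",
               {"owner": "", "project": "cfd-wing", "created": "2019-03-04"}),
]

dark_paths = [r.path for r in records if is_dark(r)]
print(dark_paths)
```

In this sketch the second and third records would be flagged, since neither can be tied to an owner. A check of this kind only detects missing labels; deciding what to do with the flagged data is a separate question.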

The Problem and Notion of Dark Data
The Statistics of Dark Data at the HLRS
The Scientific Data Officer
Technical Responsibilities
Selected Simulation Results
Ethical Considerations
What the SDO Is Not
Final Remarks
