Abstract

This paper reviews the prototype dataset accounting developed during the EGI-Engage project and how it could be used to complement the view that the WLCG has of its datasets. Dataset accounting is a new feature of the EGI resource accounting system that will enable storing information on dataset usage, such as who has accessed a dataset and how often. The paper describes the new REST interface used for retrieving usage metrics from the EGI DataHub, along with the further work that is required.

Highlights

  • While the Worldwide LHC Computing Grid† (WLCG) and EGI‡ have both made significant progress towards solutions for storage space accounting, one area that is still quite exploratory is that of dataset accounting

  • Because our preliminary analysis identified a persistent identifier (PID) management system as a prerequisite for implementing a data accounting feature, special attention was devoted to gathering information on the use of different methods for identifying datasets, including Digital Object Identifiers (DOI) [5], Uniform Resource Identifiers (URI) and persistent Uniform Resource Locators (URL)

  • A basic dataset accounting prototype was developed that added a new route for retrieving metrics (REST APIs) to the APEL software system, allowing accounting data to be aggregated from a wider range of resource types

Summary

Introduction

While the Worldwide LHC Computing Grid† (WLCG) and EGI‡ have both made significant progress towards solutions for storage space accounting, one area that is still quite exploratory is that of dataset accounting. This type of accounting would enable resource centre and research community administrators to report on dataset usage to the data owners, data providers, and funding agencies. This paper reviews the status of the prototype dataset accounting developed during the EGI-Engage project and how it could be used to complement the view that the WLCG has of its datasets. This is a new feature of the EGI resource accounting system that will enable storing information on dataset usage, such as who has accessed a dataset and how often, the transfer volumes, and the end points. The EGI Accounting Repository was integrated with the data provider Onedata§ (the underlying technology powering the EGI Open Data Platform and EGI DataHub) as an example of a generic data provider.
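
To make the kind of information described above concrete, the following is a minimal Python sketch of how individual access events could be aggregated into per-dataset usage summaries (who accessed a dataset, how often, transfer volumes and end points). It is an illustration only: the `AccessEvent` class, its field names, the `summarise` function and the example identifiers are all hypothetical and do not reflect the actual APEL record format or the EGI DataHub REST interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """One hypothetical dataset access event (field names are assumptions)."""
    dataset_pid: str        # persistent identifier of the dataset, e.g. a DOI
    user: str               # who accessed the dataset
    bytes_transferred: int  # transfer volume for this access
    endpoint: str           # end point involved in the transfer


def summarise(events):
    """Aggregate access events into one usage summary per dataset PID."""
    summary = defaultdict(
        lambda: {"accesses": 0, "users": set(), "bytes": 0, "endpoints": set()}
    )
    for event in events:
        entry = summary[event.dataset_pid]
        entry["accesses"] += 1
        entry["users"].add(event.user)
        entry["bytes"] += event.bytes_transferred
        entry["endpoints"].add(event.endpoint)
    return dict(summary)


# Made-up example events; the identifiers are placeholders, not real DOIs.
events = [
    AccessEvent("doi:10.0000/example-a", "alice", 1_000, "site-1"),
    AccessEvent("doi:10.0000/example-a", "bob", 2_500, "site-2"),
    AccessEvent("doi:10.0000/example-b", "alice", 400, "site-1"),
]
usage = summarise(events)
```

Keying the summary on the persistent identifier reflects the point made in the highlights that a PID scheme is a prerequisite for dataset accounting: without a stable identifier, accesses to the same dataset from different end points could not be reliably combined.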

Existing Resource Usage Accounting
Definitions
Objectives
Preliminary work
EGI-Engage Prototype
Transferring the data
Conclusion
