Persistent Identifier Practice for Big Data Management at NCI

Jingbo Wang, Lesley Wyborn, Nicholas Car, Ben Evans, Kashif Gohar, Claire Trenham

doi:10.5334/dsj-2017-020

Abstract

The National Computational Infrastructure (NCI) manages over 10 PB of research data, which is co-located with the high performance computer (Raijin) and an HPC-class, 3000-core OpenStack cloud system (Tenjin). In support of this integrated High Performance Computing/High Performance Data (HPC/HPD) infrastructure, NCI’s data management practices include building catalogues, DOI minting, data curation, data publishing, and data delivery through a variety of data services. The metadata catalogues, DOIs, THREDDS, and Vocabularies all use different Uniform Resource Locator (URL) styles. A Persistent IDentifier (PID) service provides an important utility to manage URLs in a consistent, controlled and monitored manner to support the robustness of our national ‘Big Data’ infrastructure. In this paper we demonstrate how NCI uses its PID Service to manage persistent identifiers consistently across various applications.

Highlights

Persistent identifiers are an integral part of semantic web and Linked Data applications, which the National Computational Infrastructure (NCI) uses as a platform for metadata interoperability across multiple systems.
The NCI uses a tool known as the Persistent IDentifier (PID) Service (Golodoniuc et al., 2015) to manage the Uniform Resource Identifier (URI)-based persistent identifiers for digital objects such as datasets in catalogues.
In this way, a URI such as http://pid.nci.org.au/dataset/1234 would be passed on to an underlying system that would be able to resolve the item with ID 1234.
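
The pass-through described in the last highlight can be illustrated with a minimal sketch. It is not the PID Service's actual implementation or configuration: the rule table, the `resolve` function, and the target host `catalogue.example.org` are hypothetical, chosen only to show how a pattern on the PID path could select an underlying system and hand over the item ID.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Host taken from the example URI in the text.
PID_HOST = "pid.nci.org.au"

# Hypothetical mapping rules: a pattern on the PID path and a target URL
# template. The target host is a placeholder, not a real NCI endpoint.
RULES = [
    (re.compile(r"^/dataset/(?P<item_id>[^/]+)$"),
     "https://catalogue.example.org/records/{item_id}"),
]


def resolve(pid_uri: str) -> Optional[str]:
    """Return the redirect target for a PID URI, or None if nothing matches."""
    parsed = urlparse(pid_uri)
    if parsed.netloc != PID_HOST:
        return None
    for pattern, template in RULES:
        match = pattern.match(parsed.path)
        if match:
            # Pass the captured ID on to the underlying system's URL.
            return template.format(**match.groupdict())
    return None


print(resolve("http://pid.nci.org.au/dataset/1234"))
# prints: https://catalogue.example.org/records/1234
```

Keeping the rules in one table means that only the template needs to change if the underlying catalogue moves; the published URI stays the same.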

Summary

Motivation

NCI data management provides various Uniform Resource Locators (URLs) for users to query databases and access datasets through different service endpoints. URLs themselves are often fragile and break if files are relocated. The problem becomes unmanageable when URLs that have been released and used in references later break. When this happens, it is not practical to inform all users and ask them to update the URLs. Worse, in most use cases we do not know who is using our URLs, as many of our data collections are available via open access and do not require authentication or acknowledgement of the license. PIDs are scalable, so the mapping between URLs and PIDs can be managed programmatically.
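
As a rough illustration of managing the PID-to-URL mapping programmatically, the sketch below rewrites target URLs in bulk after a relocation while leaving the published PIDs untouched. The `relocate` helper, the in-memory dictionary, every host name, and all PIDs other than the dataset/1234 example are assumptions made for the example; the paper does not describe the PID Service's storage or API at this level.

```python
from typing import Dict


def relocate(mappings: Dict[str, str], old_prefix: str, new_prefix: str) -> int:
    """Rewrite every target URL that starts with old_prefix.

    The PIDs (dictionary keys) are never modified, so links already in
    circulation keep working. Returns the number of targets changed.
    """
    changed = 0
    for pid, target in list(mappings.items()):
        if target.startswith(old_prefix):
            mappings[pid] = new_prefix + target[len(old_prefix):]
            changed += 1
    return changed


# Hypothetical PID -> target URL table (all hosts are placeholders).
mappings = {
    "http://pid.nci.org.au/dataset/1234": "https://old-host.example.org/data/1234",
    "http://pid.nci.org.au/dataset/5678": "https://old-host.example.org/data/5678",
    "http://pid.nci.org.au/vocab/abc": "https://vocab.example.org/abc",
}

count = relocate(mappings,
                 "https://old-host.example.org/",
                 "https://new-host.example.org/")
print(count)  # prints: 2
print(mappings["http://pid.nci.org.au/dataset/1234"])
# prints: https://new-host.example.org/data/1234
```

Because users only ever hold the PID, a single update of this kind replaces the impractical task of contacting every user of a broken URL.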

Journal: Data Science Journal
Publication Date: Apr 18, 2017
License type: CC BY 4.0