Abstract
The National Computational Infrastructure (NCI) manages over 10 PB of research data, co-located with its high performance computer (Raijin) and an HPC-class, 3000-core OpenStack cloud system (Tenjin). In support of this integrated High Performance Computing/High Performance Data (HPC/HPD) infrastructure, NCI's data management practices include building catalogues, minting DOIs, and curating, publishing and delivering data through a variety of data services. The metadata catalogues, DOIs, THREDDS and Vocabularies all use different Uniform Resource Locator (URL) styles. A Persistent IDentifier (PID) service provides an important means of managing these URLs in a consistent, controlled and monitored manner, supporting the robustness of our national 'Big Data' infrastructure. In this paper we demonstrate NCI's approach of using its PID Service to manage persistent identifiers consistently across various applications.
Highlights
Persistent identifiers are an integral part of Semantic Web and Linked Data applications, which the National Computational Infrastructure (NCI) uses as a platform for metadata interoperability across multiple systems
The NCI uses a tool known as the Persistent IDentifier (PID) Service (Golodoniuc et al, 2015) to manage the Uniform Resource Identifier (URI)-based persistent identifiers for digital objects such as datasets in catalogues
In this way, a URI such as http://pid.nci.org.au/dataset/1234 is passed on to an underlying system that can resolve the item with ID 1234
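The pass-through described above can be illustrated with a small pattern-based resolver. This is a minimal sketch, not the PID Service's actual implementation or configuration format: it assumes regex rewrite rules, and the target hosts (`catalogue.example.org`, `thredds.example.org`) and the `RULES`, `resolve` and `PIDRedirectHandler` names are hypothetical placeholders. The stable PID path is matched and answered with an HTTP redirect to wherever the item currently lives.

```python
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical rewrite rules: each maps a PID path pattern to a target URL template.
# The target hosts are illustrative placeholders, not NCI's real endpoints.
RULES = [
    (re.compile(r"^/dataset/(?P<id>\d+)$"),
     "https://catalogue.example.org/records/{id}"),
    (re.compile(r"^/thredds/(?P<path>.+)$"),
     "https://thredds.example.org/thredds/catalog/{path}/catalog.html"),
]


def resolve(pid_path: str) -> Optional[str]:
    """Return the redirect target for a PID path, or None if no rule matches."""
    for pattern, template in RULES:
        match = pattern.match(pid_path)
        if match:
            return template.format(**match.groupdict())
    return None


class PIDRedirectHandler(BaseHTTPRequestHandler):
    """Answers GET requests for PIDs with a redirect to the current location."""

    def do_GET(self) -> None:
        target = resolve(urlsplit(self.path).path)
        if target is None:
            self.send_error(404, "Unknown identifier")
            return
        # 302 (temporary) rather than 301, so clients keep citing the PID
        # instead of caching the underlying URL as permanent.
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the hypothetical resolver locally (blocks until interrupted)."""
    HTTPServer(("", port), PIDRedirectHandler).serve_forever()


if __name__ == "__main__":
    # http://pid.nci.org.au/dataset/1234 arrives with the path "/dataset/1234".
    print(resolve("/dataset/1234"))
```

Only the rule table knows about the underlying systems; the published identifier never encodes a host or file path, which is what lets the target change without breaking citations.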
Summary
NCI data management provides various Uniform Resource Locators (URLs) that users can use to query databases and to access datasets through different service endpoints. URLs themselves are fragile: links break when files are relocated. The problem becomes unmanageable once URLs have been released and used in references that later break, because it is not practical to inform all users and ask them to update their URLs. Worse, in most cases we do not know who is using our URLs, as many of our data collections are openly accessible and require neither authentication nor acknowledgement of the license. PIDs are scalable in that the mapping between PIDs and URLs can be managed programmatically: when a resource moves, only its mapping needs to change, while the published PID stays the same.
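The programmatic management of PID-to-URL mappings can be sketched as a registry that is updated in bulk when a collection is relocated. This is an illustrative sketch under stated assumptions: `PIDRegistry`, its methods and the `old-host`/`new-host` URLs are hypothetical, and a real service would persist the table and log changes rather than keep it in memory.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PIDRegistry:
    """Hypothetical in-memory PID-to-URL table (a real service would persist it)."""

    mappings: Dict[str, str] = field(default_factory=dict)

    def register(self, pid: str, url: str) -> None:
        """Bind a persistent identifier to its current target URL."""
        self.mappings[pid] = url

    def resolve(self, pid: str) -> str:
        """Look up the current target URL for a PID (KeyError if unknown)."""
        return self.mappings[pid]

    def relocate(self, old_prefix: str, new_prefix: str) -> int:
        """Repoint every target under old_prefix to new_prefix.

        The PIDs themselves are untouched, so published references keep working.
        Returns the number of mappings updated.
        """
        updated = 0
        for pid, url in list(self.mappings.items()):
            if url.startswith(old_prefix):
                self.mappings[pid] = new_prefix + url[len(old_prefix):]
                updated += 1
        return updated


if __name__ == "__main__":
    registry = PIDRegistry()
    registry.register("http://pid.nci.org.au/dataset/1234",
                      "https://old-host.example.org/data/1234.nc")
    registry.register("http://pid.nci.org.au/dataset/5678",
                      "https://old-host.example.org/data/5678.nc")
    # A whole collection moves; one call updates every affected mapping.
    count = registry.relocate("https://old-host.example.org/data/",
                              "https://new-host.example.org/archive/")
    print(count, registry.resolve("http://pid.nci.org.au/dataset/1234"))
```

The design choice is that users only ever see the left-hand side of the table, so a relocation is a single data-management operation rather than a request to every (often unknown) user to update their links.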