Abstract

The Australian Government has begun an initiative to organise publicly funded national data assets and make them accessible for research through the Research Data Services (RDS) initiative, which supports over 40 PBytes of multidisciplinary data at eight nodes around Australia. One of these nodes is at the National Computational Infrastructure (NCI), which provides a comprehensively integrated national high performance computing facility. NCI is a partnership between the Australian National University (ANU), the Australian Bureau of Meteorology, Geoscience Australia (GA) and the Commonwealth Scientific and Industrial Research Organisation (CSIRO), and it focuses particularly on Earth system sciences. As part of its activity in RDS, NCI has co-located over 10 PBytes of priority research data collections spanning a wide range of disciplines, from geosciences, geophysics, environment, climate, weather and water resources through to astronomy, bioinformatics and the social sciences. To facilitate access, maximise reuse and enable integration across the disciplines, these data have been built into a platform that NCI has called the National Environmental Research Data Interoperability Platform (NERDIP). The platform is co-located with NCI's major HPC resources: a 1.2 PetaFlop supercomputer (Raijin) and an HPC-class, 3000-core OpenStack cloud system (Tenjin). Combined, they offer unparalleled opportunities for geoscience researchers to undertake innovative Data-intensive Science at scales and resolutions never before attempted, and to take part in new interdisciplinary collaborations. However, compared with other ‘Big Data’ science disciplines (climate, oceans, weather, astronomy), current geoscience data management practices and data access methods need significant work before they can scale up and take advantage of the changes in the global computing landscape.

Although the geosciences have many ‘Big Data’ collections that could be incorporated within NERDIP, these typically comprise heterogeneous files distributed over multiple sites and sectors, and it is taking considerable time to aggregate them into large High Performance Data (HPD) sets that are structured to facilitate uptake in HPC environments. Once the data are incorporated into NERDIP, the next challenge is to ensure that researchers are ready both to use modern tools and to update their working practices so that they can process these data effectively. This is an issue partly because the geoscience community has been slow to move to peak-class systems for Data-intensive Science and to integrate with the rest of the Earth systems community.
