Preemptive data distribution infrastructure for data centric analysis and modelling

Ivan Lupelli, Shaun De Witt, Jonathan Hollocombe, David Muir, Rob Akers

doi:10.1016/j.fusengdes.2017.03.168

Abstract

The next generation of tokamaks, e.g. ITER, will have data collection rates significantly larger than those of present tokamaks, with consequent new challenges in data management, data analysis and integrated modelling. One of these challenges is to ensure that appropriate data is made available efficiently when it is required and where it is consumed. Large data volumes combined with limited network capability mean that not all data can be distributed in time when a data-object is requested. One possible solution is to preemptively identify the data and distribute it efficiently across the storage services before a user or an application requests it. Preemptive data distribution relies on analysis of historical access patterns to identify a set of rules whereby, following a data-object request, the most probable set of next requests can be inferred. Implementing these rules requires the inferred sets of data to be moved close to the data-object consumer. The work presented describes the Apache Spark Machine Learning tools, the results of the analysis, and an implementation of the preemptive distribution experimental platform at CCFE, together with plans for its future integration and testing on the upcoming SAGE platform.
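
As a rough illustration of the rule-inference idea described in the abstract, the sketch below estimates, from ordered request histories, how often one data-object is requested immediately after another and keeps the pairs whose relative frequency exceeds a threshold. It is a minimal plain-Python sketch and not the authors' Apache Spark implementation: the function names, the signal names in `history`, and the `min_confidence` threshold are all hypothetical, and a simple first-order "next request" count stands in for whatever rule-mining technique the paper actually applies.

```python
from collections import Counter, defaultdict


def learn_next_request_rules(sessions, min_confidence=0.5):
    """Estimate P(next request = b | current request = a) from ordered histories.

    Returns a dict mapping each data-object to a list of (next_object, confidence)
    pairs, sorted by decreasing confidence and filtered by min_confidence.
    """
    # For every object, count which objects were requested immediately after it.
    follower_counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            follower_counts[current][following] += 1

    rules = {}
    for current, followers in follower_counts.items():
        total = sum(followers.values())
        ranked = [
            (obj, count / total)
            for obj, count in followers.most_common()
            if count / total >= min_confidence
        ]
        if ranked:
            rules[current] = ranked
    return rules


def objects_to_prefetch(rules, requested, limit=3):
    """Return up to `limit` data-objects worth moving close to the consumer."""
    return [obj for obj, _confidence in rules.get(requested, [])[:limit]]


# Hypothetical access history: each inner list is one user's ordered requests.
history = [
    ["magnetics", "equilibrium", "thomson"],
    ["magnetics", "equilibrium"],
    ["magnetics", "interferometer"],
    ["equilibrium", "thomson"],
]

rules = learn_next_request_rules(history, min_confidence=0.5)
prefetch = objects_to_prefetch(rules, "magnetics")
print(prefetch)
```

In this toy history, a request for `magnetics` is followed by `equilibrium` in two of three cases, so `equilibrium` is the only candidate that clears the 0.5 threshold. A production system would also need to weigh object size and available network bandwidth before moving data, which this sketch ignores.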

Full Text