Abstract
The next generation of tokamaks, e.g. ITER, will collect data at rates significantly higher than those of present-day tokamaks, bringing new challenges in data management, data analysis and integrated modelling. One of these challenges is to ensure that the appropriate data is made available efficiently, when it is required and where it is consumed. Large data volumes combined with limited network capacity mean that not all data can be distributed in time once a data-object is requested. One possible solution is to preemptively identify the data and distribute it efficiently across the storage services before a user or an application requests it. Preemptive data distribution relies on analysis of historical access patterns to identify a set of rules whereby, following a data-object request, the most probable set of next requests can be inferred. Implementing these rules requires the inferred sets of data to be moved close to the data-object consumer. This work describes the Apache Spark machine learning tools, the results of the analysis, and an implementation of the experimental preemptive distribution platform at CCFE, together with plans for its future integration and testing on the upcoming SAGE platform.
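The rule-inference idea described above can be illustrated with a minimal sketch. It is not the authors' method and does not use Apache Spark: it is a plain-Python, first-order approximation that counts which data-object most often follows another in historical request logs and keeps the followers whose conditional frequency exceeds a threshold. The function names, the `min_confidence` parameter and the signal names in the sample logs are all hypothetical.

```python
from collections import Counter, defaultdict


def mine_next_request_rules(sessions, min_confidence=0.5):
    """Infer 'after X, expect Y' rules from ordered request logs.

    sessions: iterable of lists, each an ordered sequence of data-object names.
    Returns {object: [(next_object, confidence), ...]}, most likely first.
    """
    pair_counts = defaultdict(Counter)  # pair_counts[x][y] = times y directly followed x
    antecedent_counts = Counter()       # times x was followed by any request

    for requests in sessions:
        for current, following in zip(requests, requests[1:]):
            pair_counts[current][following] += 1
            antecedent_counts[current] += 1

    rules = {}
    for current, followers in pair_counts.items():
        total = antecedent_counts[current]
        kept = [
            (nxt, count / total)
            for nxt, count in followers.most_common()
            if count / total >= min_confidence
        ]
        if kept:
            rules[current] = kept
    return rules


def objects_to_prefetch(rules, requested):
    """Return the data-objects worth moving close to the consumer of `requested`."""
    return [nxt for nxt, _ in rules.get(requested, [])]


# Hypothetical historical access logs (signal names are made up for illustration).
sessions = [
    ["shot/ip", "shot/ne", "shot/te"],
    ["shot/ip", "shot/ne"],
    ["shot/ip", "shot/bt"],
    ["shot/ne", "shot/te"],
]

rules = mine_next_request_rules(sessions, min_confidence=0.6)
prefetch_for_ip = objects_to_prefetch(rules, "shot/ip")
```

In this toy log, a request for `shot/ip` is followed by `shot/ne` in two of three cases, so only `shot/ne` passes the 0.6 threshold and would be staged ahead of time. A production analysis would likely mine longer sequences and larger logs, which is where a distributed framework becomes useful.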