Abstract
This paper presents a system to predict future data popularity for data-intensive systems, such as the ATLAS distributed data management (DDM) system. Using these predictions, it is possible to improve the distribution of data, helping to reduce waiting times for jobs that use this data. The system is based on a tracer infrastructure that monitors and stores historical data accesses, which are then used to create popularity reports. These reports summarise past data accesses, including information about the accessed files, the users involved and the sites. From this information about past accesses it is possible to make near-term forecasts of data popularity. The prediction system introduced in this paper uses both simple prediction methods and neural networks. The best prediction method depends on the type of data, and the access information is carefully filtered for use in either method. The second part of the paper introduces a system that places data based on the predictions. This is a two-phase process: in the first phase, space is freed by removing unpopular replicas; in the second, new replicas of popular datasets are created. The creation of new replicas is subject to constraints: only a limited amount of space is available, and creating replicas involves transfers that use bandwidth. Furthermore, the benefit of each replica differs. The goal is to maximise the global benefit while respecting the constraints. The final part evaluates this method using a grid simulator. The simulator is able to replay workloads on different data distributions while measuring the job waiting time. We show how accurate predictions of future accesses can reduce job waiting time.
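To make the two-phase placement idea concrete, the following is a minimal sketch, not the method used in the paper. It assumes a moving-average forecast as a stand-in for the "simple prediction methods" and a greedy benefit-per-size heuristic as a stand-in for the benefit maximisation, which the abstract does not specify. All names (`forecast_next`, `select_for_deletion`, `place_replicas`, `Candidate`), the units, and the threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    dataset: str
    size: float     # space the new replica would occupy (assumed unit, e.g. TB)
    benefit: float  # predicted benefit, e.g. forecast number of accesses


def forecast_next(accesses, window=3):
    """Assumed simple predictor: mean of the last `window` access counts."""
    recent = accesses[-window:]
    return sum(recent) / len(recent) if recent else 0.0


def select_for_deletion(replica_sizes, predicted, threshold):
    """Phase 1: pick replicas whose predicted popularity is below `threshold`.

    Returns the datasets to delete and the total space that frees.
    """
    unpopular = [d for d in replica_sizes if predicted.get(d, 0.0) < threshold]
    freed = sum(replica_sizes[d] for d in unpopular)
    return unpopular, freed


def place_replicas(candidates, space, bandwidth):
    """Phase 2: greedily add replicas by benefit per unit size.

    Each replica consumes its size from both the space budget and the
    transfer (bandwidth) budget. Greedy selection is a heuristic and is
    not guaranteed to maximise the total benefit.
    """
    chosen = []
    ranked = sorted(candidates, key=lambda c: c.benefit / c.size, reverse=True)
    for c in ranked:
        if c.size <= space and c.size <= bandwidth:
            chosen.append(c.dataset)
            space -= c.size
            bandwidth -= c.size
    return chosen


# Hypothetical access histories (accesses per period) and replica sizes.
history = {"A": [1, 2, 9], "B": [5, 5, 5], "C": [0, 0, 0]}
predicted = {d: forecast_next(h) for d, h in history.items()}

to_delete, freed = select_for_deletion({"C": 4.0}, predicted, threshold=1.0)
candidates = [Candidate("A", 2.0, predicted["A"]),
              Candidate("B", 3.0, predicted["B"])]
new_replicas = place_replicas(candidates, space=freed, bandwidth=10.0)
print(to_delete, freed, new_replicas)
```

In this toy input, dataset B has the higher predicted benefit, but A is placed first because it offers more benefit per unit of space, after which B no longer fits in the freed space. This illustrates why the placement is an optimisation problem rather than a simple ranking by popularity.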