Abstract

The ability of modern HEP experiments to acquire and process unprecedented amounts of data and simulation has led to an explosion in the volume of information that individual scientists deal with on a daily basis. This explosion has created a need for individuals to generate and maintain large personal analysis data sets: the skimmed portions of official data collections that pertain to their specific analyses. While significantly reduced in size compared to the original data, these personal analysis and simulation sets can still reach many terabytes, or tens of terabytes, in size and consist of tens of thousands of files. When this personal data is aggregated across the many physicists in a single analysis group or experiment, it can represent data volumes on par with or exceeding the official production samples, and it requires special data handling techniques to manage effectively. In this paper we explore the changes to the Fermilab computing infrastructure and computing models that have been developed to allow experimenters to effectively manage their personal analysis data and other data that falls outside of the typical centrally managed production chains. In particular, we describe the models and tools being used to provide modern neutrino experiments such as NOvA with storage resources sufficient to meet their analysis needs, without imposing specific quotas on users or groups of users. We discuss the storage mechanisms and caching algorithms in use, as well as the toolkits that have been developed to allow users to easily operate with terabyte-scale and larger datasets.
