Abstract

The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) manages a high volume of data that currently exceeds 100 PB across different sites. An important challenge in delivering data to experimenters in the CMS workflow is the data volume. An experiment data file has an average size of 2 gigabytes, with file sizes ranging between 100 megabytes and 20 gigabytes. In addition, a complete dataset comprises multiple files, with datasets ranging from 2 terabytes to 100 terabytes in size. Providing fast access to datasets is an important enabler for data-intensive science research. In our work, we demonstrate an Information-Centric Networking (ICN) approach to providing fast in-network access to CMS datasets. To that end, we must first address the problem of how to store large CMS files in network caches closer to the end users. We propose a software-defined, storage-aware routing mechanism using named data networking (NDN) to achieve this goal. Because the capacity of NDN router caches is inherently limited, we use software-defined networking (SDN) to provide an intelligent and efficient solution for data distribution and routing across multiple NDN router caches. We demonstrate how software-defined control can be used for partitioning and distributing large CMS files based on knowledge of NDN router cache state. SDN control also configures the router forwarding strategy to retrieve CMS data from the network. Using our proposed architecture, we show that CMS datasets can be retrieved 28%–38% faster from the NDN router caches compared to existing approaches. Lastly, we develop a prefetching mechanism to improve the transfer performance of files not available in the router caches.
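As a rough illustration of partitioning a large file based on router cache state, the following Python sketch splits one file across several caches according to their free space. It is only a sketch under stated assumptions: the greedy largest-free-space-first policy, the `RouterCache` type, and the `partition_file` function are hypothetical and are not taken from the paper, whose actual controller logic is not described in the abstract.

```python
from dataclasses import dataclass

GB = 10**9  # decimal gigabyte, used only for readability in this sketch


@dataclass
class RouterCache:
    """Hypothetical view of one NDN router cache as seen by a controller."""
    name: str
    free_bytes: int


def partition_file(file_size, caches):
    """Greedily assign byte ranges of one file to caches.

    Assumption: caches with the most free space are filled first.
    Returns a list of (router_name, offset, length) tuples that
    together cover the whole file. Raises ValueError if the file
    size is not positive or the caches cannot hold the file.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if sum(c.free_bytes for c in caches) < file_size:
        raise ValueError("not enough total cache space for this file")

    placements = []
    offset = 0
    # Visit caches from most to least free space.
    for cache in sorted(caches, key=lambda c: c.free_bytes, reverse=True):
        if offset >= file_size:
            break
        length = min(cache.free_bytes, file_size - offset)
        if length == 0:
            continue  # skip caches that are already full
        placements.append((cache.name, offset, length))
        offset += length
    return placements


if __name__ == "__main__":
    caches = [
        RouterCache("r1", 1 * GB),
        RouterCache("r2", 3 * GB),
        RouterCache("r3", GB // 2),
    ]
    # A 4 GB file does not fit in any single cache here, so it is split.
    for router, offset, length in partition_file(4 * GB, caches):
        print(router, offset, length)
```

In a design like the one the abstract outlines, a controller could then use such a placement map to configure forwarding so that requests for each byte range reach the router holding it. That step is omitted here.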
