Analyzing Data Movements and Identifying Techniques for Next-generation High-bandwidth Networks

Mehmet Balman (mbalman@lbl.gov)
Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720

Abstract

High-bandwidth networks are poised to provide new opportunities for tackling large-data challenges in today's scientific applications. However, increasing the bandwidth is not sufficient by itself; we need careful evaluation of future high-bandwidth networks from the applications' perspective. We have investigated the data transfer requirements of climate applications as a typical scientific example and evaluated how the scientific community can benefit from next-generation high-bandwidth networks. We have experimented with current state-of-the-art data movement tools and found that there is no single preset of transfer parameters that makes optimal use of the available bandwidth. We therefore developed an adaptive transfer methodology for tuning and optimizing wide-area data transfers. This worked well for large files. However, typical scientific datasets may include many small files, and current file-centric data transfer protocols do not manage the transfer of small files well, even when using parallel streams or concurrent transfers over high-bandwidth networks. To overcome this problem, we developed a new block-based data movement method (in contrast to current file-based methods) to improve performance and efficiency when moving large scientific datasets that contain many small files. We implemented this block-based data movement tool, which aggregates files into blocks and provides dynamic data channel management. In this work, we also found that one of the major obstacles to using high-bandwidth networks is the limitation of host system resources: 100Gbps is beyond the capacity of today's commodity machines, since a substantial amount of processing power and the involvement of multiple cores are needed to fill a 40Gbps or 100Gbps network. As a result, host system performance plays an important role in the use of high-bandwidth networks. We have conducted a large number of experiments with our new block-based method and with currently available file-based data movement tools. In this white paper, we describe future research problems and challenges for the efficient use of next-generation science networks, based on the lessons learned and the experience gained with 100Gbps network applications.
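The abstract does not spell out how the adaptive transfer methodology chooses its parameters; the sketch below is only a minimal illustration of one way such tuning could work, assuming hypothetical set_streams and measure_throughput hooks exposed by a transfer tool (neither name comes from the paper).

```python
# A minimal sketch of adaptive parallel-stream tuning (illustrative, not the
# paper's actual algorithm): probe larger stream counts and keep increasing
# only while the measured throughput gain exceeds a threshold. The callables
# set_streams and measure_throughput are hypothetical hooks a real transfer
# tool would provide.
import time


def tune_streams(set_streams, measure_throughput,
                 start=4, max_streams=64, probe_interval=5.0, min_gain=0.05):
    """Return a stream count beyond which adding streams stops paying off."""
    streams = start
    set_streams(streams)
    time.sleep(probe_interval)            # let the measured throughput settle
    best = measure_throughput()
    while streams * 2 <= max_streams:
        set_streams(streams * 2)          # probe with twice as many streams
        time.sleep(probe_interval)
        rate = measure_throughput()
        if rate < best * (1.0 + min_gain):
            set_streams(streams)          # gain too small: revert and stop
            break
        streams, best = streams * 2, rate
    return streams


if __name__ == "__main__":
    # Toy stand-in for a real transfer: throughput saturates around 16 streams.
    state = {"streams": 0}
    simulated = {4: 8.0, 8: 15.0, 16: 27.0, 32: 28.0, 64: 28.5}  # Gbps
    chosen = tune_streams(lambda n: state.update(streams=n),
                          lambda: simulated[state["streams"]],
                          probe_interval=0.01)
    print("chosen stream count:", chosen)
```

The design choice illustrated is a greedy probe: keep doubling the stream count while the measured gain stays above a small threshold, then back off. This mirrors the general idea of adapting transfer parameters to observed wide-area behavior rather than presetting them.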
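The tool aggregates files into blocks; the sketch below illustrates only that aggregation idea. The 4 MiB block size and the (path, offset, length) record layout are illustrative assumptions, not the tool's actual on-the-wire format.

```python
# A minimal sketch of file-to-block aggregation: fill fixed-size blocks from
# many small files so the data channel always moves uniform blocks. Block
# size and record layout here are assumed for illustration.

def blocks_from_files(paths, block_size=4 * 1024 * 1024):
    """Yield (records, payload) pairs; records lists which byte ranges of
    which files were packed into this block's payload."""
    buf, records, filled = bytearray(), [], 0
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(block_size - filled)
                if not chunk:
                    break                      # file exhausted, move on
                records.append((path, f.tell() - len(chunk), len(chunk)))
                buf += chunk
                filled += len(chunk)
                if filled == block_size:
                    yield records, bytes(buf)  # hand a full block to a sender
                    buf, records, filled = bytearray(), [], 0
    if buf:
        yield records, bytes(buf)              # final, partially filled block
```

A receiver would write each record's bytes back to the corresponding file and offset. The point of the approach is that the data channel moves uniformly sized blocks regardless of file size, so datasets dominated by many small files no longer pay a per-file setup cost on the channel.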
This document was prepared as an account of work sponsored by the United States Government. While this document is believed to contain correct information, neither the United States Government nor any agency thereof, nor The Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or The Regents of the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof or The Regents of the University of California.
