Facilitating intermediate node discovery for decentralized offloading in High Performance Computing centers

Benjamin A. Schmidt, Ali R. Butt

doi:10.1109/secon.2009.5174092

Abstract

Modern high-performance computing applications use large-scale simulations to facilitate scientific discovery, such as studying the impact of sub-atomic interactions or searching for cures for diseases. These applications increasingly use data that is growing exponentially in size. Thus, managing data at high-performance computing (HPC) centers is a critical problem, and addressing data-related issues is considered a major step towards realizing efficient resource usage. Result-data offloading is a promising technique that can improve the efficiency of HPC centers by quickly moving application result data to user-specified remote locations, which also increases overall center serviceability. However, identifying suitable remote locations for such decentralized offloading remains an open problem. In this paper, we explore several methods for locating intermediate nodes using peer-to-peer techniques. We facilitate node discovery at each level of the offload, and structure the discovered nodes to support efficient data transfer that can satisfy the Service Level Agreements (SLAs) between the HPC center and the job submission site. Our evaluation, using realistic simulations and actual measurements on the PlanetLab distributed test-bed, shows that, compared to naive random discovery, controlled routing-table-based advertisements offer an efficient and effective method for discovering appropriate resources: this approach discovers 211% more nodes in total, and achieves quick discovery by finding 184% more nodes in less than 27% of the time compared to a random-broadcast-based approach. Thus, this work provides promising node discovery mechanisms that can facilitate the HPC data offloading process.
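The abstract contrasts naive random broadcast with controlled, routing-table-based advertisement but does not spell out either protocol. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical, fully populated Chord-style ring of `n` nodes in which routing-table entry `i` of every node points `2**i` positions clockwise, and it substitutes a standard interval-limited finger-table broadcast for the "controlled" scheme. The function names (`finger_offsets`, `routing_table_broadcast`, `random_broadcast`) and the equal message budget are illustrative choices. The sketch only conveys the intuition: handing each recipient a slice of the ring reaches every node exactly once, whereas random pushes spend part of the same budget on duplicate deliveries.

```python
import random


def finger_offsets(n: int) -> list[int]:
    """Hypothetical Chord-style routing table: clockwise offsets 1, 2, 4, ... below n."""
    return [1 << i for i in range(n.bit_length()) if (1 << i) < n]


def routing_table_broadcast(origin: int, n: int) -> tuple[set[int], int]:
    """Controlled advertisement over routing tables (assumed scheme).

    Each work item (node, span) means `node` is responsible for the nodes at
    clockwise offsets 1..span-1 from itself. A node forwards only to fingers
    inside its span and hands each finger the sub-interval up to the next
    finger, so every node receives the advertisement exactly once.
    """
    reached = {origin}
    messages = 0
    work = [(origin, n)]  # the origin covers the whole ring
    while work:
        node, span = work.pop()
        for offset in finger_offsets(n):
            if offset >= span:
                break  # offsets ascend; the remaining fingers lie outside this interval
            target = (node + offset) % n
            messages += 1
            reached.add(target)
            # The target covers offsets up to the next finger (2 * offset) or the span end.
            work.append((target, min(2 * offset, span) - offset))
    return reached, messages


def random_broadcast(origin: int, n: int, budget: int, seed: int = 0) -> set[int]:
    """Naive baseline (assumed): informed nodes push to uniformly random peers
    until `budget` messages are spent; duplicate deliveries are wasted."""
    rng = random.Random(seed)
    reached = {origin}
    informed = [origin]
    for _ in range(budget):
        sender = rng.choice(informed)
        target = rng.randrange(n - 1)
        if target >= sender:
            target += 1  # uniform over every peer except the sender
        if target not in reached:
            reached.add(target)
            informed.append(target)
    return reached


if __name__ == "__main__":
    n = 64
    rt_nodes, rt_msgs = routing_table_broadcast(0, n)
    rnd_nodes = random_broadcast(0, n, budget=rt_msgs, seed=1)
    print(f"routing-table advertisement: {len(rt_nodes)} nodes, {rt_msgs} messages")
    print(f"random broadcast:            {len(rnd_nodes)} nodes, {rt_msgs} messages")
```

Under these assumptions the routing-table scheme reaches all 64 nodes with 63 messages, while the random baseline, given the same 63-message budget, almost always loses some messages to duplicates. The sketch does not model how suitable intermediate nodes are selected or how discovered nodes are structured to meet the SLAs, both of which the abstract attributes to the full system.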
