Abstract

Achieving high performance for distributed I/O on a wide-area network remains an elusive goal. Despite improvements in both network hardware and software stacks, high performance is still difficult to attain. In this paper, our worldwide team took a non-traditional approach to distributed I/O, called ParaMEDIC (Parallel Metadata Environment for Distributed I/O and Computing). ParaMEDIC applies an application-specific transformation that reduces the data to metadata orders of magnitude smaller before the actual I/O is performed. Specifically, this paper details our experiences in deploying a large-scale system that encapsulates the mpiBLAST sequence-search algorithm within ParaMEDIC to facilitate the discovery of missing genes and the construction of a genome similarity tree. The overall project involved nine computational sites spread across the U.S. and generated more than a petabyte of data that was 'teleported' to a large-scale facility in Tokyo for storage.

Copyright © 2010 John Wiley & Sons, Ltd.
