Creating the CIPRES Science Gateway for inference of large phylogenetic trees

Mark A Miller,Wayne Pfeiffer,Terri Schwartz

doi:10.1109/gce.2010.5676129

Abstract

Understanding the evolutionary history of living organisms is a central problem in biology. Until recently the ability to infer evolutionary relationships was limited by the amount of DNA sequence data available, but new DNA sequencing technologies have largely removed this limitation. As a result, DNA sequence data are readily available or obtainable for a wide spectrum of organisms, thus creating an unprecedented opportunity to explore evolutionary relationships broadly and deeply across the Tree of Life. Unfortunately, the algorithms used to infer evolutionary relationships are NP-hard, so the dramatic increase in available DNA sequence data has created a commensurate increase in the need for access to powerful computational resources. Local laptop or desktop machines are no longer viable for analysis of the larger data sets available today, and progress in the field relies upon access to large, scalable high-performance computing resources. This paper describes development of the CIPRES Science Gateway, a web portal designed to provide researchers with transparent access to the fastest available community codes for inference of phylogenetic relationships, and implementation of these codes on scalable computational resources. Meeting the needs of the community has included developing infrastructure to provide access, working with the community to improve existing community codes, developing infrastructure to insure the portal is scalable to the entire systematics community, and adopting strategies that make the project sustainable by the community. The CIPRES Science Gateway has allowed more than 1800 unique users to run jobs that required 2.5 million Service Units since its release in December 2009. (A Service Unit is a CPU-hour at unit priority).

Full Text