Abstract

Recent studies of evolution at the molecular level address two important issues: reconstructing the evolutionary relationships between species and investigating the forces that drive the evolutionary process. Both areas have grown explosively over the last two decades, driven by the massive generation of genomic data and by novel statistical methods and computational approaches for processing and analyzing this volume of data. Most experiments in molecular evolution are based on compute-intensive simulations that are preceded by other computational tools and post-processed by computational validators. All these tools can be modeled as scientific workflows to improve experiment management while capturing provenance data. However, these evolutionary analysis experiments are very complex and may execute for weeks, so the workflows need to be executed in parallel in High Performance Computing (HPC) environments such as clouds. Clouds are increasingly being adopted for bioinformatics experiments because of characteristics such as elasticity and availability, and they are evolving into HPC environments. In this paper, we introduce SciEvol, a bioinformatics scientific workflow for molecular evolution reconstruction that aims at inferring evolutionary relationships (i.e., detecting positive Darwinian selection) on genomic data. SciEvol is designed and implemented to execute in parallel in the cloud using the SciCumulus workflow engine. Our experiments show that SciEvol can help scientists by enabling the reconstruction of evolutionary relationships in a cloud environment. The results show execution time reduced by up to 94.64% relative to sequential execution, from around 10 days to 12 hours.

