Abstract
Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus.
Highlights
Understanding the evolutionary relationships between groups of organisms has become increasingly reliant on phylogenetic analysis
We develop a high-availability, large-scale open reading frame (ORF) phylogenetic analysis cloud service based on virtualization technology and Hadoop
The results show that the proposed cloud-based analysis tool, built on virtualization technology and the Hadoop framework, can readily provide bioinformatics as a service (BaaS)
Summary
Understanding the evolutionary relationships between groups of organisms has become increasingly reliant on phylogenetic analysis. Hadoop provides a distributed file system, the Hadoop Distributed File System (HDFS), that stores data on the compute nodes [19], enabling very high aggregate bandwidth across the cluster. We develop a high-availability, large-scale ORF phylogenetic analysis cloud service based on virtualization technology and Hadoop. This service runs ORF-based phylogenetic analyses on Hadoop clusters so that it can serve multiple requests. Each node in a Hadoop cluster is a virtual machine. Users can upload their sequence data or files through the master node (web portal) and submit a job. The proposed cloud-based ORF phylogenetic tool is available at http://bioinfo.cs.pu.edu.tw/CloudORF/
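To make the notion of an ORF concrete, the following is a minimal sketch of forward-strand ORF detection in Python. It is not the service's actual implementation, which the summary does not describe. The function name `find_orfs`, the `min_codons` parameter, and the choice of ATG as the only start codon are assumptions made for illustration. In a Hadoop setting, a routine like this could plausibly run once per uploaded sequence inside a map task before alignment.

```python
# Illustrative sketch only: names and parameters are hypothetical,
# not taken from the proposed service.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def find_orfs(seq, min_codons=3):
    """Find ORFs on the forward strand of a DNA sequence.

    Returns a list of (start, end, frame) tuples, where seq[start:end]
    begins with ATG and ends with the first in-frame stop codon.
    min_codons counts the start and stop codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close this ORF and keep scanning
    return orfs


if __name__ == "__main__":
    for start, end, frame in find_orfs("CCATGAAATAGGG"):
        print(frame, start, end)
```

A production pipeline would also scan the reverse complement and apply a much larger minimum length; both are omitted here to keep the sketch short.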