Abstract

Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus.

Highlights

  • Understanding the evolutionary relationships between groups of organisms has become increasingly reliant on phylogenetic analysis

  • We develop a high-availability, large-scale open reading frame (ORF) phylogenetic analysis cloud service based on virtualization technology and Hadoop

  • The results show that the proposed cloud-based analysis tool, by virtue of virtualization technology and Hadoop framework, can readily facilitate bioinformatics as a service (BaaS)


Summary

Introduction

Understanding the evolutionary relationships between groups of organisms has become increasingly reliant on phylogenetic analysis. Hadoop provides a distributed file system, the Hadoop Distributed File System (HDFS), that stores data on the compute nodes [19], enabling very high aggregate bandwidth across the cluster. We develop a high-availability, large-scale ORF phylogenetic analysis cloud service based on virtualization technology and Hadoop. The service runs ORF-based phylogenetic analyses on Hadoop clusters so that it can serve multiple requests. Each node in a Hadoop cluster is a virtual machine. Users can upload their sequence data or files through the master node (web portal) and submit a job. The proposed cloud-based ORF phylogenetic tool is available at http://bioinfo.cs.pu.edu.tw/CloudORF/
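To make the ORF step concrete, the following Python sketch shows one way open reading frames could be extracted from an uploaded nucleotide sequence before alignment. It is an illustration only, not the authors' implementation: the function names `find_orfs` and `map_record`, the `min_len` parameter, the restriction to the forward strand, and the use of ATG with the standard stop codons are all assumptions made for this example.

```python
# Minimal ORF-extraction sketch (assumed logic, not the service's actual code).
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumption)


def find_orfs(seq, min_len=6):
    """Return sorted (start, end, orf) tuples for forward-strand ORFs.

    start is 0-based, end is exclusive, and the ORF includes its stop codon.
    All three forward reading frames are scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # close the ORF and look for the next ATG
    return sorted(orfs)


def map_record(seq_id, seq):
    """Hypothetical map-style step: emit one (key, value) pair per ORF."""
    for orf in find_orfs(seq):
        yield seq_id, orf


if __name__ == "__main__":
    for key, value in map_record("example", "CCATGAAATAGGG"):
        print(key, value)
```

Emitting one key/value pair per ORF mirrors the shape of a map step, so that a framework such as Hadoop could, in principle, distribute the records across virtual-machine nodes. How the actual service partitions its work is not stated in this section.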

