Abstract

Advances in sequencing techniques have greatly accelerated the production of huge sequence datasets. This presents immediate challenges for database maintenance at datacenters and additional computational challenges in data mining and sequence analysis. Together these place a significant burden on traditional stand-alone computer resources. To reach effective conclusions quickly and efficiently, the virtualization of resources and computation on a pay-as-you-go basis (together termed “cloud computing”) has recently appeared. The collective resources of a datacenter, including both hardware and software, can be made publicly available, in which case they are termed a public cloud; the resources are provided in virtual form to clients, who pay according to the resources they use. Examples of companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment corresponding to the user’s computational and data storage needs is created in the cloud and accessed via the internet. The task is then performed, the results are transmitted to the user, and the environment is deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing and go on to analyze the prerequisites and overall working of clouds. Finally, applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection, are discussed with reference to traditional workflows.

Highlights

  • The accumulation of DNA sequence information, comprising merely the order within a simple polymer of the four canonical bases (A, T, G, C), has suddenly exploded into the bioscientific universe, drawing comparisons to the Big Bang theory of the origin of the universe

  • The explosion in DNA sequence accumulation can be traced to developments including pyrosequencing (Franca et al, 2002), nanopore sequencing (Branton et al, 2008; Ivanov et al, 2011), single molecule sequencing (SMS) technology using DNA polymerases (Nusbaum, 2009), non-optical sequencing based on detection of pH changes (Rothberg et al, 2011), and high-throughput short-read platforms such as the Illumina MiSeq and HiSeq sequencers (Caporaso et al, 2012)

  • Meta Genome Rapid Annotation using Subsystem Technology (MG-RAST) has a simple workflow: first, the data in FASTA format are chunked into smaller pieces; each chunk is then searched with BLAST for similarities within the database (Wilkening et al, 2009)
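
The chunk-then-search workflow in the last highlight can be illustrated with a minimal Python sketch. It is not the MG-RAST implementation: the function names (`parse_fasta`, `chunk_records`, `search_all`) and the chunk size are assumptions made for illustration, and the BLAST step is represented by a caller-supplied function rather than a real BLAST invocation.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    seq_parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        elif header is not None:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


def chunk_records(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most chunk_size records each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[Record] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly smaller, chunk


def search_all(chunks: Iterable[List[Record]],
               search_chunk: Callable[[List[Record]], object]) -> list:
    """Apply a search function to each chunk independently.

    search_chunk is a hypothetical stand-in for the BLAST step; because
    chunks do not depend on each other, each call could be dispatched to
    a separate cloud worker instead of being run in this loop.
    """
    return [search_chunk(chunk) for chunk in chunks]


if __name__ == "__main__":
    demo = [">read1", "ACGT", "ACGT", ">read2", "GGCC", ">read3", "TTAA"]
    chunks = list(chunk_records(parse_fasta(demo), chunk_size=2))
    # Placeholder "search": report how many reads each chunk holds.
    print(search_all(chunks, len))
```

Splitting by record count keeps each read intact within a single chunk; a production system would more likely balance chunks by total sequence length, which this sketch does not attempt.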


Summary

INTRODUCTION

The accumulation of DNA sequence information, comprising merely the order within a simple polymer of the four canonical bases (A, T, G, C), has suddenly exploded into the bioscientific universe, drawing comparisons to the Big Bang theory of the origin of the universe. Although the development of computers has consistently followed Moore’s law in terms of processing speed, processing capability still lags behind the data storage and maintenance requirements for the large amount of DNA sequence data produced by high-throughput next-generation sequencing techniques (Shendure and Ji, 2008). To cope with this situation, techniques of parallel computation using virtual hardware, software, and working platform resources have appeared and are collectively termed “cloud computing.” Given that cloud computing is emerging as a commercial reality, the following points appear to underpin its commercialization (Armbrust et al, 2009):

www.frontiersin.org

Data amount
Technology support
Off Premise Private Cloud
CLUSTALW
Cost-effectiveness
Reliability and Security
Data Recovery and Management Systems
Bioinformatics and Computational Biology Problems
Metadata Management and Cloud Provenance
CONCLUSION