Abstract

Over the past few years, advances in molecular biology and genomic technologies have led to an explosive growth of digital biological information. The analysis of these large data sets commonly relies on the extensive, repeated use of conceptually parallel algorithms, most notably for sequence alignment. Cloud computing offers scientists an entirely new model for using computing infrastructure, and it is well suited to bioinformatics applications that require both the management of huge amounts of data and heavy computation. This study aims to transform a recently developed bioinformatics sequence alignment tool, BFAST, for the cloud environment. The MapReduce version of BFAST is used to demonstrate the effectiveness of the MapReduce framework and the cloud computing model in handling intensive computations and managing large bioinformatics data sets. Several existing tools and technologies are used to achieve an efficient transformation of BFAST into the cloud environment. The implementation rests on two core components: BFAST and MapReduce. BFAST is a software package for aligning next-generation genomic reads against a target genome with very high accuracy and reasonable speed. MapReduce, a general-purpose parallelization technology (used here in its open-source implementation, Hadoop), appears to be particularly well adapted to the intensive computation and large-scale data storage involved in BFAST sequence alignment. The MapReduce version of BFAST is expected to improve on the original in computational efficiency, accuracy, and scalability, while reducing deployment and management effort. The study demonstrates how a general-purpose parallelization technology, namely MapReduce running on the cloud, can be tailored to tackle this class of bioinformatics problems with good performance and scalability and, more importantly, how it could serve as the basis of a parallel computing platform for several bioinformatics problems. Although transforming existing bioinformatics algorithms from local compute infrastructure to the cloud is not trivial, the speed and flexibility of cloud computing environments provide a substantial boost at manageable cost.
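
The decomposition that makes read alignment a good fit for MapReduce can be sketched in a few lines: each read is aligned independently in the map step, and the candidate hits for each read are grouped and the best one kept in the reduce step. This is a minimal single-process illustration under stated assumptions, not the BFAST algorithm or the Hadoop API: the toy reference string, the brute-force ungapped aligner `map_align`, the one-mismatch limit, and the best-hit rule in `reduce_best` are all hypothetical stand-ins, and `shuffle` only imitates the grouping that Hadoop performs between the map and reduce phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical toy reference genome, for illustration only.
REFERENCE = "ACGTACGTTAGCCGATAGGCTTACG"

Hit = Tuple[int, int]  # (0-based reference position, number of mismatches)


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_align(read_id: str, read: str, max_mismatches: int = 1) -> Iterator[Tuple[str, Hit]]:
    """Map step: align a single read, independently of every other read.

    Hypothetical brute-force ungapped scan standing in for a real aligner
    such as BFAST; it emits (read_id, (position, mismatches)) for each
    window of the reference within max_mismatches of the read.
    """
    for pos in range(len(REFERENCE) - len(read) + 1):
        mismatches = hamming(read, REFERENCE[pos:pos + len(read)])
        if mismatches <= max_mismatches:
            yield read_id, (pos, mismatches)


def shuffle(pairs: Iterable[Tuple[str, Hit]]) -> Dict[str, List[Hit]]:
    """Group mapped values by key, imitating the framework's shuffle phase."""
    groups: Dict[str, List[Hit]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_best(read_id: str, hits: List[Hit]) -> Tuple[str, Hit]:
    """Reduce step: keep the hit with the fewest mismatches (ties: leftmost)."""
    return read_id, min(hits, key=lambda hit: (hit[1], hit[0]))


def run_job(reads: Dict[str, str]) -> Dict[str, Hit]:
    """Run map, shuffle and reduce sequentially; reads with no hit are dropped."""
    mapped = (pair for rid, seq in reads.items() for pair in map_align(rid, seq))
    grouped = shuffle(mapped)
    return dict(reduce_best(rid, hits) for rid, hits in grouped.items())


if __name__ == "__main__":
    reads = {"r1": "TAGCC", "r2": "GGCTT", "r3": "AAAAA"}
    print(run_job(reads))  # prints {'r1': (8, 0), 'r2': (17, 0)}
```

Because `map_align` never looks at any read other than its own, a framework such as Hadoop can run it over many input splits in parallel; only the per-read grouping requires moving data between nodes.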
