Abstract

The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most widely used program for sequence similarity searches against large sequence databases. With the advent of next-generation sequencing technologies, it is now possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to complete. mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can accelerate large-scale annotation on supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are publicly available, researchers are reluctant to use them because they lack expertise in the Linux command line and relevant programming experience. These limitations make it difficult for biologists to use mpiBLAST to accelerate annotation, and no open-source web interface for mpiBLAST is available. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission and also provides a robust job management interface for system administrators. It combines script creation and modification with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. A use case highlights the acceleration of annotation analysis achieved with WImpiBLAST.
Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis.
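
To make the script creation step concrete, the following is a minimal sketch of how a Torque job script for an mpiBLAST run might be generated. It is not WImpiBLAST's actual code: the function name `build_torque_script` and all of its parameters are hypothetical. The `mpiblast` options shown (`-p`, `-d`, `-i`, `-o`) assume the legacy `blastall`-style flags and a database that has already been segmented for mpiBLAST, so they should be checked against the installed version.

```python
def build_torque_script(name, program, database, query, output,
                        nodes=2, ppn=8, walltime="24:00:00"):
    """Return the text of a Torque/PBS job script that launches mpiblast.

    Hypothetical helper for illustration only. The mpiblast flags assume
    the legacy blastall-style options (-p program, -d database, -i query,
    -o output).
    """
    total_procs = nodes * ppn  # one MPI process per requested core
    lines = [
        "#!/bin/bash",
        f"#PBS -N {name}",                    # job name shown by the scheduler
        f"#PBS -l nodes={nodes}:ppn={ppn}",   # nodes and processors per node
        f"#PBS -l walltime={walltime}",       # maximum run time
        "cd $PBS_O_WORKDIR",                  # run from the submission directory
        f"mpirun -np {total_procs} mpiblast -p {program} -d {database} "
        f"-i {query} -o {output}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: a blastp search of a FASTA query file against a database named "nr".
    print(build_torque_script("annot01", "blastp", "nr",
                              "transcripts.fasta", "hits.txt"))
```

On a Torque cluster, the generated text would typically be saved to a file and submitted with `qsub`. A web interface hides these steps from users who are not comfortable with the Linux command line.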

Highlights

  • The function of a newly sequenced gene can be discovered by determining its sequence homology with a known protein or family of proteins

  • The motivation to adopt a three-tier architecture for WImpiBLAST derives from the fact that administrators should be able to install and configure the web portal as a separate component on top of a high performance computing (HPC) cluster without drastically changing the software or hardware configurations, while users can use it without having to learn the details of the application

  • Tests were performed under ideal system load when no other compute-intensive jobs were running on the symmetric multiprocessor (SMP) server or HPC cluster in order to assess the best performance gain achieved by the application

Introduction

The function of a newly sequenced gene can be discovered by determining its sequence homology with a known protein or family of proteins. The Basic Local Alignment Search Tool (BLAST) is the most widely used program for sequence similarity searches against large sequence databases [1]. The chances of determining the function of new sequences are increasing every day with the continual, unprecedented growth in the size of DNA and amino acid databases. Functional annotation of the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive if done on standalone desktop or server machines, and can take days to complete.
