Abstract

Background - Sequencing of EST and BAC end datasets is no longer limited to large research groups. Drops in per-base pricing have made high throughput sequencing accessible to individual investigators. However, there are few options available which provide a free and user-friendly solution to the BLAST result storage and data mining needs of biologists.

Results - Here we describe NuclearBLAST, a batch BLAST analysis, storage and management system designed for the biologist. It is a wrapper for NCBI BLAST which provides a user-friendly web interface which includes a request wizard and the ability to view and mine the results. All BLAST results are stored in a MySQL database which allows for more advanced data-mining through supplied command-line utilities or direct database access. NuclearBLAST can be installed on a single machine or clustered amongst a number of machines to improve analysis throughput. NuclearBLAST provides a platform which eases data-mining of multiple BLAST results. With the supplied scripts, the program can export data into a spreadsheet-friendly format, automatically assign Gene Ontology terms to sequences and provide bi-directional best hits between two datasets. Users with SQL experience can use the database to ask even more complex questions and extract any subset of data they require.

Conclusion - This tool provides a user-friendly interface for requesting, viewing and mining of BLAST results which makes the management and data-mining of large sets of BLAST analyses tractable to biologists.

Highlights

  • The number of research groups generating sequence data such as expressed sequence tags (EST) and bacterial artificial chromosome (BAC) ends has increased dramatically due to dropping costs of obtaining DNA sequence

  • When importing a sequence data set into NuclearBLAST, a user specifies whether the sequence may be used in subsequent searches as a query, a target, or both

  • Results of a completed batch BLAST job can be browsed as ordered by e-value in a paginated fashion with the top hit available on the main result page (Fig 2b)


Summary

Introduction

The number of research groups generating sequence data such as expressed sequence tags (EST) and bacterial artificial chromosome (BAC) ends has increased dramatically due to the dropping cost of obtaining DNA sequence. The primary design goals for NuclearBLAST were to provide biologists with a centralized system where BLAST results can be created and retrieved, a relational database storage system which can be mined for comparative analyses, and a program which takes advantage of clustered computing resources to increase the throughput of large BLAST jobs. The client-server design of the program allows multiple computers to perform the BLAST analyses in a clustered environment by using a job management software package such as PBS (Portable Batch System) [6].
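To illustrate the kind of comparative analysis that relational storage of BLAST results makes possible, the following minimal sketch computes bi-directional best hits between two datasets. It is not NuclearBLAST's own code: the `hits` table, its columns, the job names and the function names are all hypothetical, and an in-memory SQLite database stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: one row per BLAST hit. The actual NuclearBLAST
# MySQL tables are not described here, so every name is an assumption.
SCHEMA = """
CREATE TABLE hits (
    job      TEXT,   -- batch job that produced the hit
    query_id TEXT,   -- query sequence identifier
    hit_id   TEXT,   -- target sequence identifier
    evalue   REAL    -- BLAST e-value (lower is better)
)
"""


def best_hits(conn, job):
    """Map each query in `job` to its lowest e-value hit."""
    rows = conn.execute(
        "SELECT query_id, hit_id FROM hits WHERE job = ? "
        "ORDER BY query_id, evalue, hit_id",
        (job,),
    )
    best = {}
    for query_id, hit_id in rows:
        # Rows arrive best-first per query, so keep only the first one.
        best.setdefault(query_id, hit_id)
    return best


def bidirectional_best_hits(conn, job_ab, job_ba):
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(conn, job_ab)
    reverse = best_hits(conn, job_ba)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def demo():
    """Build a toy database with made-up hits and run the analysis."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?, ?, ?)",
        [
            ("A_vs_B", "a1", "b1", 1e-50),
            ("A_vs_B", "a1", "b2", 1e-10),
            ("A_vs_B", "a2", "b2", 1e-20),
            ("B_vs_A", "b1", "a1", 1e-45),
            ("B_vs_A", "b2", "a1", 1e-12),
            ("B_vs_A", "b2", "a2", 1e-8),
        ],
    )
    return bidirectional_best_hits(conn, "A_vs_B", "B_vs_A")


if __name__ == "__main__":
    print(demo())
```

In this toy data only `a1` and `b1` select each other as best hit; `a2` points to `b2`, but `b2` prefers `a1`, so that pair is excluded. Ties on e-value are broken alphabetically by hit identifier here, which is an arbitrary choice made for reproducibility.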

