Biological Sequence Databases Research Articles

Background: Advances in sequencing efficiency have vastly increased the sizes of biological sequence databases, including many thousands of genome-sequenced species. The BLAST algorithm remains the main search engine for retrieving sequence information, and must consequently handle data on an unprecedented scale. This has been possible due to high-performance computers and parallel processing. However, the raw BLAST output from contemporary searches involving thousands of queries becomes ill-suited for direct human processing. Few programs attempt to directly visualize and interpret BLAST output; those that do often provide a mere basic structuring of BLAST data.
Results: Here we present a bioinformatics application named BLASTGrabber suitable for high-throughput sequencing analysis. BLASTGrabber, being implemented as a Java application, is OS-independent and includes a user friendly graphical user interface. Text or XML-formatted BLAST output files can be directly imported, displayed and categorized based on BLAST statistics. Query names and FASTA headers can be analysed by text-mining. In addition to visualizing sequence alignments, BLAST data can be ordered as an interactive taxonomy tree. All modes of analysis support selection, export and storage of data. A Java interface-based plugin structure facilitates the addition of customized third party functionality.
Conclusion: The BLASTGrabber application introduces new ways of visualizing and analysing massive BLAST output data by integrating taxonomy identification, text mining capabilities and generic multi-dimensional rendering of BLAST hits. The program aims at a non-expert audience in terms of computer skills; the combination of new functionalities makes the program flexible and useful for a broad range of operations.
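As a rough illustration of what "categorized based on BLAST statistics" can mean in practice, the sketch below groups hits into e-value bins. It is not BLASTGrabber's code: it assumes BLAST+ tabular output with the default 12 columns (`-outfmt 6`), and the function names, thresholds and sample hits are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6); assumed here.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def parse_tabular(handle):
    """Yield one dict per hit from tab-separated BLAST output."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        yield hit

def bin_by_evalue(hits, thresholds=(1e-50, 1e-10, 1e-3)):
    """Group hits under the first (strictest) threshold they satisfy."""
    bins = defaultdict(list)
    for hit in hits:
        label = next((f"<= {t:g}" for t in thresholds if hit["evalue"] <= t),
                     "weaker")
        bins[label].append(hit)
    return dict(bins)

# Hypothetical hits, invented for illustration only.
SAMPLE = (
    "q1\tsA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-80\t370\n"
    "q1\tsB\t75.0\t120\t30\t0\t1\t120\t9\t128\t2e-12\t80.5\n"
    "q2\tsC\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t22.0\n"
)

if __name__ == "__main__":
    for label, group in bin_by_evalue(parse_tabular(io.StringIO(SAMPLE))).items():
        print(label, [(h["qseqid"], h["sseqid"]) for h in group])
```

The same grouping could be keyed on bit score or percent identity instead; e-value is used here only because it is the statistic most commonly used for filtering.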

Background: Gene expression arrays are valuable and widely used tools for biomedical research. Today's commercial arrays attempt to measure the expression level of all of the genes in the genome. Effectively translating the results from the microarray into a biological interpretation requires an accurate mapping between the probesets on the array and the genes that they are targeting. Although major array manufacturers provide annotations of their gene expression arrays, the methods used by various manufacturers are different and the annotations are difficult to keep up to date in the rapidly changing world of biological sequence databases.
Results: We have created a consistent microarray annotation protocol applicable to all of the major array manufacturers. We constantly keep our annotations updated with the latest Ensembl Gene predictions, and thus cross-referenced with a large number of external biomedical sequence database identifiers. We show that these annotations are accurate and address in detail reasons for the minority of probesets that cannot be annotated. Annotations are publicly accessible through the Ensembl Genome Browser and programmatically through the Ensembl Application Programming Interface. They are also seamlessly integrated into the BioMart data-mining tool and the biomaRt package of BioConductor.
Conclusions: Consistent, accurate and updated gene expression array annotations remain critical for biological research. Our annotations facilitate accurate biological interpretation of gene expression profiles.

Articles published on Biological Sequence Databases

Strategies to improve usability and preserve accuracy in biological sequence databases.

Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

BISQUE: locus- and variant-specific conversion of genomic, transcriptomic and proteomic database identifiers

A Hybrid Parallel Implementation of the Aho–Corasick and Wu–Manber Algorithms Using NVIDIA CUDA and MPI Evaluated on a Biological Sequence Database

Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

High Performance Pattern Matching on Heterogeneous Platform

BLASTGrabber: a bioinformatic tool for visualization, analysis and sequence selection of massive BLAST data.

Parallel Continuous Flow: A Parallel Suffix Tree Construction Tool for Whole Genomes

An improved distance matrix computation algorithm for multicore clusters.

Fast and Efficient Hashing for Sequence Similarity Search using Substring Extraction in DNA Sequence Databases

Increasing Efficiency of Computation Time For Hit Detection In BLASTN

EASER: Ensembl Easy Sequence Retriever

Molecular cloning and characterization of porcine indoleamine 2,3-dioxygenase and its expression in various tissues

On compressing and indexing repetitive sequences

Taxonomic classification of metagenomic shotgun sequences with CARMA3

A Fast Hybrid Algorithm Approach for the Exact String Matching Problem Via Berry Ravindran and Alpha Skip Search Algorithms

The DNA bank network: the start from a German initiative.

Consistent annotation of gene expression arrays

Enabling HMMER for the Grid with COMP Superscalar

Estimating the Gumbel scale parameter for local alignment of random sequences by importance sampling with stopping times

