BLMT

Madhavi Ganapathiraju, Judith Klein-Seetharaman, Vijayalaxmi Manoharan

doi:10.2165/00822942-200403020-00013

Abstract

Statistical analysis of amino acid and nucleotide sequences, especially sequence alignment, is one of the most commonly performed tasks in modern molecular biology. However, for many tasks in bioinformatics, the requirement that the features in an alignment be consecutive is restrictive, and "n-grams" (also known as k-tuples) have been used as features instead. N-grams are usually short nucleotide or amino acid sequences of length n, but the unit for a gram may be chosen arbitrarily. The n-gram concept is borrowed from language technologies, where n-grams of words form the fundamental units in statistical language models. Despite the demonstrated utility of n-gram statistics for the biology domain, there is currently no publicly accessible generic tool for the efficient calculation of such statistics. Most sequence analysis tools disregard matches to short sequences because such matches lack statistical significance. This article presents the integrated Biological Language Modeling Toolkit (BLMT), which allows efficient calculation of n-gram statistics for arbitrary sequence datasets. BLMT can be downloaded from http://www.cs.cmu.edu/~blmt/source and installed for standalone use on any Unix platform or Unix shell emulation such as Cygwin on the Windows platform. Specific tools and usage details are described in a "readme" file. The n-gram computations carried out by the BLMT are part of a broader set of tools borrowed from language technologies and modified for statistical analysis of biological sequences; these are available at http://flan.blm.cs.cmu.edu/.
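To make the n-gram idea concrete, the sketch below counts overlapping n-grams in a sequence and across a small dataset. It is only an illustration of the concept, not BLMT's implementation: the abstract does not describe the toolkit's algorithm or interface, and the function names `count_ngrams` and `count_ngrams_in_dataset` and the example sequences are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_ngrams(sequence: str, n: int) -> Counter:
    """Count every overlapping n-gram (k-tuple) in one sequence.

    Here a "gram" is a single character (one residue or nucleotide),
    which is an assumption; the unit could be chosen differently.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Slide a window of width n over the sequence, one position at a time.
    # A sequence shorter than n yields no windows and an empty Counter.
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def count_ngrams_in_dataset(sequences: Iterable[str], n: int) -> Counter:
    """Sum n-gram counts over a collection of sequences."""
    totals: Counter = Counter()
    for seq in sequences:
        totals.update(count_ngrams(seq, n))
    return totals


if __name__ == "__main__":
    # Hypothetical toy nucleotide sequences, for illustration only.
    dataset = ["ACGTAC", "ACAC"]
    for ngram, count in count_ngrams_in_dataset(dataset, 2).most_common():
        print(ngram, count)
```

This direct sliding-window count is enough for small inputs; a toolkit aimed at large sequence datasets would likely need a more efficient data structure.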

Journal: Applied Bioinformatics
Publication Date: Jan 1, 2004
Citations: 34

Similar Papers

Nucleotide and amino acid sequence analysis of a birnavirus isolated from penguins
D J Jackwood ... S E Sommer
Veterinary Record | VOL. 156
23 Apr 2005

Molecular cloning and nucleotide sequence of a pestivirus genome, noncytopathic bovine viral diarrhea virus strain SD-1
Ruitang Deng ... Kenny V Brock
Virology | VOL. 191
01 Dec 1992

Identities among actin-encoding cDNAs of the Nile tilapia (Oreochromis niloticus) and other eukaryote species revealed by nucleotide and amino acid sequence analyses
Andréia B Poletto ... Fausto Foresti
Genetics and Molecular Biology | VOL. 31
01 Jan 2008

A Large-scale Analysis of Human Mitochondrial DNA Sequences with Special Reference to the Population History of East Eurasian.
H Oota ... N Saitou
Anthropological Science | VOL. 110
01 Jan 2002
