DNASTAT: a Pascal unit for the statistical analysis of DNA and protein sequences.

J Kleffe, B Wittig, K Hermann, W Gunia, W Vahrson

doi:10.1093/bioinformatics/11.4.449

Abstract

DNASTAT is a collection of Pascal routines for researchers who develop their own application programs for the statistical analysis of DNA and protein sequences. Dynamic and file-based data structures allow users to process sets of sequences by simple loop control, without limits on the number of sequences or their individual sizes. This frees the programmer from potentially error-prone tasks such as dynamic memory allocation and controlling array sizes. Sequences can be stored in databases along with biological and statistical attributes. Individual sequences can be accessed by column name and row number, as in a spreadsheet. DNASTAT allows large sets of sequences to be processed on a PC with a standard configuration. Its small size, simplicity and free availability make it attractive to students of mathematical biology. The use of DNASTAT is illustrated by two sample programs that generate a database of coding regions from the GenBank entry of the tobacco chloroplast genome. A version of DNASTAT written in ANSI C for PCs and Unix workstations is also available.
