Compensation for nucleotide bias in a genome by representation as a discrete channel with noise.

Mark Schreiber,Chris Brown

doi:10.1093/bioinformatics/18.4.507

Abstract

Calculation of the information content of motifs in genomes highly biased in nucleotide composition is likely to lead to overestimates of the amount of useful information in the motif. Calculating relative information can compensate for biases, however the resulting information content is the amount seen by an observer and not by a macromolecule binding to the motif. The latter is needed to calculate the discriminatory power of the motif and to compare motifs between species. By treating a biased genome as a discrete channel with noise, in accordance with Shannon Information Theory, we were able to remove both 'Distortion' and 'Noise' from the motif and recover a more instructive biological 'signal.' A Java application, LogoPaint, was developed to remove nucleotide bias distortion and triplet frequency noise from motifs, calculate information content and present the motif as a logo. We demonstrate how this technique can 'unmask' motifs in the translation initiation regions of bacteria that are obscured by strong sequence biases. LogoPaint is available to all users from the authors as an executable JAR file. Source code is available by arrangement.

Full Text