Identifying Statistical Dependence in Genomic Sequences via Mutual Information Estimates

Hasan Metin Aktulga, Ioannis Kontoyiannis, Ananth Y Grama, Lukasz Szpankowski, Wojciech Szpankowski, L Alex Lyznik

doi:10.1155/2007/14741

Abstract

Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, they are used for the identification of correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the 5' untranslated region in zmSRp32 and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from the FBI's combined DNA index system (CODIS), we demonstrate that our approach is particularly well suited for the problem of discovering short tandem repeats, an application of importance in genetic profiling.

Highlights

Questions of quantification, representation, and description of the overall flow of information in biosystems are of central importance in the life sciences.
First, we show that the empirical mutual information can be used effectively to identify statistical dependence between regions of the maize zmSRp32 gene that may be involved in alternative processing of pre-mRNA transcripts.
We present experimental results on DNA sequences from the FBI's combined DNA index system (CODIS), which clearly indicate that the empirical mutual information can be a powerful tool for the computationally intensive task of discovering short tandem repeats.

Summary

Introduction

Questions of quantification, representation, and description of the overall flow of information in biosystems are of central importance in the life sciences. We develop statistical tools based on information-theoretic ideas, and demonstrate their use in identifying informative parts in biomolecules. Our goal is to detect statistically dependent segments of biosequences, hoping to reveal potentially important biological phenomena. It is well known [1,2,3] that various parts of biomolecules, such as DNA, RNA, and proteins, are significantly (statistically) correlated. Formal measures and techniques for quantifying these correlations are topics of current investigation. The biological implications of these correlations are deep, and they themselves remain unresolved. We propose to develop precise and reliable methodologies for quantifying and identifying such dependencies, based on the information-theoretic notion of mutual information.
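
To make the notion of empirical mutual information concrete, the following is a minimal sketch of a plug-in estimate computed from two aligned segments of equal length. The function name `empirical_mutual_information` and the position-by-position pairing of symbols are assumptions made for illustration; this excerpt does not specify the exact estimator, the way segments are compared, or the threshold function used in the paper.

```python
from collections import Counter
from math import log2


def empirical_mutual_information(x: str, y: str) -> float:
    """Plug-in estimate of I(X;Y) in bits from position-wise symbol pairs.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # counts of each pair (x_i, y_i)
    count_x = Counter(x)        # marginal symbol counts in x
    count_y = Counter(y)        # marginal symbol counts in y
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[a] * count_y[b]))
    return mi


# A segment compared with itself carries its full empirical entropy,
# while a pair whose symbols co-occur uniformly carries no information.
mi_identical = empirical_mutual_information("ACGT" * 5, "ACGT" * 5)
mi_unrelated = empirical_mutual_information("AACC", "ACAC")
```

In practice, an estimate like this would be compared against a significance threshold before a dependence is declared, because the plug-in estimate is biased upward for short segments.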



Journal: EURASIP Journal on Bioinformatics and Systems Biology
Publication Date: Jan 1, 2007
Citations: 37
License type: cc-by

Similar Papers

Statistical Dependence in Biological Sequences
Hasan Metin Aktulga ... L Alex Lyznik
01 Jun 2007

Heating Up Cold Cases: An Interview with Bruce Budowle on Human Identification
David Mittelman
Forensic Genomics | VOL. 1
01 Mar 2021

On Accountability: Genetic Tools for Justice and Injustice in Criminal Proceedings
Emily Greenwald ... Linda Phiri
Journal of Science Policy & Governance | VOL. 25
28 Oct 2024

Alternative Splicing Factor/Splicing Factor 2 Regulates the Expression of the ζ Subunit of the Human T Cell Receptor-associated CD3 Complex
Vaishali R Moulton ... George C Tsokos
Journal of Biological Chemistry | VOL. 285
01 Apr 2010
