Abstract

Background: Transposons are "jumping genes" that account for large quantities of repetitive content in genomes. They are known to affect transcriptional regulation in several different ways, and are implicated in many human diseases. Transposons are related to microRNAs and viruses, and many genes, pseudogenes, and gene promoters are derived from transposons or have origins in transposon-induced duplication. Modeling transposon-derived genomic content is difficult because it is poorly conserved. Profile hidden Markov models (profile HMMs), widely used for protein sequence family modeling, are rarely used for modeling DNA sequence families. The algorithm commonly used to estimate the parameters of profile HMMs, Baum-Welch, is prone to premature convergence to local optima. The DNA domain is especially problematic for the Baum-Welch algorithm, since its alphabet has only four letters, as opposed to the twenty residues of the amino acid alphabet.

Results: We demonstrate, with a simulation study and with an application to modeling the MIR family of transposons, that two recently introduced methods, Conditional Baum-Welch and Dynamic Model Surgery, achieve better estimates of the parameters of profile HMMs across a range of conditions.

Conclusions: We argue that these new algorithms expand the range of potential applications of profile HMMs to many important DNA sequence family modeling problems, including that of searching for and modeling the virus-like transposons that are found in all known genomes.

Highlights

  • Transposons are “jumping genes” that account for large quantities of repetitive content in genomes

  • In [39], we demonstrated the effectiveness of the Conditional Baum-Welch (CBW) and the Dynamic Model Surgery (DMS) algorithms using simulated data

  • We randomly generated a profile hidden Markov model (HMM) for each of the seven conservation levels in the range from 0.3 to 0.9. From each of these “true” profile HMMs we drew a set of sequences; by design, the set of sequences drawn from a profile with conservation level 0.5 is about 50% conserved
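
The simulation design in the last highlight can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a drastically simplified "profile" with match states only (no insertion or deletion states), in which each position emits a randomly chosen consensus base with probability equal to the conservation level and splits the remaining probability evenly over the other three bases. All function names, the profile length, and the number of sequences are hypothetical choices made for illustration.

```python
import random

ALPHABET = "ACGT"

def random_profile(length, conservation, rng):
    """Build per-position emission distributions (match states only).

    Assumption: each position has one random consensus base emitted with
    probability `conservation`; the rest is spread evenly over the others.
    """
    rest = (1.0 - conservation) / (len(ALPHABET) - 1)
    profile = []
    for _ in range(length):
        consensus = rng.choice(ALPHABET)
        profile.append(
            {base: (conservation if base == consensus else rest) for base in ALPHABET}
        )
    return profile

def consensus_of(profile):
    """Return the most probable base at every position."""
    return "".join(max(dist, key=dist.get) for dist in profile)

def draw_sequence(profile, rng):
    """Sample one sequence, one base per profile position."""
    return "".join(
        rng.choices(ALPHABET, weights=[dist[base] for base in ALPHABET])[0]
        for dist in profile
    )

def mean_identity(profile, sequences):
    """Fraction of sampled bases that agree with the profile consensus."""
    consensus = consensus_of(profile)
    matches = sum(a == b for seq in sequences for a, b in zip(seq, consensus))
    return matches / (len(consensus) * len(sequences))

def simulate(levels=(0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9),
             length=100, n_seqs=200, seed=0):
    """For each conservation level, draw sequences and measure their identity."""
    rng = random.Random(seed)
    results = {}
    for level in levels:
        profile = random_profile(length, level, rng)
        sequences = [draw_sequence(profile, rng) for _ in range(n_seqs)]
        results[level] = mean_identity(profile, sequences)
    return results

if __name__ == "__main__":
    for level, identity in simulate().items():
        print(f"conservation {level:.1f}: mean identity to consensus {identity:.3f}")
```

Under these assumptions the measured identity to the consensus tracks the conservation level closely, which mirrors the statement that sequences drawn at level 0.5 are about 50% conserved. A full profile HMM would also carry insertion and deletion states and transition probabilities, which this sketch leaves out.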


Summary

Introduction

Transposons are “jumping genes” that account for large quantities of repetitive content in genomes. They are known to affect transcriptional regulation in several different ways, and are implicated in many human diseases. Transposable elements (transposons) are genomic sequences that either directly encode the mechanism of their own duplication within a genome, or appropriate a protein product from the cell or from another transposable element to achieve mobility. These “jumping genes” share features and origins with viruses, though they differ from viruses in that they are usually unable to leave one cell to affect another [1]. Another large fraction of mammalian genomes is probably transposon-derived, but has mutated to such an extent that it cannot be identified by current approaches [3,5].

