Abstract

In this study, hierarchies of probabilistic models are evaluated for their ability to characterize the untemplated addition of adenine and uracil to the 3’ ends of mitochondrial mRNAs of the human pathogen Trypanosoma brucei, and for their ability to generatively reproduce populations of these untemplated adenine/uridine “tails”. We determined the best-suited Hidden Markov Models (HMMs) for this biological system. Although our HMMs could not generatively reproduce the length distribution of the tails, they fared better at reproducing the nucleotide composition of the tail populations. The HMMs robustly identified distinct states of nucleotide addition that correlate with experimentally verified differences in tail nucleotide composition. However, they also identified a subclass of tails among the ND1 gene transcript populations that is unexpected given the current model of sequential enzymatic action in untemplated tail addition in this system. These models can therefore not only reflect biological states that are already known but also generate hypotheses to be tested experimentally. Finally, our HMMs supplied a way to correct a portion of the sequencing errors present in our data. Importantly, because of their binary emissions, these models constitute rare, simple pedagogical examples of applied bioinformatic HMMs.


Summary

Introduction

The framework of Hidden Markov Models (HMMs) was applied to an interesting data set from molecular biology. Besides the previously mentioned addition of non-templated tails consisting of A and U, most of the mitochondrial mRNAs undergo targeted insertion and deletion of a few to hundreds of uracils (RNA editing) to generate a translatable sequence [14]. At least some regulation of the mitochondrial transcriptome occurs at the RNA level [13], and we have previously analyzed how the content (nucleotide length and composition) of the 3’ tail additions varies between life stages. Deciphering the mechanism of and roles for 3’ tail additions in T. brucei has required genetic manipulation and subsequent tracking of downstream effects such as mRNA tail composition. This approach has proven hugely informative, but the mechanisms of tail addition are clearly complex. The models presented here provide a simple context for HMMs that is of potential pedagogical benefit.
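To make the idea of an HMM with binary emissions concrete, the following is a minimal Python sketch of Viterbi decoding over tails written with the two symbols A and U. It is an illustration only: the two state names and every probability are hypothetical placeholders chosen for readability, not the states or fitted parameters of the models in this study.

```python
import math

# Hypothetical two-state model; names and numbers are illustrative
# assumptions, not values taken from the study.
STATES = ("A_rich", "AU_mixed")
START = {"A_rich": 0.9, "AU_mixed": 0.1}
TRANS = {
    "A_rich": {"A_rich": 0.9, "AU_mixed": 0.1},
    "AU_mixed": {"A_rich": 0.05, "AU_mixed": 0.95},
}
# Binary emissions: each state emits either 'A' or 'U'.
EMIT = {
    "A_rich": {"A": 0.95, "U": 0.05},
    "AU_mixed": {"A": 0.5, "U": 0.5},
}


def viterbi(tail):
    """Return the most probable hidden-state path for a tail of 'A'/'U'."""
    if not tail:
        return []
    # Work in log space so long tails do not underflow.
    score = {s: math.log(START[s]) + math.log(EMIT[s][tail[0]]) for s in STATES}
    back = []  # one dict of back-pointers per position after the first
    for nt in tail[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][nt])
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    tail = "AAAAAAAAAAUAUUAUUAUU"
    for nt, st in zip(tail, viterbi(tail)):
        print(nt, st)
```

With only two emission symbols, each state is described by a single number (its probability of emitting A), which is what makes models of this kind easy to inspect by hand.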

Statistical properties of tail populations
Performance of biological system-informed HMMs of increasing complexity
HMM sequence error correction
Unstructured and ultimate HMMs
Findings
Conclusion and future application
