Alt-Splice Gene Predictor Using Multitrack-Clique Analysis: Verification of Statistical Support for Modelling in Genomes of Multicellular Eukaryotes

Stephen Winters-Hilt, Andrew Lewis

doi:10.3390/informatics4010003


Open Access

https://doi.org/10.3390/informatics4010003


Abstract

One of the main limitations of the typical hidden Markov model (HMM) implementation for gene-structure identification is that only a single structure is identified on a given sequence of genomic data. Overlapping structure therefore cannot be identified directly, and certainly not within the confines of the optimal Viterbi path evaluation. This is a major limitation, given that significant portions of eukaryotic genomes, particularly mammalian genomes, are now known to be alternatively spliced and thus to have overlapping structure in terms of the resulting mRNA transcripts. Using the general meta-state HMM approach developed in prior work, however, more than one ‘track’ of annotation can be accommodated, allowing a direct implementation of an alternative-splice gene-structure identifier. In this paper we examine the representation of alternative splicing annotation in the multi-track context. We show that the proliferation of states is manageable, and that the genomes examined (human, mouse, worm, and fly) provide sufficient statistical support for a full alt-splice meta-state HMM gene finder to be implemented. In analyzing alt-splice event counts, we expected alternative splicing complexity to increase with organism complexity. This is seen in the percentage of genes with alt-splice variants, which increases from worm to fly to the mammalian genomes (mouse and human). Of particular note is an increase in alternative splicing variants at the start and end of coding in the more complex organisms studied (mouse and human). This indicates rapid recruitment of new first and last exons that is possibly spliceosome mediated. It suggests that spliceosome-mediated refinements (acceleration) of gene-structure variation and selection, with increasing levels of sophistication, have occurred in eukaryotes, and in mammals especially.
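To make the single-path limitation concrete, here is a minimal Viterbi decoder for a hypothetical two-state (exon/intron) toy model. The state names, probabilities, and the `viterbi` function are illustrative assumptions, not the authors' gene finder. Whatever the input, the decoder returns exactly one state path, that is, one annotation track. Two overlapping splice variants of the same region therefore cannot both be reported.

```python
import math

# Hypothetical toy model (assumed values): two labels, one annotation track.
STATES = ("exon", "intron")
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
# Assumed toy emissions: exons slightly GC-rich, introns slightly AT-rich.
EMIT = {
    "exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(seq):
    """Return the single most probable state path and its log-probability."""
    # score[s]: best log-prob of any path ending in state s at the current position
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []
    for base in seq[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            # Best predecessor for s: a max over previous states, not a sum.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state: exactly one path is recovered.
    last = max(STATES, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


path, logp = viterbi("GCGCGCATATATAT")
print(path)
```

With this toy model, reporting an alternative transcript would require a separate decode with a different model. The meta-state approach instead enlarges the state space so that more than one track can be accommodated.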

Highlights

In this paper we examine the representation of alternative splicing annotation in the multi-track context. We show that the proliferation of states is manageable, and that the GenBank-annotated genomes examined provide sufficient statistical support for a full alt-splice meta-state HMM gene finder to be implemented using only the intrinsic statistical information of the genome studied.
… 12 show the results for the different types of alternative splicing (j0j0) and encumbered splicing, along with the counts on start-of-coding. The extent of alternative splicing in the genome is captured as the ratio of alternative-splicing events to the number of start-of-coding events. Of note is the increase in j0i0 and j020 in the more complex mammalian genomes, like mouse and …
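Two summary measures appear in this work. One is the fraction of genes with alt-splice variants. The other is the ratio of alternative-splicing events to start-of-coding events. Both are simple counts. The sketch below computes them from a hypothetical per-gene table. The `Gene` fields, the `splice_summary` function, and the toy numbers are invented for illustration and are not data or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical per-gene summary (field names are assumptions)."""
    name: str
    num_transcripts: int         # annotated transcript variants for the gene
    alt_splice_events: int       # alternative-splicing events counted for the gene
    start_of_coding_events: int  # start-of-coding sites counted for the gene


def splice_summary(genes):
    """Return (fraction of genes with >1 variant, alt-splice events per start-of-coding event)."""
    with_variants = sum(1 for g in genes if g.num_transcripts > 1)
    events = sum(g.alt_splice_events for g in genes)
    starts = sum(g.start_of_coding_events for g in genes)
    return with_variants / len(genes), events / starts


# Toy input only, not results from the paper.
toy = [
    Gene("g1", 1, 0, 1),
    Gene("g2", 3, 4, 2),
    Gene("g3", 2, 1, 1),
    Gene("g4", 1, 0, 1),
]
frac, ratio = splice_summary(toy)
print(f"{frac:.0%} of genes alt-spliced; {ratio:.2f} events per start-of-coding")
```

Normalizing by start-of-coding counts rather than by gene count is one way to compare genomes of different sizes. How the paper defines each event count is not reproduced here.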

Summary

Introduction

Computational gene-finding work began to make significant advances in the 1980s [1,2,3], especially upon the introduction of hidden Markov models (HMMs). These were applied both with statistics intrinsic to the genome under study (ab initio gene-finding) [1,2,3] and with statistics extrinsic to the genome obtained through sequence-similarity/alignment methods [4]. Examples of the latter include homology or expressed sequence tag (EST) matching with finite-state automata (FSAs). The main drawback of homology-based approaches is that they cannot find new genes that differ significantly from the sequences in known-gene databases, as discussed in [1] and explored in [7]. This is a significant limitation for purely homology-based approaches, since approximately half of the genes in a given eukaryotic genome (C. elegans, for example) appear to be novel to that genome. A brief background is given for the standard first-order HMM, followed by background on the meta-state HMM. (… Welch algorithms is given [11].)
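The meta-state HMM accommodates more than one annotation track. One naive way to see why this multiplies states is to treat a meta-state as a tuple of per-track labels. The sketch below assumes a hypothetical three-label set and counts two things. The first is the full Cartesian product, which grows as |labels|^tracks. The second is the smaller count obtained if tracks are assumed interchangeable, so that tuples collapse to multisets. Both are illustrations under stated assumptions, not the paper's multitrack-clique encoding.

```python
from itertools import combinations_with_replacement, product

# Hypothetical per-position labels for one annotation track (illustrative only).
LABELS = ("exon", "intron", "intergenic")


def ordered_meta_states(num_tracks):
    """Every tuple of per-track labels: the unpruned multi-track state space."""
    return list(product(LABELS, repeat=num_tracks))


def unordered_meta_states(num_tracks):
    """Assumes tracks are interchangeable, so tuples collapse to multisets of labels."""
    return list(combinations_with_replacement(LABELS, num_tracks))


for k in (1, 2, 3):
    print(k, len(ordered_meta_states(k)), len(unordered_meta_states(k)))
```

A real gene finder uses a much larger per-track label set than this toy one, so the base of the growth is larger. That is why statistical support for each added state matters when multiple tracks are introduced.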



Journal: Informatics (MDPI)	Publication Date: Jan 12, 2017
Citations: 1	License type: CC BY 4.0


Similar Papers

Alternative Splicing: New Insights from Global Analyses
Benjamin J Blencowe
Cell | VOL. 126 | 01 Jul 2006

Splicing Regulation in Neurologic Disease
Donny D Licatalosi ... Robert B Darnell
Neuron | VOL. 52 | 01 Oct 2006

Bipartite functions of the CREB co-activators selectively direct alternative splicing or transcriptional activation
Antonio L Amelio ... Massimo Caputi
The EMBO Journal | VOL. 28 | 30 Jul 2009

Alternative splicing variability: exactly how similar are two identical cells?
Rhonda J Perriman ... Manuel Ares
Molecular Systems Biology | VOL. 7 | 01 Jan 2010
