Abstract
In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly advanced our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. Ab initio gene prediction has two main aspects: the computed values that describe sequence features, and the algorithm used to train the discriminant function. Different bioinformatic tools employ different combinations of the two. Herein, we briefly review the well-characterized sequence features of eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.
Highlights
Owing to tremendous progress in the efficiency, accuracy and cost of high-throughput sequencing technologies, genome sequences of a large number of eukaryotic, prokaryotic and archaeal organisms are becoming available [1,2]
In contrast to experimental investigation of biological functions, the in silico analysis of DNA sequences is essential in the post-genomic era
According to their evolutionary origins and genomic distribution, repetitive DNA sequences can be broadly classified into three types [14, 15]: tandem repeats, interspersed repeats, and long terminal repeats (LTRs)
Summary
Exploring the evolutionary dynamics and biological consequences of genome size, base composition, and the relative proportions of functional and nonfunctional sequences is considered a fascinating challenge in biology. Publications on eukaryotes covering diversity patterns, evolutionary mechanisms and research methodologies in relation to genome size were recently summarized [11]. The traditional view holds that more than 90% of the human genome is nonfunctional and regarded as “junk DNA”, whereas the ENCODE project recently argued that up to 80% of genome sequences have functional roles [2,12]. We analyzed the genome sequences of 32 representative eukaryote species and roughly compared their genome size, GC content, and relative proportions of intergenic regions, exons and introns (Fig. 1). An intuitive correlation between genome size and the fraction of intergenic regions can be seen. The proportions of exons and introns change more or less consistently with each other
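As a rough illustration of the quantities compared above, the following minimal Python sketch computes GC content from a sequence and the relative proportions of exons, introns and intergenic regions from annotated intervals. It is not the authors' pipeline: the function names gc_content and region_fractions, the half-open (start, end) interval convention, and the assumption that the intervals do not overlap are all hypothetical choices made for this example.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def region_fractions(genome_length, exons, introns):
    """Return the fractions of exonic, intronic and intergenic bases.

    exons and introns are lists of half-open (start, end) intervals that
    are assumed not to overlap; everything else counts as intergenic.
    """
    exon_bp = sum(end - start for start, end in exons)
    intron_bp = sum(end - start for start, end in introns)
    intergenic_bp = genome_length - exon_bp - intron_bp
    return {
        "exon": exon_bp / genome_length,
        "intron": intron_bp / genome_length,
        "intergenic": intergenic_bp / genome_length,
    }


if __name__ == "__main__":
    # Toy 100 bp "genome" with one two-exon gene.
    print(gc_content("ATGCGCNN"))
    print(region_fractions(100, exons=[(0, 10), (20, 30)], introns=[(10, 20)]))
```

With real data, the intervals would typically come from a genome annotation file, and overlapping transcripts would need to be merged before counting bases.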
More From: Computational and Structural Biotechnology Journal