Abstract

Background: G-quadruplexes are four-stranded structures formed in guanine-rich nucleotide sequences. Several functional roles of DNA G-quadruplexes have been investigated, and putative roles in DNA replication and transcription have been suggested. A necessary condition for G-quadruplex formation is the presence of four regions of tandem guanines, called G-runs, and three nucleotide subsequences, called loops, that connect the G-runs. A simple computational way to detect potential G-quadruplex regions in a given genomic sequence is pattern matching with regular expressions. Although the regular expression-based approach finds many putative G-quadruplex motifs in most genomes, the majority of these sequences are unlikely to form G-quadruplexes because they are less stable than canonical double helix structures.

Results: Here we present elaborate computational models for representing DNA G-quadruplex motifs using hidden Markov models (HMMs). HMMs enable us to evaluate G-quadruplex motifs quantitatively with a probabilistic measure. In addition, the parameters of HMMs can be trained on experimentally verified data. We carried out computational experiments on discriminating positive from negative G-quadruplex sequences and on reducing the set of putative G-quadruplexes in the human genome. The results indicate that HMM-based models can discern bona fide G-quadruplex structures well, and that one of them may reduce the false positive G-quadruplexes predicted by existing regular expression-based methods. Furthermore, our results show that one of our models can be specialized to detect G-quadruplex sequences whose functional roles are expected to be involved in DNA transcription.

Conclusions: The HMM-based method, together with the conventional pattern matching approach, can help reduce the costly and laborious wet-lab experiments needed for functional analysis of a given set of potential G-quadruplexes of interest.
The C++ and Perl programs are available at http://tcs.cira.kyoto-u.ac.jp/~ykato/program/g4hmm/.
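
To illustrate what evaluating a motif "by a probabilistic measure" can look like, here is a minimal Python sketch of a left-to-right HMM with seven states (G-run, loop, G-run, loop, G-run, loop, G-run), scored with the forward algorithm against a uniform background. It is a toy under stated assumptions, not one of the models described in the abstract and not the distributed C++/Perl programs: the state layout, the function names, and all probabilities (`p_g_in_run`, `p_stay_run`, `p_stay_loop`) are hypothetical and hand-set rather than trained.

```python
import math

NEG_INF = float("-inf")
N_STATES = 7  # G-run, loop, G-run, loop, G-run, loop, G-run (assumed layout)


def log_sum_exp(a, b):
    """Return log(exp(a) + exp(b)) stably; tolerates -inf inputs."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def emission_logprob(state, base, p_g_in_run=0.97):
    """Toy emissions: G-run states strongly prefer G, loop states are uniform."""
    if state % 2 == 1:  # odd index = loop state
        return math.log(0.25)
    if base == "G":
        return math.log(p_g_in_run)
    return math.log((1.0 - p_g_in_run) / 3.0)


def g4_log_odds(seq, p_stay_run=0.7, p_stay_loop=0.6):
    """Forward log-likelihood under the toy HMM minus a uniform null model."""
    seq = seq.upper()
    if any(b not in "ACGT" for b in seq):
        raise ValueError("sequence must contain only A, C, G, T")
    if not seq:
        return NEG_INF
    p_stay = [p_stay_run if s % 2 == 0 else p_stay_loop for s in range(N_STATES)]
    stay = [math.log(p) for p in p_stay]
    move = [math.log(1.0 - p) for p in p_stay]

    # alpha[s] = log P(prefix read so far, current state = s).
    # Every path must start in the first G-run state.
    alpha = [NEG_INF] * N_STATES
    alpha[0] = emission_logprob(0, seq[0])
    for base in seq[1:]:
        new = [NEG_INF] * N_STATES
        for s in range(N_STATES):
            from_same = alpha[s] + stay[s]
            from_prev = alpha[s - 1] + move[s - 1] if s > 0 else NEG_INF
            total = log_sum_exp(from_same, from_prev)
            if total != NEG_INF:
                new[s] = total + emission_logprob(s, base)
        alpha = new

    # Every path must end by leaving the last G-run state.
    log_lik = alpha[-1] + move[-1]
    log_null = len(seq) * math.log(0.25)
    return log_lik - log_null
```

A G-rich candidate such as `g4_log_odds("GGGTGGGTGGGTGGG")` scores higher than a mixed sequence of the same length such as `g4_log_odds("ACGTACGTACGTACG")`. The geometric run lengths implied by the self-transitions are a simplification; a model with explicit length distributions for G-runs would need additional states.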

Highlights

  • G-quadruplexes are four-stranded structures formed in guanine-rich nucleotide sequences

  • In the third test, a statistical analysis of discriminating highly likely G4 structures from putative G4 motifs in human pre-mRNA sequences [26], the hidden Markov model (HMM)-based model that can represent an elaborate length distribution of G-run regions outperforms the other three models presented in this work

  • Our results show that HMM-based models are statistically reliable enough to detect a more specific motif among general G4 structures in genomic sequences, narrowing down the potential G4 sequences predicted by the existing pattern matching method


Summary

Introduction

G-quadruplexes are four-stranded structures formed in guanine-rich nucleotide sequences. A necessary condition for G-quadruplex formation is the presence of four regions of tandem guanines, called G-runs, and three nucleotide subsequences, called loops, that connect the G-runs. Although the regular expression-based approach finds many putative G-quadruplex motifs in most genomes, the majority of these sequences are unlikely to form G-quadruplexes because they are less stable than canonical double helix structures.

A G-quadruplex (G4) structure is one of the topological conformations that DNA can adopt, in which G-quartets, hydrogen-bonded square planar substructures formed by four guanines (Gs), are stacked on top of each other (see Figure 1). A G4 sequence can be represented by four regions of consecutive Gs that form the G-quartets, called G-runs, and three regions of nucleotide subsequences that connect the G-runs, called loops, which can vary in length and may be absent altogether [3]. G4 structures are stabilized by monovalent cations, especially K+, located in the central cavities of the stack.
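
The four-G-run, three-loop representation above maps directly onto a regular expression. The following Python sketch shows one way such pattern matching could be written. The text here does not give the exact pattern, so the defaults (G-runs of at least 3 guanines, loops of 1 to 7 nucleotides) follow a commonly used convention and are assumptions, as are the function names `build_g4_regex` and `find_g4_candidates`.

```python
import re


def build_g4_regex(min_run=3, min_loop=1, max_loop=7):
    """Compile a pattern for four G-runs joined by three loops.

    The default thresholds are assumed, not taken from the paper.
    Set min_loop=0 to allow a missing loop.
    """
    g_run = f"G{{{min_run},}}"                  # e.g. G{3,}
    loop = f"[ACGT]{{{min_loop},{max_loop}}}"   # e.g. [ACGT]{1,7}
    # One G-run followed by three (loop, G-run) pairs.
    return re.compile(f"{g_run}(?:{loop}{g_run}){{3}}")


def find_g4_candidates(seq, **kwargs):
    """Return (start, end, subsequence) for non-overlapping putative G4 motifs."""
    pattern = build_g4_regex(**kwargs)
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    for hit in find_g4_candidates("ACGGGTGGGTGGGTGGGAC"):
        print(hit)
```

A match only says that the sequence has the right shape. As noted above, many such matches are unlikely to fold into a stable G4, which is why a quantitative score is useful as a second filter. This sketch also scans a single strand only, so a complementary C-rich pattern would be needed to cover the opposite strand.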

