Abstract

The chaos game representation (CGR) of DNA sequences was proposed by Jeffrey (1990), and that of linked protein sequences from genomes by Yu et al. (2004). In this paper, we consider the CGR of three kinds of sequences from complete genomes: whole-genome DNA sequences, linked coding DNA sequences and linked protein sequences. Some fractal patterns are found in these CGRs. A recurrent iterated function system (RIFS) model is proposed to simulate the CGRs of these sequences and their induced measures. Numerical results on 50 genomes show that the RIFS model simulates the CGRs and their induced measures very well. The parameters estimated in the RIFS model reflect information on species classification.

Highlights

  • The hereditary information of organisms is encoded in their DNA sequences, which are one-dimensional unbranched polymers made up of four different kinds of monomers: adenine (a), cytosine (c), guanine (g), and thymine (t).

  • If a DNA sequence were a random collection of bases, its chaos game representation (CGR) would be a uniformly filled square; any pattern visible in the CGR therefore represents some pattern in the DNA sequence (Goldman 1993).

  • Based on the detailed HP model, we proposed a CGR for the linked protein sequences from genomes (Yu et al. 2004b).
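As a rough illustration of the last highlight, the sketch below reduces a protein sequence to the four classes of the detailed HP model (non-polar, negative polar, uncharged polar, positive polar) and then plays the chaos game on that four-symbol sequence. The assignment of individual amino acids to classes is a commonly used grouping and is an assumption here, as are the corner assignment and the names `CLASS_MEMBERS`, `CLASS_CORNERS`, `protein_classes` and `protein_cgr`; none of these are taken from the text above.

```python
# Assumed grouping of the 20 standard amino acids (one-letter codes)
# into the four classes of the detailed HP model.
CLASS_MEMBERS = {
    "non-polar": "AILMFPWV",
    "negative polar": "DE",
    "uncharged polar": "NCQGSTY",
    "positive polar": "RHK",
}

# Hypothetical assignment of the four classes to the corners of the unit square.
CLASS_CORNERS = {
    "non-polar": (0.0, 0.0),
    "negative polar": (0.0, 1.0),
    "uncharged polar": (1.0, 1.0),
    "positive polar": (1.0, 0.0),
}

# Reverse lookup: amino acid letter -> class name.
AA_TO_CLASS = {
    aa: name for name, members in CLASS_MEMBERS.items() for aa in members
}


def protein_classes(protein):
    """Map a protein sequence to its sequence of class names.

    Symbols outside the 20 standard amino acids are skipped.
    """
    return [AA_TO_CLASS[aa] for aa in protein.upper() if aa in AA_TO_CLASS]


def protein_cgr(protein, start=(0.5, 0.5)):
    """Return one CGR point per classified residue."""
    x, y = start
    points = []
    for name in protein_classes(protein):
        cx, cy = CLASS_CORNERS[name]
        # Move halfway from the current point towards the class corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points
```

For example, `protein_classes("MKDS")` yields one class name per residue, and `protein_cgr` turns the same sequence into points in the unit square that can be drawn as a scatter plot.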


Summary

INTRODUCTION

The hereditary information of organisms (except for RNA viruses) is encoded in their DNA sequences, which are one-dimensional unbranched polymers made up of four different kinds of monomers (nucleotides): adenine (a), cytosine (c), guanine (g), and thymine (t). In a CGR, all points plotted within a quadrant must correspond to subsequences of the DNA sequence that end with the base labelling the corner of that quadrant. Goldman (1993) proposed a discrete-time Markov chain model to simulate the CGR of DNA sequences and used the sequence's dinucleotide and trinucleotide frequencies to calculate the probabilities in these models. A well-known model for protein sequence analysis is the HP model proposed by Dill et al. (1985). In this model, the 20 kinds of amino acids are divided into two types: hydrophobic (H), or non-polar, and polar (P), or hydrophilic. In the detailed HP model, the 20 kinds of amino acids are instead divided into four classes: non-polar, negative polar, uncharged polar and positive polar, so it considers more detail than the HP model. The probability matrix in our RIFS model is similar to the one in the Markov model used by Goldman (1993), but the way this matrix is estimated is different.
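The quadrant property and the Markov-chain idea can be made concrete with a short sketch. It builds the CGR of a DNA sequence by repeatedly moving halfway towards the corner of the current base, and estimates a first-order transition matrix from dinucleotide counts. The corner assignment and the names `CORNERS`, `cgr_points`, `quadrant` and `transition_matrix` are assumptions for illustration. The matrix estimate uses dinucleotide counts only, so it is a simplification rather than Goldman's full procedure, and it is not the estimation method of the RIFS model.

```python
from collections import Counter

# Assumed corner assignment for the four bases (not given in the text above).
CORNERS = {"a": (0.0, 0.0), "c": (0.0, 1.0), "g": (1.0, 1.0), "t": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return the CGR points of a DNA sequence, one per recognised base."""
    x, y = start
    points = []
    for base in seq.lower():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as 'n'
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def quadrant(point):
    """Return the base whose corner labels the quadrant containing the point."""
    x, y = point
    for base, (cx, cy) in CORNERS.items():
        if (x >= 0.5) == (cx == 1.0) and (y >= 0.5) == (cy == 1.0):
            return base


def transition_matrix(seq):
    """Estimate first-order transition probabilities from dinucleotide counts."""
    bases = [b for b in seq.lower() if b in CORNERS]
    counts = Counter(zip(bases, bases[1:]))
    matrix = {}
    for prev in CORNERS:
        total = sum(counts[(prev, nxt)] for nxt in CORNERS)
        # Rows for bases that never occur as a predecessor are left at zero.
        matrix[prev] = {
            nxt: (counts[(prev, nxt)] / total if total else 0.0)
            for nxt in CORNERS
        }
    return matrix
```

Applied to a short sequence such as `"acgt"`, `[quadrant(p) for p in cgr_points("acgt")]` recovers the bases in order, which is the quadrant property stated above: each point lies in the quadrant of the base that ends its subsequence.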

CHAOS GAME REPRESENTATION OF GENOMES
RECURRENT ITERATED FUNCTION SYSTEM FOR A MEASURE