Abstract

The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. As a consequence, the investigation of the distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem would open the way to using iterative maps as phase-state representations of sequences, in which their statistical properties can be conveniently investigated. In this study a family of kernel density functions is described that accommodates the fractal nature of iterative function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary length in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special solutions. The fractal kernel described is therefore a generalization that provides a common framework for a diverse suite of sequence analysis techniques.

Highlights

  • The use of iterative functions for scale independent representation of biological sequences was first proposed well over a decade ago [1]

  • We have subsequently examined that equivalence and have shown that, quite the contrary, it is the Markovian transition that is a special solution of the Chaos Game Representation (CGR) procedure [3]

  • The equivalence between iterative maps and genomic signatures has been noted, along with its simpler and faster implementation [4,5,6,7], and it has even led to a number of web-based and stand-alone applications, including a function, CHAOS, available in the popular bioinformatics library EMBOSS [8]


Introduction

The use of iterative functions for scale-independent representation of biological sequences was first proposed well over a decade ago [1]. That original proposition, designated as Chaos Game Representation (CGR), was soon found objectionable on the grounds of equivalence to standard Markov transition tables [2]. We have subsequently examined that equivalence and have shown that, quite the contrary, it is the Markovian transition that is a special solution of the CGR procedure [3]. The reader is referred to that report for a brief review of earlier work on iterative functions for representation of sequence succession. The equivalence between iterative maps and genomic signatures (more exactly, that the latter comes as a special solution of the former) has been noted, along with its simpler and faster implementation [4,5,6,7], and it has even led to a number of web-based and stand-alone applications, including a function, CHAOS, available in the popular bioinformatics library EMBOSS [8].
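To make the relation between the iterative map and Markov-style motif counts concrete, the following is a minimal Python sketch of the basic CGR iteration for DNA. It is an illustration under stated assumptions and not the kernel density proposed in this study: the corner assignment, the starting point (0.5, 0.5), and the function names `cgr_points`, `grid_cell` and `kmer_cell_counts` are all hypothetical choices made here. Each symbol moves the current point halfway towards that symbol's corner of the unit square, so the last k symbols fix which cell of a 2^k by 2^k grid the point falls in, and counting points per cell recovers k-mer counts.

```python
# Corner assignment is an assumption (one common convention); any fixed
# one-to-one mapping of the four symbols to unit-square corners would do.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return the CGR coordinate reached after each symbol of seq."""
    x, y = start
    points = []
    for symbol in seq:
        cx, cy = CORNERS[symbol]
        # Move halfway from the current point towards the symbol's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def grid_cell(point, k):
    """Index of the cell of a 2**k by 2**k grid that contains point."""
    n = 2 ** k
    x, y = point
    # Clamp in case floating-point rounding puts a coordinate at exactly 1.0.
    return (min(int(x * n), n - 1), min(int(y * n), n - 1))


def kmer_cell_counts(seq, k):
    """Count CGR points per grid cell; each occupied cell is one k-mer."""
    counts = {}
    # The first k-1 points have fewer than k symbols of history; skip them.
    for point in cgr_points(seq)[k - 1:]:
        cell = grid_cell(point, k)
        counts[cell] = counts.get(cell, 0) + 1
    return counts


if __name__ == "__main__":
    # Two sequences with different prefixes but the same final 3-mer "ACG"
    # land in the same cell of the 8 by 8 grid.
    print(grid_cell(cgr_points("TTTTACG")[-1], 3))
    print(grid_cell(cgr_points("GGCAACG")[-1], 3))
    print(kmer_cell_counts("ACGTACGT", 2))
```

The grid in this sketch is tied to a single, fixed motif length k, and points that share a long suffix can still sit on opposite sides of a cell boundary. That is the inexact association between distance and sequence dissimilarity that motivates a density estimate adapted to the fractal structure of the map.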
