Abstract

We analyzed DNA sequences using a new measure of entropy. The general aim was to find interesting sections of a genome using a new formulation of Shannon-like entropy. We developed this measure for any non-trivial graph or, more broadly, for any square matrix whose non-zero elements represent probabilistic weights assigned to connections or transitions between pairs of vertices. The new measure, called the graph entropy, quantifies the aggregate indeterminacy effected by the variety of unique walks that exist between each pair of vertices. The new tool is shown to be uniquely capable of revealing CRISPR regions in bacterial genomes and of identifying tandem repeats and direct repeats in genomes. We performed experiments on 26 species and found many tandem repeats and direct repeats (CRISPRs in bacteria and archaea). Several separate tools exist for finding either CRISPRs or tandem repeats, but our graph entropy can detect both features when they are present in a genome.
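The abstract describes the graph entropy only in words, and its exact definition is not given in this section. As a purely illustrative toy, and not the author's formula, "indeterminacy from the variety of walks between each pair of vertices" can be made concrete as follows: take a transition matrix P, enumerate every walk of exactly k steps from vertex i to vertex j, weight each walk by the product of its transition probabilities, normalise within the pair, and take the Shannon entropy of that distribution. The names walk_entropy and aggregate_walk_entropy, the fixed walk length k, and summation over pairs as the aggregation rule are all assumptions.

```python
import itertools
import math


def walk_entropy(P, k):
    """HYPOTHETICAL toy, not the paper's graph entropy.

    For each ordered vertex pair (i, j), enumerate all walks of exactly k >= 1
    steps from i to j, weight each walk by the product of its transition
    probabilities, normalise within the pair, and return the Shannon entropy
    in bits. Pairs with no walk of length k are omitted from the result.
    """
    n = len(P)
    result = {}
    for i in range(n):
        for j in range(n):
            weights = []
            # A k-step walk has k - 1 intermediate vertices.
            for mid in itertools.product(range(n), repeat=k - 1):
                path = (i, *mid, j)
                w = 1.0
                for a, b in zip(path, path[1:]):
                    w *= P[a][b]
                if w > 0:
                    weights.append(w)
            total = sum(weights)
            if total > 0:
                result[(i, j)] = -sum((w / total) * math.log2(w / total) for w in weights)
    return result


def aggregate_walk_entropy(P, k):
    """Assumed aggregation: sum of the per-pair walk entropies."""
    return sum(walk_entropy(P, k).values())


if __name__ == "__main__":
    uniform = [[0.5, 0.5], [0.5, 0.5]]
    cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
    print(aggregate_walk_entropy(uniform, 2))  # 4 pairs x 1 bit each
    print(aggregate_walk_entropy(cycle, 3))    # every walk is unique
```

On the deterministic 3-cycle every existing walk is unique, so the aggregate is 0; on the uniform two-state matrix with k = 2, each of the four pairs has two equally likely walks and contributes 1 bit. Brute-force enumeration costs n^(k-1) walks per pair, so this sketch only suits small toy graphs.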

Highlights

  • Deciphering the enormously long nucleotide sequences being uncovered in the human genome is one of the major challenges of our time

  • The new measure, called the graph entropy, quantifies the aggregate indeterminacy effected by the variety of unique walks that exist between each pair of vertices

  • The new tool is shown to be uniquely capable of revealing Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) regions in bacterial genomes and of identifying tandem repeats and direct repeats in genomes
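To illustrate how an entropy profile along a genome might flag repeat regions, here is a minimal sketch that slides a window along a sequence, counts nucleotide-to-nucleotide transitions in each window, and reports the first-order conditional entropy H(X_{t+1} | X_t) in bits. This is a standard Markov-chain measure swapped in for illustration, not the paper's graph entropy; the function names and the window, step, and threshold parameters are hypothetical.

```python
import math
from collections import Counter


def transition_entropy(seq):
    """Empirical conditional entropy H(X_{t+1} | X_t) in bits.

    A first-order Markov STAND-IN for illustration, not the paper's graph
    entropy. Uses the transition counts observed in seq.
    """
    pairs = Counter(zip(seq, seq[1:]))
    n = sum(pairs.values())
    if n == 0:
        return 0.0
    # Row totals: how often each nucleotide appears as a transition source.
    row_totals = Counter()
    for (a, _), c in pairs.items():
        row_totals[a] += c
    h = 0.0
    for (a, b), c in pairs.items():
        h -= (c / n) * math.log2(c / row_totals[a])
    return h


def scan(genome, window, step):
    """Return (start, entropy) for every full window (hypothetical helper)."""
    return [(s, transition_entropy(genome[s:s + window]))
            for s in range(0, len(genome) - window + 1, step)]


def low_entropy_windows(genome, window, step, threshold):
    """Start positions of windows whose entropy falls below threshold."""
    return [s for s, h in scan(genome, window, step) if h < threshold]


if __name__ == "__main__":
    # 17-nt sequence containing every dinucleotide once, then an AC tandem repeat.
    genome = "AACAGATCCGCTGGTTA" + "AC" * 10
    print(transition_entropy(genome[:17]))              # 2.0 bits
    print(low_entropy_windows(genome, 16, 1, 0.1))      # [17, 18, 19, 20, 21]
```

Only windows lying entirely inside the `ACAC...` repeat fall below the threshold. A first-order measure like this catches short-period, low-complexity repeats; a repeated unit that is itself diverse (for example, two copies of the 17-nt prefix above) keeps high first-order entropy, so longer repeats such as CRISPR arrays would need a measure that captures longer-range structure.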


Summary

Introduction

Deciphering the enormously long nucleotide sequences being uncovered in the human genome is one of the major challenges of our time. Along with serious ethical issues, we encounter a series of tremendously hard scientific problems. These problems arise mainly because, although sequencing is almost completely automated, the analysis of the sequenced data is not. In principle, biochemical methods are able to do this job, but since they are extremely expensive and time-consuming, there is high demand for alternative approaches to extract the information hidden in the genome [1]. In this situation, concepts and techniques from information theory have turned out to be welcome tools for extracting valuable information from biosequences such as DNA, RNA, or amino acid chains.

D. Sengupta 42

