Abstract

In this paper we use statistical methods to study the genetic information of DNA, treating each DNA sequence as a statistical system. The alphabet of a DNA sequence consists of the four nucleotides adenine, cytosine, guanine, and thymine, and the order of the nucleotides along the sequence encodes the genetic information. We analyzed three Cryptosporidium DNA sequences: one isolated and analyzed in our laboratory and two reference sequences from the public database GenBank. Each sequence is treated as a statistical system and is represented by a random variable and an associated probability distribution. The Shannon entropy, Rényi entropy, Onicescu informational energy, and square deviation from the uniform distribution are used to measure the degree of randomness of the three statistical systems. The similarity and difference between the three sequences, which belong to two Cryptosporidium species (Cryptosporidium hominis and Cryptosporidium parvum), were assessed by calculating the statistical distance between the probability distributions associated with each pair of sequences; the three sequences form three such pairs. The Bhattacharyya distance measures the degree of similarity between two probability distributions, whereas the Kullback-Leibler and resistor-average distances measure the difference between them.
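
The measures named above can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: the probability distribution is taken to be the single-nucleotide frequency over A, C, G, T; entropies use base-2 logarithms and distances use natural logarithms; the Rényi order defaults to 2; and the resistor-average distance is computed as the harmonic combination 1/R = 1/D(p||q) + 1/D(q||p) of the two Kullback-Leibler divergences. All function names and the toy sequences are hypothetical and are not the data analyzed in the paper.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: single-nucleotide alphabet, ambiguous bases ignored


def nucleotide_distribution(seq):
    """Relative frequency of A, C, G, T in seq (other symbols are skipped)."""
    counts = Counter(base for base in seq.upper() if base in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return [counts[base] / total for base in ALPHABET]


def shannon_entropy(p):
    """Shannon entropy in bits: -sum p_i log2 p_i."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def renyi_entropy(p, alpha=2.0):
    """Renyi entropy of order alpha (alpha != 1), in bits."""
    if alpha == 1:
        raise ValueError("alpha = 1 is the Shannon limit; use shannon_entropy")
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


def informational_energy(p):
    """Onicescu informational energy: sum p_i^2."""
    return sum(x * x for x in p)


def square_deviation_from_uniform(p):
    """Sum of squared differences between p and the uniform distribution."""
    u = 1 / len(p)
    return sum((x - u) ** 2 for x in p)


def bhattacharyya_distance(p, q):
    """-ln of the Bhattacharyya coefficient sum sqrt(p_i q_i)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.inf if bc == 0 else -math.log(bc)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; infinite if q_i = 0 < p_i."""
    if any(a > 0 and b == 0 for a, b in zip(p, q)):
        return math.inf
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def resistor_average_distance(p, q):
    """Harmonic combination of D(p||q) and D(q||p), like parallel resistors."""
    d_pq, d_qp = kl_divergence(p, q), kl_divergence(q, p)
    if d_pq == 0 or d_qp == 0:
        return 0.0
    inverse = 1 / d_pq + 1 / d_qp
    return math.inf if inverse == 0 else 1 / inverse


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    p = nucleotide_distribution("ATGCGATACGCTTGA")
    q = nucleotide_distribution("ATGAAATTCGATTAA")
    print("Shannon entropy:", shannon_entropy(p))
    print("Renyi entropy (alpha=2):", renyi_entropy(p))
    print("Informational energy:", informational_energy(p))
    print("Square deviation:", square_deviation_from_uniform(p))
    print("Bhattacharyya distance:", bhattacharyya_distance(p, q))
    print("Resistor-average distance:", resistor_average_distance(p, q))
```

Under these assumptions, a uniform nucleotide distribution gives the maximum Shannon entropy of 2 bits, the minimum informational energy of 1/4, and zero square deviation, while two identical distributions have zero Bhattacharyya, Kullback-Leibler, and resistor-average distance. Unlike the Kullback-Leibler divergence, the resistor-average distance is symmetric in its two arguments.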
