Abstract
Observed patterns in macromolecular sequences are often treated as words and compared with their probabilities of occurring in random sequences. The calculation of these probabilities, however, often lacks rigour. We have developed an algorithm for the exact computation of such probabilities for stochastic sequences generated by a Markov chain model. The method applies to the case in which a random sequence contains one of two given patterns, P and Q, or both simultaneously. Another application yields the probability function P(x) that a sequence contains pattern P exactly x times. Applied to patterns that include wild-card characters, the method yields probabilities for homonucleotide clusters of a given length. We prove that the probability of multiple runs of single nucleotides in the SV40 genome is consistent with the dinucleotide composition of the sequence, although it conflicts with the mononucleotide composition.
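To illustrate the kind of exact computation described for P(x), here is a minimal sketch, assuming a first-order Markov chain given by a hypothetical initial distribution `init[c]` and transition table `trans[a][b]`, and counting overlapping occurrences of a single pattern. It uses a standard technique (dynamic programming over the states of a KMP pattern-matching automaton) and is not necessarily the authors' algorithm. The functions `kmp_automaton` and `count_distribution` are illustrative names, and the sketch covers neither the two-pattern (P and Q) case nor wild-card characters.

```python
from typing import Dict, List

def kmp_automaton(pattern: str, alphabet: str) -> List[Dict[str, int]]:
    """delta[s][c] = length of the longest prefix of `pattern` that is a suffix
    of pattern[:s] + c. State len(pattern) means an occurrence just ended."""
    m = len(pattern)
    fail = [0] * (m + 1)  # fail[i] = prefix-function value for pattern[:i]
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i + 1] = k
    delta: List[Dict[str, int]] = []
    for s in range(m + 1):
        row = {}
        for c in alphabet:
            if s < m and pattern[s] == c:
                row[c] = s + 1                 # extend the current partial match
            elif s == 0:
                row[c] = 0                     # no partial match survives
            else:
                row[c] = delta[fail[s]][c]     # fall back; allows overlaps
        delta.append(row)
    return delta

def count_distribution(pattern: str, n: int, alphabet: str,
                       init: Dict[str, float],
                       trans: Dict[str, Dict[str, float]]) -> List[float]:
    """probs[x] = probability that a length-n Markov sequence (n >= 1)
    contains `pattern` exactly x times, overlapping occurrences counted."""
    delta = kmp_automaton(pattern, alphabet)
    m = len(pattern)
    # dp maps (automaton state, last letter, occurrence count) -> probability.
    # The last letter is kept because the next letter depends on it.
    dp: Dict[tuple, float] = {}
    for c in alphabet:
        s = delta[0][c]
        key = (s, c, int(s == m))
        dp[key] = dp.get(key, 0.0) + init[c]
    for _ in range(n - 1):
        new: Dict[tuple, float] = {}
        for (s, last, x), p in dp.items():
            for c in alphabet:
                t = delta[s][c]
                key = (t, c, x + int(t == m))
                new[key] = new.get(key, 0.0) + p * trans[last][c]
        dp = new
    max_x = max(x for (_, _, x) in dp)
    probs = [0.0] * (max_x + 1)
    for (_, _, x), p in dp.items():
        probs[x] += p
    return probs
```

As a sanity check, with uniform, independent letters (every row of `trans` equal to `init`), the pattern "AA" in sequences of length 3 gives probabilities 57/64, 6/64 and 1/64 for x = 0, 1, 2. Estimating `trans` from observed dinucleotide frequencies would correspond to the dinucleotide-based model mentioned above. This is offered only as an assumption about how such a model could be parameterised.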