Abstract

We derive a procedure for computing the exact probability that a specific pattern of letters occurs in a longer random sequence of letters. The procedure generalizes to give the exact probability of a single fixed pattern, or of a union or intersection of multiple fixed patterns, within a random sequence, for any distribution of the cells in the random sequence. It also handles patterns with uncertain letters (missing, blank, unclear, ambiguous, transposed, and so on). In addition, the procedure gives the probability that a randomly chosen pattern appears in a separate, longer random sequence of letters. These methods are particularly applicable to genetic sequence analysis, diagnostics, anthropology, clinical medicine, data mining, computational molecular biology, and pattern analysis and recognition.
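
To make the quantity in question concrete, the following is a minimal sketch of one standard way to compute the exact probability that a single fixed pattern occurs at least once in a random sequence. It is not the procedure derived in the paper, which the abstract does not spell out. It uses a textbook pattern-matching automaton with dynamic programming, and it assumes the letters are independent and identically distributed. The function name `occurrence_probability` and its parameters are hypothetical, chosen for illustration only.

```python
from fractions import Fraction


def occurrence_probability(pattern, n, probs):
    """Exact probability that `pattern` occurs at least once in a random
    sequence of length `n`.

    Assumption (not from the paper): letters are i.i.d., with
    `probs` mapping each letter to its probability (ideally Fractions).
    """
    m = len(pattern)
    if m == 0:
        return Fraction(1)
    if n < m:
        return Fraction(0)

    # Failure function of the pattern (as in Knuth-Morris-Pratt):
    # fail[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    def step(state, letter):
        """Next automaton state after reading `letter` in `state` (< m)."""
        while state and pattern[state] != letter:
            state = fail[state - 1]
        if pattern[state] == letter:
            state += 1
        return state

    # dist[s] = probability of being in state s with no match so far.
    dist = [Fraction(0)] * m
    dist[0] = Fraction(1)
    hit = Fraction(0)  # probability that a match has already occurred

    for _ in range(n):
        new_dist = [Fraction(0)] * m
        for state, p in enumerate(dist):
            if p == 0:
                continue
            for letter, q in probs.items():
                nxt = step(state, letter)
                if nxt == m:
                    hit += p * q  # first match completes here
                else:
                    new_dist[nxt] += p * q
        dist = new_dist

    return hit
```

As a check on the sketch, for a fair coin with letters `H` and `T`, `occurrence_probability("HH", 3, {"H": Fraction(1, 2), "T": Fraction(1, 2)})` evaluates to `Fraction(3, 8)`, matching the three sequences HHH, HHT, and THH out of eight. Unions or intersections of several patterns, and patterns with uncertain letters, would need a richer automaton than this single-pattern version.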
