Abstract

We describe a fast computer algorithm for identifying consensus patterns in DNA sequences. The method requires no prior assumptions about the consensus pattern other than its length. In particular, no prior knowledge of the frequency or spacing of consensus patterns is required. However, a priori information about the shape of the consensus pattern, the invariability of individual positions, or the overall conservation level can be used to enhance the selectivity and sensitivity of the search. Because the number of possible consensus words increases very rapidly with length, comprehensive searches have usually been restricted to a maximum of 10-12 nucleotides, even when large mainframes are used. Our algorithm enables searches for consensus patterns of this order on current mid-range and powerful microcomputers. Searches may be conducted on a single long sequence or on a set of shorter, possibly aligned sequences. We give examples of consensus patterns identified in both prokaryotic and eukaryotic DNA sequences, along with some typical program timings.
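
To make the search problem concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm described in this paper, which the abstract does not detail. It enumerates every word of a given length over A, C, G, T and scores each by the number of sequence windows that match it within a mismatch tolerance. The function names and the `max_mismatches` parameter are assumptions introduced for illustration only. Since an exact word of length L has 4^L candidates (16,777,216 for L = 12), this naive approach is practical only for short patterns.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def score_consensus_words(sequences, length, max_mismatches=1):
    """Naive baseline: score every possible word of the given length.

    A word's score is the number of windows, across all sequences,
    that differ from it in at most `max_mismatches` positions.
    (`max_mismatches` is an illustrative assumption, not a parameter
    taken from the paper.)
    """
    # Count each distinct window once, so repeats are not re-compared.
    windows = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - length + 1):
            window = seq[i:i + length]
            if set(window) <= set(ALPHABET):  # skip ambiguous bases such as N
                windows[window] += 1

    scores = {}
    # Enumerate all 4**length candidate words: the exponential step.
    for letters in product(ALPHABET, repeat=length):
        word = "".join(letters)
        total = sum(count for window, count in windows.items()
                    if hamming(word, window) <= max_mismatches)
        if total:
            scores[word] = total
    return scores


# Toy input: three short sequences, each a variant of the same hexamer.
toy_sequences = ["TTGACA", "TTGACT", "ATGACA"]
toy_scores = score_consensus_words(toy_sequences, length=6, max_mismatches=1)
best_word = max(toy_scores, key=toy_scores.get)
print(best_word, toy_scores[best_word])
```

On this toy input, TTGACA is the only word within one mismatch of all three sequences, so it receives the highest score. The inner loop compares each of the 4^L candidate words against every distinct window, which is the cost that restricts comprehensive searches to short lengths.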
