Abstract

A format 1 technology for performing massive hybridization experiments has been developed as part of the sequencing by hybridization (SBH) project. Arrays of tens of thousands of clones are interrogated with short oligomer probes to determine sets of oligomers that are present in individual clones. SBH requires highly discriminative hybridizations with a large number of probes. One of the main uses of a reconstructed DNA sequence is in a similarity search against databases of known DNA. We argue that sequence reconstruction, even partial, should not be performed for this particular purpose; we provide an information-theoretic proof that the oligomer lists obtained from hybridization experiments should be used directly for similarity searches. We propose a similarity search method that takes full advantage of the subword structure of positively identified oligomers within a clone. The method tolerates errors in hybridization experiments, requires fewer probes than are necessary for sequencing, and is computationally efficient. To enable direct sequence recognition, we apply a recently developed method of sequence comparison that is based on minimal length encoding and algorithmic mutual information. The method has been tested on both real and simulated data and has led to a correct identification of clones based on hybridizations with 109 short oligomer probes. The method is applicable to hybridization data from both format 1 and format 2 (sequencing chip) hybridization experiments. The sequence recognition method can provide targeting information for large-scale DNA sequencing by gel-based methods or by hybridization.
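The idea of searching a database directly with an oligomer list, without reconstructing the sequence first, can be illustrated with a minimal sketch. It is not the minimal length encoding method described in the abstract: it swaps in a much simpler score, the fraction of positively identified probes that occur as subwords of each candidate sequence. The function names, the probe length of 4, and the toy sequences are all assumptions made for illustration.

```python
def oligomers(seq, k):
    """Return the set of all length-k subwords (oligomers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score(positive_probes, candidate, k):
    """Fraction of positively identified probes that occur in candidate.

    Simplified stand-in score (an assumption), not the
    information-theoretic measure used in the paper.
    """
    if not positive_probes:
        return 0.0
    present = oligomers(candidate, k)
    return len(positive_probes & present) / len(positive_probes)


def best_match(positive_probes, database, k):
    """Return the name of the database sequence with the highest score."""
    return max(database, key=lambda name: score(positive_probes, database[name], k))


# Toy database of known sequences (hypothetical data).
database = {
    "seqA": "ACGTACGTTAGC",
    "seqB": "TTTTGGGGCCCC",
}

# Simulated hybridization result for a clone of seqA, with one
# false negative (a true oligomer missed) and one false positive.
probes = oligomers("ACGTACGTTAGC", 4)
probes.discard("TTAG")  # false negative
probes.add("GGGG")      # false positive

print(best_match(probes, database, 4))
```

Even with one missed probe and one spurious probe, the noisy list still overlaps far more with its true source than with the unrelated sequence, which is the sense in which a direct search can tolerate hybridization errors.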
