Abstract

Universal arrays contain all possible oligonucleotides of a certain length, typically 6-10 bases. They can determine in a single experiment all substrings of that length that occur along a target sequence. That information, also called the spectrum of the sequence, is not sufficient to uniquely reconstruct a sequence longer than a few hundred bases. We have devised a polynomial algorithm that reconstructs the sequence, given the spectrum and an additional reference sequence, homologous to the target sequence. Such a reference is available, for example, in the identification of single-nucleotide polymorphisms. The algorithm can handle errors in the spectrum as well as substitutions, insertions, and deletions in the target sequence. We present extensive simulation results, which show that the algorithm correctly reconstructs target sequences of >2,000 nucleotides from error-prone 8-mer spectra when realistic levels of single-nucleotide polymorphisms are present.
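To make the notion of a spectrum concrete, the following minimal sketch computes the set of all length-k substrings of a sequence and shows two different sequences that share one spectrum. The function name `spectrum` and the toy sequences are illustrative assumptions, not taken from the paper. The sketch models an idealized, error-free array readout and is not the reconstruction algorithm itself.

```python
from __future__ import annotations


def spectrum(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq.

    Hypothetical helper: models an idealized, error-free readout of a
    universal array, which reports presence of a k-mer but not how often
    or where it occurs.
    """
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Two distinct toy sequences in which the repeat "AC" occurs three times.
# Swapping the segments between the repeats leaves the 3-mer set unchanged.
s1 = "ACTACGAC"
s2 = "ACGACTAC"

same_spectrum = spectrum(s1, 3) == spectrum(s2, 3)
print(sorted(spectrum(s1, 3)))
print("distinct sequences:", s1 != s2, "| same spectrum:", same_spectrum)
```

Because the two toy sequences cannot be told apart from their 3-mer sets alone, extra information is needed to choose between them, which is the role the abstract assigns to the homologous reference sequence.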
