Abstract

A (0,1)-matrix satisfies the consecutive ones property (COP) for the rows if there exists a column permutation such that the ones in each row of the resulting matrix are consecutive. The consecutive ones test is useful for physical mapping and DNA sequence assembly, for example in the STS content mapping of YAC libraries and in Bactig assembly based on STS and EST markers. The linear-time algorithm of Booth and Lueker (1976) for this problem has a serious drawback: the data must be error-free. However, laboratory work is never flawless. We devised a new iterative clustering algorithm for this problem, which has the following advantages: 1. If the original matrix satisfies the COP, the algorithm produces a column ordering realizing it without any fill-in. 2. Under moderate assumptions, the algorithm can accommodate four types of errors: false negatives, false positives, nonunique probes (NPs), and chimeric clones. Note that in some cases (for example, low-quality EST marker identification), NPs occur because of repeat sequences. 3. When some local data is too noisy, the algorithm is likely to detect this and suggest additional lab work to reduce the ambiguity in that region. 4. Rather than forcing all probes to be included and ordered in the final arrangement, the algorithm may delete some noisy probes, a feature unique to it. It may therefore produce more than one contig, with the gaps created mostly by noisy probes.
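
To make the COP definition above concrete, the following is a minimal Python sketch that tests the property by brute force on a small matrix. It is neither the Booth and Lueker algorithm nor the iterative clustering algorithm described in this abstract; it simply tries every column permutation, so it is only practical for a handful of columns. The function names `row_is_consecutive` and `find_cop_ordering` and the example matrices are illustrative assumptions, with rows standing for clones and columns for probes.

```python
from itertools import permutations


def row_is_consecutive(row):
    """Return True if all ones in the row form a single contiguous block."""
    ones = [i for i, value in enumerate(row) if value == 1]
    # An empty row trivially qualifies; otherwise the span between the
    # first and last one must contain exactly len(ones) positions.
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def find_cop_ordering(matrix):
    """Brute-force search (illustrative only, factorial time).

    Returns a tuple of column indices under which every row has its ones
    consecutive, or None if no such column permutation exists.
    """
    n_cols = len(matrix[0]) if matrix else 0
    for order in permutations(range(n_cols)):
        if all(row_is_consecutive([row[c] for c in order]) for row in matrix):
            return order
    return None


# Hypothetical error-free data: three clones (rows) hit by four probes (columns).
clean = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
print(find_cop_ordering(clean))

# Three rows that pairwise share one column cannot all be made consecutive
# on three columns, so no ordering exists.
cyclic = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
print(find_cop_ordering(cyclic))
```

On the first matrix the search finds an ordering that places probe 2 between probes 0 and 1; on the second it returns `None`, which is the situation that a single false positive or false negative can create in otherwise consistent data.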
