Abstract

We have developed a method for remote homology detection using profile-based fragment matching. Our method slides a nine-residue window along the target sequence, compares each fragment with every nine-residue fragment in the database, and keeps up to 150 candidate fragments with the highest profile-profile scores. For each candidate sequence, we draw a dot plot of the positions of the fragments in the target sequence against the positions of the matched fragments in the candidate sequence, and we derive a homology score for that candidate from the pattern of the dot plot. To evaluate our method on protein family classification, we ran it on a well-benchmarked dataset of 4352 domain sequences derived from SCOP (Structural Classification of Proteins) version 1.53. The results show that our method is more accurate than any previously published method, including Support Vector Machine (SVM) and Hidden Markov Model (HMM) based methods. To evaluate the power of homology detection with a single query sequence, we also tested our method on 803 SCOP family pairs (1606 domain sequences, each of which has one other sequence in the same family, while all remaining sequences lie outside its superfamily) and 480 superfamily pairs (960 domain sequences, each of which has one other sequence in the same superfamily but not in the same family), derived from SCOP version 1.73. At the family level, our method detected 87.0% of true positives within the top 5 hits and 93.0% within the top 50 hits; under the same setting, PSI-BLAST (version 2.2.17) detected only 68.6% and 69.2%, respectively. At the superfamily level, our method detected 60.0% and 77.5% of true positives within the top 5 and top 50 hits, respectively, whereas PSI-BLAST detected only 23.8% and 24.0%.
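The fragment-matching and dot-plot steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the profile-profile scoring function or how the homology score is derived from the dot plot, so the column score (a dot product of 20-element profile columns), the reading of the 150-fragment cutoff as applying per target fragment, the toy one-hot profiles, and the homology score (the number of hits on the most populated diagonal, where j - i is constant) are all hypothetical choices, as are the helper names `one_hot`, `fragment_score`, `dot_plots`, `diagonal_score`, and `rank_candidates`.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

FRAG_LEN = 9   # fragment length given in the abstract
TOP_K = 150    # candidate fragments kept per target fragment (assumed reading)
AA = "ACDEFGHIKLMNPQRSTVWY"

Profile = List[List[float]]  # one 20-element residue-frequency vector per position


def one_hot(seq: str) -> Profile:
    """Toy stand-in for a real sequence profile (hypothetical helper)."""
    return [[1.0 if a == c else 0.0 for a in AA] for c in seq]


def column_score(a: Sequence[float], b: Sequence[float]) -> float:
    # Hypothetical column-column score: a plain dot product.
    return sum(x * y for x, y in zip(a, b))


def fragment_score(p: Profile, i: int, q: Profile, j: int, length: int = FRAG_LEN) -> float:
    # Ungapped profile-profile score of p[i:i+length] against q[j:j+length].
    return sum(column_score(p[i + k], q[j + k]) for k in range(length))


def dot_plots(target: Profile, db: Dict[str, Profile],
              length: int = FRAG_LEN, top_k: int = TOP_K) -> Dict[str, List[Tuple[int, int]]]:
    """Map each candidate sequence to its (target_pos, candidate_pos) hits."""
    plots: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for i in range(len(target) - length + 1):
        scored = (
            (fragment_score(target, i, prof, j, length), name, j)
            for name, prof in db.items()
            for j in range(len(prof) - length + 1)
        )
        # Keep only the top_k best-scoring database fragments for this window.
        for _, name, j in heapq.nlargest(top_k, scored):
            plots[name].append((i, j))
    return plots


def diagonal_score(points: List[Tuple[int, int]]) -> int:
    # Hypothetical homology score: hits on the most populated diagonal (j - i constant).
    counts: Dict[int, int] = defaultdict(int)
    for i, j in points:
        counts[j - i] += 1
    return max(counts.values(), default=0)


def rank_candidates(target: Profile, db: Dict[str, Profile],
                    length: int = FRAG_LEN, top_k: int = TOP_K) -> List[Tuple[int, str]]:
    # Rank candidate sequences by their dot-plot score, best first.
    plots = dot_plots(target, db, length, top_k)
    return sorted(((diagonal_score(pts), name) for name, pts in plots.items()), reverse=True)


if __name__ == "__main__":
    target = one_hot("ACDEFGHIKLMNPQRSTVWY")
    db = {"hom": one_hot("ACDEFGHIKLMNPQRSTVWY"), "other": one_hot("W" * 20)}
    print(rank_candidates(target, db, top_k=1))  # [(12, 'hom')]
```

Counting hits on a single diagonal rewards candidates whose matched fragments keep the same order and offset as in the target, which is one plausible reading of "the pattern of the dot plot"; a practical scorer would probably also tolerate insertions and deletions by pooling hits from neighbouring diagonals.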
