Abstract

Summary: An alternative to TblastX has been developed. Nucleic acids in the database and query sequences were translated into overlapping protein-like sequences (overlappingly translated sequences, or OTSs) before searching with blastP. Thus, each nucleic acid sequence is represented by a single ‘protein-like’ sequence instead of three ‘proteins’ in different reading frames. The 3×3 comparison of TblastX is replaced by a single comparison, giving faster results. Additional advantages are: (1) it can be more sensitive in detecting weak sequence similarities than either blastN or TblastX; (2) codon redundancy is eliminated; (3) the sensitivity to single nucleotide polymorphisms, mutations and sequencing errors is reduced; (4) it is insensitive to frame shifts.

Results: BlastP using OTSs detected about two thirds of the blastN and TblastX matches but discovered additional similarities. When blastN and TblastX searches against nucleic acids were compared with blastP searches against OTSs, identical matches discovered by blastP were generally longer (602 vs. 213 letters, p<0.01), had higher scores (748 vs. 460 bits, p<0.05) and lower E values (3.16E−20 vs. 1.17E+03, p<0.01), but the percentage identity was lower (25% vs. 61%, p<0.001). A qualitative evaluation with LALIGN showed improved visualization when OTSs were used instead of nucleic acids. Many extensive sequence similarities became more visible, for example the repeating similarity between the prion protein and the human insulin gene micro-satellite, and the surprising similarity between the first part of the prion protein coding region and human pro-insulin (34.4% identity and an additional 17.2% similarity over 238 residues; score >295, which is expected 4.6e−18 times by chance).
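
The overlapping translation described in the Summary can be illustrated with a minimal Python sketch. It assumes that an OTS is built by translating the codon that starts at every nucleotide position with the standard genetic code, so that a sequence of n nucleotides yields n - 2 letters. The function name and the handling of stop codons (kept as '*') and ambiguous bases (written as 'X') are illustrative assumptions, not details taken from the abstract.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def overlapping_translate(nucleotides: str) -> str:
    """Translate the codon starting at every position (hypothetical helper).

    Assumption: one output letter per nucleotide position, so the three
    reading frames are interleaved in a single protein-like sequence.
    """
    seq = nucleotides.upper().replace("U", "T")
    letters = []
    for i in range(len(seq) - 2):
        # Unknown or ambiguous codons become 'X'; stop codons stay '*'.
        letters.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(letters)


ots = overlapping_translate("ATGGCC")
print(ots)        # codons ATG, TGG, GGC, GCC
print(ots[0::3])  # letters belonging to reading frame 1 only
```

Under this assumption, every third letter of the OTS recovers one conventional reading frame, which is why a single blastP comparison can stand in for the 3×3 frame comparison of TblastX and why an inserted or deleted nucleotide only moves the match to a neighbouring interleaved frame.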
