Abstract

BackgroundOver the recent years, a number of genomes have been successfully sequenced and this was followed by genome annotation projects to help understand the biological capabilities of newly sequenced genomes. To improve the annotation of Plasmodium falciparum proteins, we earlier developed parasite specific matrices (PfSSM) and demonstrated their (Smat80 and PfFSmat60) better performance over standard matrices (BLOSUM and PAM). Here we extend that study to nine apicomplexan species other than P. falciparum and develop a web application ApicoAlign for improving the annotation of apicomplexan proteins.ResultsThe SMAT80 and PfFSmat60 matrices perform better for apicomplexan proteins compared to BLOSUM in detecting the orthologs and improving the alignment of these proteins with their potential orthologs respectively. Database searches against non-redundant (nr) database have shown that SMAT80 gives superior performance compared to BLOSUM series in terms of E-values, bit scores, percent identity, alignment length and mismatches for most of the apicomplexan proteins studied here. Using these matrices, we were able to find orthologs for rhomboid proteases of P. berghei, P. falciparum &P. vivax and large subunit of U2 snRNP auxiliary factor of Cryptosporidium parvum in Arabidopsis thaliana. We also show improved pairwise alignments of proteins from Apicomplexa viz. Cryptosporidium parvum and P. falciparum with their orthologs from other species using the PfFSmat60 matrix.ConclusionsThe SMAT80 and PfFSmat60 substitution matrices perform better for apicomplexan proteins compared to BLOSUM series. Since they can be helpful in improving the annotation of apicomplexan genomes and their functional characterization, we have developed a web server ApicoAlign for finding orthologs and aligning apicomplexan proteins.

Highlights

  • Over the recent years, a number of genomes have been successfully sequenced and this was followed by genome annotation projects to help understand the biological capabilities of newly sequenced genomes

  • The SMAT80 and PfFSmat60 substitution matrices perform better for apicomplexan proteins compared to BLOSUM series

  • Since they can be helpful in improving the annotation of apicomplexan genomes and their functional characterization, we have developed a web server ApicoAlign for finding orthologs and aligning apicomplexan proteins

Read more

Summary

Introduction

A number of genomes have been successfully sequenced and this was followed by genome annotation projects to help understand the biological capabilities of newly sequenced genomes. To improve the annotation of Plasmodium falciparum proteins, we earlier developed parasite specific matrices (PfSSM) and demonstrated their (Smat and PfFSmat60) better performance over standard matrices (BLOSUM and PAM). We extend that study to nine apicomplexan species other than P. falciparum and develop a web application ApicoAlign for improving the annotation of apicomplexan proteins. In case of Plasmodium falciparum, approximately ~60% of its genes did not show sequence similarity to known genes [1]. After benchmarking the performance of these matrices for apicomplexan proteins, we develop ApicoAlign a web server for finding orthologs and aligning apicomplexan proteins using a novel series of matrices. For the first four applications, the default values for gap open and extension penalties have been set to 10 and 1 respectively that are defined best for the standard matrices with entropy similar to Smat series. E-value cut-off may be defined by the user

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call