Abstract

Background

Homology is a key concept in both evolutionary biology and genomics. Detection of homology is crucial in fields like the functional annotation of protein sequences and the identification of taxon-specific genes. Basic homology searches are still frequently performed by pairwise search methods such as BLAST. Vast improvements have been made in the identification of homologous proteins by using more advanced methods that use sequence profiles. However, additional improvement could be made by exploiting sources of genomic information other than the primary sequence or tertiary structure.

Results

We test the hypothesis that two extrinsic gene properties, gene length and gene order, can help differentiate spurious sequence similarity from homology in the gray zone. Sharing gene order and similarity in size dramatically increase the chance of a query-hit pair being homologous: gray zone query-hit pairs of similar size and with conserved gene order are homologous in 99% of all cases, while for query-hit pairs without gene order conservation and with different sizes this is only 55%.

Conclusion

We have shown that using gene length and gene order drastically improves the detection of homologs within the BLAST gray zone. Our findings suggest that the use of such extrinsic gene properties can also improve the performance of homology detection by more advanced methods, and our study thereby underscores the importance of true data integration for fully exploiting genomic information.

Highlights

  • Homology is a key concept in both evolutionary biology and genomics

  • Orthology is a specific case of homology, in which genes in different species evolved from a common ancestral gene through speciation [1]

  • For e-values that are normally considered gray zone, we observe, as expected, that a substantial portion of hits are not homologous according to PFAM: 65% of the BLAST hits with an e-value above 1e-03 but below 10 are homologous according to PFAM clans, and only 43% of the hits are homologous at e-values between 1 and 10
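The stratification described above (restrict to gray-zone hits, then split them by size similarity and gene order conservation) can be illustrated with a minimal sketch. The gray-zone bounds follow the e-value range quoted in the highlights; everything else is an assumption made for illustration only: the `Hit` record, the `LENGTH_RATIO_CUTOFF` of 0.8, and the idea that gene order conservation and a reference homology label (for example, a shared PFAM clan) are already available as booleans. This is not the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass

# Gray-zone bounds as quoted in the text: e-value above 1e-03 but below 10.
GRAY_ZONE_MIN = 1e-3
GRAY_ZONE_MAX = 10.0

# Hypothetical cutoff (not from the source): the shorter protein must be at
# least this fraction of the longer one to count as "similar size".
LENGTH_RATIO_CUTOFF = 0.8


@dataclass
class Hit:
    """One query-hit pair; a hypothetical record layout for illustration."""
    query_len: int
    hit_len: int
    evalue: float
    conserved_order: bool  # assumed precomputed from neighbouring genes
    homologous: bool       # assumed reference label, e.g. shared PFAM clan


def in_gray_zone(hit):
    """True if the e-value lies strictly inside the gray-zone bounds."""
    return GRAY_ZONE_MIN < hit.evalue < GRAY_ZONE_MAX


def similar_size(hit, cutoff=LENGTH_RATIO_CUTOFF):
    """True if the shorter/longer length ratio reaches the cutoff."""
    shorter, longer = sorted((hit.query_len, hit.hit_len))
    return shorter / longer >= cutoff


def homolog_fraction_by_category(hits):
    """Fraction of homologous gray-zone hits per (similar_size, conserved_order)."""
    counts = defaultdict(lambda: [0, 0])  # category -> [homologous, total]
    for hit in hits:
        if not in_gray_zone(hit):
            continue  # confident or hopeless hits are not stratified
        key = (similar_size(hit), hit.conserved_order)
        counts[key][0] += hit.homologous
        counts[key][1] += 1
    return {key: hom / total for key, (hom, total) in counts.items()}


# Invented toy records, purely to exercise the functions (not study data).
toy_hits = [
    Hit(300, 310, 0.5, True, True),
    Hit(300, 290, 2.0, True, True),
    Hit(300, 120, 0.5, False, True),
    Hit(300, 900, 5.0, False, False),
    Hit(300, 305, 1e-20, True, True),  # outside the gray zone, ignored
]
fractions = homolog_fraction_by_category(toy_hits)
```

Here `fractions` maps each `(similar_size, conserved_order)` pair to the share of homologous hits in that category. On real data, comparing the `(True, True)` and `(False, False)` entries corresponds to the kind of contrast the abstract reports, although the length cutoff and the definition of conserved gene order used in the study may differ from the assumptions made here.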


Summary

Introduction

Detection of homology is crucial in fields like the functional annotation of protein sequences and the identification of taxon-specific genes. Vast improvements have been made in the identification of homologous proteins by using more advanced methods that use sequence profiles. Homologs are genes or genomic regions sharing a common origin, related through speciation, duplication or a combination of both. Orthology is a specific case of homology, in which genes in different species evolved from a common ancestral gene through speciation [1]. The identification of homologous proteins is an important step in predicting the function of proteins that have not been studied experimentally and is crucial in comparative genomics studies. Identification of taxon-specific genes and estimation of the rate of gene genesis both rely on the detection of orthology (and homology). Rapid sequence divergence can obscure the real evolutionary relationship between genes [2], a scenario that could

