Improved performance of sequence search approaches in remote homology detection

Adwait Govind Joshi, Ramanathan Sowdhamini, Upadhyayula Surya Raghavender

doi:10.12688/f1000research.2-93.v2


Open Access

https://doi.org/10.12688/f1000research.2-93.v2


Abstract

The protein sequence space is vast and diverse, spanning different families. Biologically meaningful relationships exist between proteins at the superfamily level. However, it is highly challenging to establish convincing relationships at the superfamily level by means of simple sequence searches. A rigorous sequence search strategy is therefore needed to establish remote homology relationships and achieve high coverage. We have used iterative profile-based methods, along with sequence-motif constraints that specify search directions. We address the importance of multiple start points (queries) for achieving high coverage at the protein superfamily level. We have devised strategies that employ a structural regime to search sequence space with good specificity and sensitivity. We employ two well-known sequence search methods, PSI-BLAST and PHI-BLAST, with multiple queries and multiple patterns to enhance homologue identification at the structural superfamily level. The study suggests that multiple queries improve sensitivity, while a pattern-constrained iterative sequence search is stringent at the initial stages, thereby driving the search in a specific direction while also achieving high coverage. This data mining approach has been applied to the entire structural superfamily database.
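PHI-BLAST restricts a search to sequences that contain a user-supplied pattern written in a PROSITE-style syntax. As a rough illustration of how such a constraint narrows the candidate set, the sketch below converts a subset of that syntax (x, literal residues, [allowed], {forbidden}, and (n) or (n,m) repeats) into a Python regular expression and keeps only the sequences that match. The helper names, the motif and the sequences are hypothetical; real PHI-BLAST additionally requires the pattern occurrence to lie within a significant local alignment, which this filter does not model.

```python
import re

# Hypothetical helper: supports only a subset of PROSITE pattern syntax
# (x, literal residues, [allowed], {forbidden}, and (n) / (n,m) repeat counts).
def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regex string."""
    elements = pattern.strip().rstrip(".").split("-")
    parts = []
    for element in elements:
        # Split an element such as "x(2,4)" into its core "x" and repeat bounds.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("[") and core.endswith("]"):
            regex = core                       # any of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            regex = re.escape(core)            # literal residue(s)
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def pattern_filter(sequences: dict, pattern: str) -> dict:
    """Return {name: [match start positions]} for sequences containing the pattern."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = {}
    for name, seq in sequences.items():
        starts = [m.start() for m in regex.finditer(seq.upper())]
        if starts:
            hits[name] = starts
    return hits


if __name__ == "__main__":
    motif = "C-x(2,4)-C-x(3)-[LIVMFYWC]"       # hypothetical motif
    seqs = {"seqA": "MKCAACGGGLRT", "seqB": "MKAAAGGGT"}
    print(prosite_to_regex(motif))             # C.{2,4}C.{3}[LIVMFYWC]
    print(pattern_filter(seqs, motif))         # {'seqA': [2]}
```

Only seqA survives the filter, which mirrors the stringency that a pattern constraint imposes at the start of a pattern-guided iterative search.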

Highlights

Protein sequence databases have grown enormously in recent times
The sequence search strategy devised for remote homology detection was tested and implemented
In the multiple queries (MQ) approach, all the members from each of the 12 selected superfamilies were used as inputs for PSI-BLAST to search against the non-redundant protein database (NR-Db)
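In the MQ approach, the hits from every query member of a superfamily are pooled. A minimal sketch of that bookkeeping follows, assuming the PSI-BLAST results were saved in BLAST+ tabular format (-outfmt 6, default 12 columns with the E-value in column 11) and that coverage is measured against a hypothetical reference set of known superfamily members. The E-value cutoff, identifiers and coverage definition are illustrative assumptions, not the settings used in the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def hits_from_tabular(lines: Iterable[str], evalue_cutoff: float = 1e-3) -> Dict[str, Set[str]]:
    """Collect subject IDs per query from BLAST tabular (-outfmt 6) lines.

    Assumes the default 12-column layout; the cutoff is a hypothetical default.
    """
    hits = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip non-tabular lines such as status messages
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= evalue_cutoff:
            hits[query].add(subject)
    return dict(hits)


def coverage(found: Set[str], reference: Set[str]) -> float:
    """Fraction of the reference members recovered by a set of hits."""
    return len(found & reference) / len(reference) if reference else 0.0


def multiple_query_coverage(
    hits_per_query: Dict[str, Set[str]], reference: Set[str]
) -> Tuple[Dict[str, float], float]:
    """Coverage of each single query, and of the pooled (union) hit set."""
    per_query = {q: coverage(h, reference) for q, h in hits_per_query.items()}
    pooled = set().union(*hits_per_query.values())
    return per_query, coverage(pooled, reference)


if __name__ == "__main__":
    reference = {"P1", "P2", "P3", "P4", "P5"}  # hypothetical known members
    hits = {"q1": {"P1", "P2", "X9"}, "q2": {"P2", "P3"}, "q3": {"P4"}}
    per_query, pooled = multiple_query_coverage(hits, reference)
    print(per_query)  # {'q1': 0.4, 'q2': 0.4, 'q3': 0.2}
    print(pooled)     # 0.8
```

Because the hits are pooled by set union, adding a query can only keep or raise coverage; it also admits any false positives that query brings in, which is where a per-query E-value cutoff or a pattern constraint becomes important.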

Summary

Introduction

Protein sequence databases have grown enormously in recent times. Understanding protein homology within such huge sets of sequences requires tracing divergence through mutation, substitution, insertion and deletion of residues[1,2]. Homologous proteins show similarity at the sequence and structural levels, implying functional similarity[3]. This similarity extends to the superfamily level, and the ways of deducing such relationships differ for protein sequence and structure information[4,5]. Several databases organize homologous proteins or protein superfamilies based on protein sequence and structure. These databases primarily employ the protein domain information present in a sequence or structure. SCOP is a database that organizes protein structural domains into hierarchical levels based on structural and functional information[6]. Structure-based classification helps to explore sequence space and aids functional assignment through the association of protein sequences[9].

Results

Discussion

Conclusion


Journal: F1000Research
Publication Date: Jul 16, 2014
Citations: 2
License type: CC BY 3.0


Similar Papers

Improved performance of sequence search approaches in remote homology detection
Ramanathan Sowdhamini ... Mallur Srivatsan Madhusudhan
F1000Research | VOL. 2
11 Jun 2014

Improved performance of sequence search algorithms in remote homology detection
Adwait Govind Joshi ... Upadhyayula Surya Raghavender
F1000Research | VOL. 2
22 Mar 2013

Application of nonnegative matrix factorization to improve profile-profile alignment features for fold recognition and remote homolog detection
Inkyung Jung ... Dongsup Kim
BMC Bioinformatics | VOL. 9
01 Jul 2008

Improved Detection of Remote Homologues Using Cascade PSI-BLAST: Influence of Neighbouring Protein Families on Sequence Coverage
Swati Kaushik ... Vasilis J Promponas
PLoS ONE | VOL. 8
20 Feb 2013
