Multiple graph regularized protein domain ranking

Jim Jing-Yan Wang,Xin Gao,Halima Bensmail

doi:10.1186/1471-2105-13-307

Open Access

Abstract

Background: Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods.

Results: To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods.

Conclusion: The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.

Highlights

Protein domain ranking is a fundamental task in structural biology
Multiple graph learning and ranking (MultiG-Rank): a multiple graph learning method that directly learns a self-adaptive graph for ranking regularization. The graph is assumed to be a linear combination of multiple predefined graphs.
We evaluated the ranking performance of MultiG-Rank against other protein domain ranking methods using different protein domain comparison strategies.
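
The idea of learning graph weights jointly with ranking scores can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the objective used here (a weighted sum of Laplacian smoothness terms, a fitting term with weight `alpha`, and a squared-norm penalty on the graph weights with weight `beta`, with the weights constrained to be non-negative and sum to one) is an assumed form, and the function names, the Gaussian-kernel graphs, and the toy data are all hypothetical.

```python
import numpy as np

def gaussian_laplacian(X, sigma):
    """Unnormalized Laplacian L = D - W of a Gaussian-kernel graph (assumed graph model)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return np.diag(W.sum(axis=1)) - W

def project_to_simplex(v):
    """Euclidean projection of v onto {mu : mu >= 0, sum(mu) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def multi_graph_rank(laplacians, y, alpha=1.0, beta=1.0, n_iter=20):
    """Alternately minimize the ASSUMED objective
         sum_m mu_m f^T L_m f + alpha * ||f - y||^2 + beta * ||mu||^2
       over ranking scores f and graph weights mu on the simplex."""
    n = len(y)

    def solve_scores(mu):
        # With mu fixed, the objective is quadratic in f:
        # (L + alpha * I) f = alpha * y, where L = sum_m mu_m L_m.
        L = sum(m * Lm for m, Lm in zip(mu, laplacians))
        return np.linalg.solve(L + alpha * np.eye(n), alpha * y)

    mu = np.full(len(laplacians), 1.0 / len(laplacians))
    for _ in range(n_iter):
        f = solve_scores(mu)
        # With f fixed, minimizing sum_m mu_m c_m + beta * ||mu||^2 on the simplex
        # is the projection of -c / (2 * beta) onto the simplex.
        costs = np.array([f @ Lm @ f for Lm in laplacians])
        mu = project_to_simplex(-costs / (2.0 * beta))
    return solve_scores(mu), mu

# Toy data (made up): two well-separated clusters; item 0 is the query.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
laplacians = [gaussian_laplacian(X, s) for s in (0.1, 1.0, 10.0)]
f, mu = multi_graph_rank(laplacians, y)
```

Here `f` holds one ranking score per database item and `mu` holds one weight per candidate graph. With unnormalized Laplacians the weight update tends to favor graphs with smaller total edge weight, so a normalized Laplacian may be a better choice in practice.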

Summary

Introduction

Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Retrieving and ranking protein domains that are similar to a query protein domain from a protein domain database are critical tasks for the analysis of protein structure, function, and evolution [3,4,5]. Zhang et al. used the 32-D tableau feature vector in a comparison procedure called IR tableau [3], while Lee and Lee introduced a measure called WDAC (Weighted Domain Architecture Comparison) that is used in the protein domain comparison context [9]. Both of these methods use cosine similarity for comparison.
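
As a rough illustration of pairwise ranking by cosine similarity, the sketch below scores each database vector against a query vector and sorts by score. It does not reproduce IR tableau or WDAC: the function name `cosine_rank` is hypothetical, and the 4-D vectors are made-up stand-ins for real domain features.

```python
import numpy as np

def cosine_rank(query, database):
    """Return database indices sorted from most to least similar to the query,
    together with the cosine similarity of every database row."""
    q = np.asarray(query, dtype=float)
    db = np.asarray(database, dtype=float)
    eps = 1e-12  # keeps the division defined if a vector is all zeros
    sims = db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q) + eps)
    order = np.argsort(-sims, kind="stable")
    return order, sims

# Toy feature vectors (made-up values, one row per database domain).
database = np.array([
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [2.0, 0.0, 4.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
])
query = np.array([1.0, 0.0, 2.0, 0.1])
order, sims = cosine_rank(query, database)
```

Rows 0 and 2 point in the same direction, so they receive essentially the same score and rank ahead of the others. Each score depends only on the query and one database item, which is the limitation that graph regularized ranking aims to address.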

Objectives

Methods

Results

Conclusion