DENSE: efficient and prior knowledge-driven discovery of phenotype-associated protein functional modules

Willam Hendrix, Alok Choudhary, Andrea M Rocha, Kathleen Scott, James R Mihelcic, Nagiza F Samatova, Kanchana Padmanabhan

doi:10.1186/1752-0509-5-172

Abstract

Background: Identifying cellular subsystems that are involved in the expression of a target phenotype has been a very active research area for the past several years. In this paper, a cellular subsystem refers to a group of genes (or proteins) that interact and carry out a common function in the cell. Most studies identify genes associated with a phenotype on the basis of some statistical bias; others have extended these statistical methods to analyze functional modules and biological pathways for phenotype-relatedness. However, a biologist often has a specific question in mind when performing such an analysis, and most of the subsystems obtained by existing methods may be largely irrelevant to the question at hand. Arguably, it would be valuable to incorporate the biologist's knowledge about the phenotype into the algorithm. The resulting subsystems would then be expected not only to relate to the target phenotype but also to contain information that the biologist is likely to be interested in.

Results: In this paper we introduce a fast and theoretically guaranteed method called DENSE (Dense and ENriched Subgraph Enumeration) that takes as input a biologist's prior knowledge as a set of query proteins and identifies all the dense functional modules in a biological network that contain some part of the query vertices. The density (in terms of the number of network edges) and the enrichment (the number of query proteins in the resulting functional module) can be controlled via two parameters, γ and μ, respectively.

Conclusion: This algorithm has been applied to the protein functional association network of Clostridium acetobutylicum ATCC 824, a hydrogen-producing, acid-tolerant organism. The algorithm was able to verify relationships known to exist in the literature and also to identify some previously unknown relationships, including those with regulatory and signaling functions. Additionally, we were able to hypothesize that some uncharacterized proteins are likely associated with the target phenotype. The DENSE code can be downloaded from http://www.freescience.org/cs/DENSE/

Highlights

Identifying cellular subsystems that are involved in the expression of a target phenotype has been a very active research area for the past several years
A biologist might wish to search an organismal protein functional association network for those modules associated with motility using some of the known flagella proteins as “prior knowledge”, or a biologist may use the enzymes in the TCA cycle pathway to identify subsystems related to aerobic respiration
We describe a theoretically sound and fast method called the Dense ENriched Subgraph Enumeration (DENSE) algorithm that capitalizes on the availability of any “prior knowledge” about the proteins involved in a particular process and identifies overlapping sets of functionally associated proteins from an organismal network that are enriched with the given knowledge

Summary

Introduction

Identifying cellular subsystems that are involved in the expression of a target phenotype has been a very active research area for the past several years. Application of genomic and systems-biology studies to environmental engineering (e.g., waste treatment) generally requires an understanding of microbial response and metabolic capabilities at the genome and metabolic levels. This includes understanding the relationships between phenotypes and the various cellular subsystems, identifying potential candidates for modification studies, and determining how modification of selected genes could impact the desired outcome (e.g., hydrogen production). For example, a biologist might wish to search an organismal protein functional association network for those modules associated with motility using some of the known flagella proteins as “prior knowledge”, or a biologist may use the enzymes in the TCA cycle pathway to identify subsystems related to aerobic respiration. When applied to a network of functionally associated proteins in the dark fermentative, hydrogen-producing, and acid-tolerant bacterium Clostridium acetobutylicum, the DENSE algorithm is able to predict known and novel relationships, including those that contain regulatory, signaling, and uncharacterized proteins.
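To make the roles of the density parameter γ and the enrichment parameter μ from the abstract more concrete, the following is a minimal Python sketch. It is not the DENSE algorithm and makes no claim about how DENSE enumerates modules efficiently. It assumes, purely for illustration, that density is the fraction of possible protein pairs in a module that are connected by an edge, and that enrichment is the fraction of module proteins that belong to the query set; the text above does not give the exact definitions. The function names, the toy network, and the protein labels are all hypothetical.

```python
from itertools import combinations


def edge_density(module, edges):
    """Fraction of possible protein pairs in `module` joined by an edge.

    Assumption: this is one plausible reading of "density in terms of
    the number of network edges"; the source gives no exact definition.
    """
    pairs = list(combinations(sorted(module), 2))
    if not pairs:
        return 0.0
    present = sum(1 for u, v in pairs if frozenset((u, v)) in edges)
    return present / len(pairs)


def enrichment(module, query):
    """Fraction of proteins in `module` that are query proteins (assumed definition)."""
    return len(set(module) & set(query)) / len(module)


def dense_enriched_modules(vertices, edges, query, gamma, mu, min_size=3):
    """Brute-force search over all vertex subsets of size >= min_size.

    Exponential in the number of vertices, so only usable on toy inputs.
    It illustrates the two thresholds, not an efficient enumeration.
    """
    found = []
    for size in range(min_size, len(vertices) + 1):
        for module in combinations(sorted(vertices), size):
            dense = edge_density(module, edges) >= gamma
            enriched = enrichment(module, query) >= mu
            if dense and enriched:
                found.append(set(module))
    return found


# Hypothetical toy network: five proteins, two of them given as prior knowledge.
toy_vertices = {"A", "B", "C", "D", "E"}
toy_edges = {
    frozenset(e)
    for e in [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
}
toy_query = {"A", "B"}

modules = dense_enriched_modules(toy_vertices, toy_edges, toy_query, gamma=1.0, mu=0.5)
print(modules)
```

In this toy setting, raising γ toward 1 restricts the results to more tightly connected groups, while raising μ demands a larger share of query proteins in each reported module. Lowering either threshold admits larger and looser modules at the cost of more candidates to inspect.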

Methods

Results

Conclusion