CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.

Lillian R Thistlethwaite, Aleksandar Milosavljevic, Sarah H Elsea, Varduhi Petrosyan, Marcus J Miller, Xiqi Li, Jason A. Papin

doi:10.1371/journal.pcbi.1008550

Abstract

We consider the following general family of algorithmic problems that arises in transcriptomics, metabolomics and other fields: given a weighted graph G and a subset of its nodes S, find subsets of S that show significant connectedness within G. A specific solution to this problem may be defined by devising a scoring function, the Maximum Clique problem being a classic example, where S includes all nodes in G and where the score is defined by the size of the largest subset of S fully connected within G. Major practical obstacles for the plethora of algorithms addressing this type of problem include computational efficiency and, particularly for more complex scores which take edge weights into account, the computational cost of permutation testing, a statistical procedure required to obtain a bound on the p-value for a connectedness score. To address these problems, we developed CTD, “Connect the Dots”, a fast algorithm based on data compression that detects highly connected subsets within S. CTD provides information-theoretic upper bounds on p-values when S contains a small fraction of nodes in G without requiring computationally costly permutation testing. We apply the CTD algorithm to interpret multi-metabolite perturbations due to inborn errors of metabolism and multi-transcript perturbations associated with breast cancer in the context of disease-specific Gaussian Markov Random Field networks learned directly from respective molecular profiling data.
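To make the scoring-function idea concrete, the Maximum Clique example mentioned above can be sketched as a brute-force search. This is only an illustration of that classic score, not the CTD algorithm; the function name `max_clique_score` and the adjacency-dictionary representation are assumptions made for this sketch, and the exhaustive search is exponential in the size of S.

```python
from itertools import combinations

def max_clique_score(adj, S):
    """Size of the largest subset of S that is fully connected in the graph.

    adj maps each node to the set of its neighbours (undirected graph).
    Hypothetical helper for illustration only; brute force, exponential time.
    """
    nodes = sorted(S)
    # Try the largest candidate subsets first and stop at the first clique.
    for k in range(len(nodes), 0, -1):
        for subset in combinations(nodes, k):
            if all(v in adj[u] for u, v in combinations(subset, 2)):
                return k
    return 0

# Toy graph: a, b, c form a triangle; d hangs off c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
score_all = max_clique_score(adj, set(adj))
score_pair = max_clique_score(adj, {"a", "d"})
```

Even on small graphs the number of candidate subsets grows quickly, which hints at why computational efficiency is listed as a practical obstacle for this family of problems.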

Highlights

Weighted graphs are often used to model variability in biological systems detected from molecular profiling
S may be the set of molecular variables that are perturbed in an individual case or in a set of disease cases relative to controls
A protocols.io protocol for the CTD algorithm to "connect the dots" in weighted graphs (DOI: www.dx.doi.org/10.17504/protocols.io.bpdvmi66) was developed as an accompanying resource for this work

Summary

Introduction

Weighted graphs are often used to model variability in biological systems detected from molecular profiling. Such graphs may serve as a context for interpreting perturbations observed in independent cases: given a weighted graph G and a subset of its nodes S, the task is to find subsets of S that show significant connectedness within G. The scoring functions employed by many current algorithms typically require permutation testing to establish statistically rigorous p-values. To address both computational efficiency and the cost of permutation testing, we developed CTD, a novel information-theoretic algorithm that figuratively “connects the dots” by detecting subsets of S that are significantly connected within G, and assigns an upper bound on their p-values without the computational cost associated with permutation testing.
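One way a compression-based p-value bound can work is sketched below. It assumes the standard algorithmic-significance argument: if a fixed, decodable encoding describes the observed subset in d fewer bits than the null encoding, the probability of that happening under the null is at most 2^(-d). The text here does not spell out CTD's actual encoding, so the function names and the uniform null model (all subsets of a given size equally likely) are assumptions made for illustration.

```python
import math

def null_bits(n_nodes, subset_size):
    """Bits needed to name one subset of this size under a uniform null model.

    Assumption for this sketch: every subset of the given size is equally likely.
    """
    return math.log2(math.comb(n_nodes, subset_size))

def p_value_upper_bound(null_len_bits, compressed_len_bits):
    """Upper bound 2**(-d), where d is the number of bits saved by compression.

    No saving (d <= 0) gives the trivial bound of 1.0.
    """
    d = null_len_bits - compressed_len_bits
    return min(1.0, 2.0 ** (-d))

# Hypothetical numbers: a 10-bit null description compressed to 7 bits.
bound = p_value_upper_bound(10.0, 7.0)
```

The bound is computed from two code lengths alone, so no resampling of the graph is needed, which is the contrast with permutation testing drawn above.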

Journal: PLoS computational biology
Publication Date: Jan 29, 2021
Citations: 9
License type: CC BY 4.0
