Abstract

Dynamic correlations are pervasive in high-throughput data. Large numbers of gene pairs can change their correlation patterns in response to observed or unobserved changes in physiological states. Finding changes in correlation patterns can reveal important regulatory mechanisms. Currently there is no method that can effectively detect global dynamic correlation patterns in a dataset. Given the challenging nature of the problem, currently available methods use genes as surrogate measurements of physiological states, which cannot faithfully represent the true underlying biological signals. In this study we develop a new method, Dynamic Correlation Analysis (DCA), that directly identifies strong latent dynamic correlation signals from the data matrix. At the center of the method is a new metric for identifying pairs of variables that are highly likely to be dynamically correlated, without knowing the underlying physiological states that govern the dynamic correlation. We validate the performance of the method with extensive simulations. We apply the method to three real datasets: a single-cell RNA-seq dataset, a bulk RNA-seq dataset, and a microarray gene expression dataset. In all three datasets, the method reveals novel latent factors with clear biological meaning, bringing new insights into the data.

Highlights

  • The biological system involves tens of thousands of genes/proteins that are tightly regulated in a complex network [1,2,3]

  • DCA efficiently finds patterns of dynamic correlation in RNA-seq data and detects biological functions associated with those patterns

  • The Liquid Association Coefficient (LAC) is designed to identify gene pairs that are most likely to be dynamically correlated, without knowing the underlying physiological states that govern the dynamic correlation


Summary

Introduction

Biological systems involve tens of thousands of genes and proteins that are tightly regulated in a complex network [1,2,3]. Interactions and regulations in the network are highly dynamic: they change substantially across cell types and developmental stages, and in response to environmental conditions [4]. Gene expression data and similar data types, such as proteomics and metabolomics data, represent outcomes of this dynamic regulatory network. Changes in the underlying regulation patterns often result in changes in the correlation between genes. In many gene expression profiling datasets, the cellular states or sub-classes are not observed directly. Once successfully extracted from the data, dynamic correlation patterns can in turn help deduce these hidden cellular states and sub-classes. Because gene expression is tightly controlled in the cell, the same variable Z, which stands for the underlying state, can govern the dynamic correlation of many gene pairs.
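To make the notion of a Z variable governing a dynamic correlation concrete, the following is a minimal simulated sketch, not the DCA method or the LAC metric themselves. It assumes a toy model in which the correlation between X and Y equals tanh(Z), and it summarizes the effect with the standardized three-way product moment E[XYZ], the classical liquid association score. The function names, the tanh link, and the sample size are illustrative assumptions.

```python
import numpy as np


def simulate_dynamic_pair(n=5000, seed=0):
    """Simulate (x, y, z) where corr(x, y | z) = tanh(z) (toy assumption)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)       # hidden state variable
    x = rng.standard_normal(n)
    noise = rng.standard_normal(n)
    rho = np.tanh(z)                 # correlation changes sign with z
    # y keeps unit variance for every value of z
    y = rho * x + np.sqrt(1.0 - rho**2) * noise
    return x, y, z


def standardize(v):
    """Center to mean 0 and scale to standard deviation 1."""
    return (v - v.mean()) / v.std()


def three_way_moment(x, y, z):
    """Sample estimate of E[XYZ] on standardized variables."""
    return float(np.mean(standardize(x) * standardize(y) * standardize(z)))


x, y, z = simulate_dynamic_pair()
overall = np.corrcoef(x, y)[0, 1]              # pooled over all samples
high = np.corrcoef(x[z > 0], y[z > 0])[0, 1]   # samples with z > 0
low = np.corrcoef(x[z <= 0], y[z <= 0])[0, 1]  # samples with z <= 0
score = three_way_moment(x, y, z)

print(f"overall corr: {overall:.3f}")
print(f"corr when z > 0: {high:.3f}, corr when z <= 0: {low:.3f}")
print(f"three-way moment: {score:.3f}")
```

In this toy setting the pooled correlation is close to zero even though the pair is strongly correlated within each state, which is why an ordinary correlation screen would miss it. Note that the sketch uses the known Z; the difficulty addressed in this study is that Z is usually unobserved.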
