Abstract
In omics studies, different sources of information about the same set of genes are often available. When the group structure (e.g., gene pathways) within the genes is of interest, we combine the normal hierarchical model with the stochastic block model, through an integrative clustering framework, to model gene expression and gene networks jointly. In extensive simulation studies, the integrative framework provides higher accuracy than separate clustering when one or both of the data sources contain noise or when different data sources provide complementary information. An empirical guideline for choosing between integrative and separate clustering models is proposed. The integrative clustering method is illustrated on mouse embryo single-cell RNA-seq and bulk-cell microarray data, where it identified not only the gene sets shared by both data sources but also the gene sets unique to one data source.
Highlights
Network analysis is the study of networks representing relationships between objects
The integrative clustering method is illustrated on mouse embryo single-cell RNA-seq and bulk-cell microarray data, where it identified the gene sets shared by both data sources and the gene sets unique to one data source
We examine the performance of integrative clustering versus separate clustering in the presence of contamination as well as orthogonal community structures in different data sources
Summary
Network analysis is the study of networks representing relationships (i.e., links or edges) between objects (i.e., vertices or nodes). We dichotomize connectivity measures in single-cell RNA-seq data and apply the stochastic block model (SBM) to the resulting gene network data. The SBM is a popular community detection approach for binary network data, which assumes that the link probability of each pair of nodes depends only on the communities to which the two nodes belong. The proposed method falls into the category of probabilistic clustering models, since it combines the log-likelihoods of the normal hierarchical model (NHM) and the SBM on two sets of data from independent data sources. The NHM describes the clustering structure on the mean values of gene expression levels, while the SBM extracts groups using mutual distances between genes.
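The idea of combining the two log-likelihoods can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a plain Gaussian model with cluster-specific means and a shared variance as a simplified stand-in for the NHM, plug-in estimates for the SBM block probabilities, and an unweighted sum of the two log-likelihoods (which is what independence of the two data sources would give). The function names, the threshold of 0.8, and the simulated data are all hypothetical.

```python
import numpy as np

def dichotomize(conn, threshold):
    """Turn a symmetric connectivity matrix into a binary adjacency matrix."""
    adj = (np.abs(conn) > threshold).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops
    return adj

def sbm_loglik(adj, z, k):
    """Bernoulli SBM log-likelihood with plug-in block link probabilities."""
    iu = np.triu_indices(adj.shape[0], k=1)  # visit each unordered pair once
    edges = adj[iu]
    lo = np.minimum(z[iu[0]], z[iu[1]])
    hi = np.maximum(z[iu[0]], z[iu[1]])
    ll = 0.0
    for r in range(k):
        for s in range(r, k):
            mask = (lo == r) & (hi == s)
            n_pairs = mask.sum()
            if n_pairs == 0:
                continue
            n_edges = edges[mask].sum()
            # Estimated link probability for block pair (r, s), kept off 0 and 1.
            p = np.clip(n_edges / n_pairs, 1e-10, 1 - 1e-10)
            ll += n_edges * np.log(p) + (n_pairs - n_edges) * np.log(1 - p)
    return ll

def gaussian_loglik(x, z, k):
    """Gaussian log-likelihood with cluster means and one shared variance.

    Assumption: a simplified stand-in for the NHM, not the NHM itself.
    """
    resid = np.zeros_like(x, dtype=float)
    for r in range(k):
        members = z == r
        if members.any():
            resid[members] = x[members] - x[members].mean(axis=0)
    sigma2 = max(np.mean(resid ** 2), 1e-10)
    return -0.5 * resid.size * (np.log(2 * np.pi * sigma2) + 1)

def joint_loglik(x, adj, z, k):
    """Sum of the expression and network log-likelihoods for labels z."""
    return gaussian_loglik(x, z, k) + sbm_loglik(adj, z, k)

# Hypothetical toy data: 40 genes in two groups of 20.
rng = np.random.default_rng(0)
n = 40
z_true = np.repeat([0, 1], n // 2)
x = rng.normal(size=(n, 10)) + 3.0 * z_true[:, None]  # expression source
u = rng.uniform(size=(n, n))
conn = (u + u.T) / 2 + 0.5 * (z_true[:, None] == z_true[None, :])
adj = dichotomize(conn, threshold=0.8)                 # network source

z_alt = np.arange(n) % 2  # a labeling that ignores the true groups
print(joint_loglik(x, adj, z_true, 2), joint_loglik(x, adj, z_alt, 2))
```

In this toy setting the labeling that matches the simulated groups scores a higher joint log-likelihood than the alternating labeling. A clustering procedure would search over labelings to maximize such a criterion, and the sketch leaves that search out.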