Visualizing the structure of RNA-seq expression data using grade of membership models.

Kushal K Dey, Chiaowen Joyce Hsiao, Matthew Stephens

doi:10.1371/journal.pgen.1006599

Open Access

https://doi.org/10.1371/journal.pgen.1006599

Journal: PLOS Genetics	Publication Date: Mar 23, 2017
Citations: 161	License type: CC BY 4.0

Affiliation: University of Chicago

Abstract

Grade of membership models, also known as “admixture models”, “topic models” or “Latent Dirichlet Allocation”, are a generalization of cluster models that allow each sample to have membership in multiple clusters. These models are widely used in population genetics to model admixed individuals who have ancestry from multiple “populations”, and in natural language processing to model documents having words from multiple “topics”. Here we illustrate the potential for these models to cluster samples of RNA-seq gene expression data, measured on either bulk samples or single cells. We also provide methods to help interpret the clusters, by identifying genes that are distinctively expressed in each cluster. By applying these methods to several example RNA-seq applications we demonstrate their utility in identifying and summarizing structure and heterogeneity. Applied to data from the GTEx project on 53 human tissues, the approach highlights similarities among biologically-related tissues and identifies distinctively-expressed genes that recapitulate known biology. Applied to single-cell expression data from mouse preimplantation embryos, the approach highlights both discrete and continuous variation through early embryonic development stages, and highlights genes involved in a variety of relevant processes—from germ cell development, through compaction and morula formation, to the formation of inner cell mass and trophoblast at the blastocyst stage. The methods are implemented in the Bioconductor package CountClust.

Highlights

Ever since large-scale gene expression measurements have been possible, clustering of both genes and samples has played a major role in their analysis [1,2,3].
Our goal here is to illustrate that grade of membership (GoM) models, an approach widely used in population genetics to cluster admixed individuals who have ancestry from multiple populations, provide an attractive approach for clustering biological samples of RNA sequencing data.
We begin by illustrating the GoM model on bulk RNA expression measurements from the Genotype-Tissue Expression (GTEx) project (V6 dbGaP accession phs000424.v6.p1, release date: Oct 19, 2015, http://www.gtexportal.org/home/).

Summary

Introduction

Ever since large-scale gene expression measurements have been possible, clustering of both genes and samples has played a major role in their analysis [1,2,3]. Here we analyse expression data using grade of membership (GoM) models [6], which generalize clustering models to allow each sample to have partial membership in multiple clusters. That is, they allow that each sample has a proportion, or “grade”, of membership in each cluster. Such models are widely used in population genetics to model admixture, where individuals can have ancestry from multiple populations [7], and in document clustering [8, 9], where each document can have membership in multiple topics. GoM models have recently been applied to detect mutation signatures in cancer samples [10].
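To make the idea of partial membership concrete, here is a minimal Python sketch. It assumes the common multinomial formulation of a GoM model, in which a sample's expected gene proportions are its grades of membership applied as weights to per-cluster expression profiles. The function names, the two toy cluster profiles, and the read depth are hypothetical illustrations; this is not the CountClust API and does not fit a model to data.

```python
import random


def expected_gene_proportions(grades, profiles):
    """Mix cluster profiles by one sample's grades of membership.

    grades:   K non-negative numbers summing to 1 (one per cluster)
    profiles: K lists, each holding G gene proportions summing to 1
    """
    n_genes = len(profiles[0])
    return [
        sum(q * profile[g] for q, profile in zip(grades, profiles))
        for g in range(n_genes)
    ]


def simulate_counts(grades, profiles, depth, seed=0):
    """Draw `depth` reads for one sample from the mixed gene proportions."""
    rng = random.Random(seed)
    p = expected_gene_proportions(grades, profiles)
    counts = [0] * len(p)
    for g in rng.choices(range(len(p)), weights=p, k=depth):
        counts[g] += 1
    return counts


# Hypothetical toy setting: 2 clusters, 3 genes.
profiles = [
    [0.7, 0.2, 0.1],  # cluster 1 mostly expresses gene 0
    [0.1, 0.2, 0.7],  # cluster 2 mostly expresses gene 2
]

hard = [1.0, 0.0]   # ordinary clustering: full membership in one cluster
mixed = [0.5, 0.5]  # GoM: equal partial membership in both clusters

p_hard = expected_gene_proportions(hard, profiles)
p_mixed = expected_gene_proportions(mixed, profiles)
counts_mixed = simulate_counts(mixed, profiles, depth=1000)
```

With grades of `[1.0, 0.0]` the sample simply inherits the first cluster's profile, which is the special case of a standard cluster model. Intermediate grades give a blend of the two profiles, which is how a GoM model can represent continuous variation between clusters.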
