Two-way learning with one-way supervision for gene expression data

Monica H T Wong, David M Mutch, Paul D McNicholas

doi:10.1186/s12859-017-1564-5

https://doi.org/10.1186/s12859-017-1564-5

Journal: BMC Bioinformatics
Publication Date: Mar 4, 2017
Citations: 3
License type: open-access

Affiliation: McMaster University, University of Guelph

Abstract

Background

A family of parsimonious Gaussian mixture models for the biclustering of gene expression data is introduced. Biclustering is accommodated by adopting a mixture of factor analyzers model with a binary, row-stochastic factor loadings matrix. This particular form of factor loadings matrix results in a block-diagonal covariance matrix, which is a useful property in gene expression analyses, specifically in biomarker discovery scenarios where blood can potentially act as a surrogate tissue for other less accessible tissues. Prior knowledge of the factor loadings matrix is useful in this application and is reflected in the one-way supervised nature of the algorithm. Additionally, the factor loadings matrix can be assumed to be constant across all components because of the relationship desired between the various types of tissue samples. Parameter estimates are obtained through a variant of the expectation-maximization algorithm and the best-fitting model is selected using the Bayesian information criterion. The family of models is demonstrated using simulated data and two real microarray data sets. The first real data set is from a rat study that investigated the influence of diabetes on gene expression in different tissues. The second real data set is from a human transcriptomics study that focused on blood and immune tissues. The microarray data sets illustrate the biclustering family’s performance in biomarker discovery involving peripheral blood as surrogate biopsy material.

Results

The simulation studies indicate that the algorithm identifies the correct biclusters and performs best when the number of observation clusters is known. Moreover, the biclustering algorithm identified biclusters comprising biologically meaningful data related to insulin resistance and immune function in the rat and human real data sets, respectively.

Conclusions

Initial results using real data show that this biclustering technique provides a novel approach for biomarker discovery by enabling blood to be used as a surrogate for hard-to-obtain tissues.

Highlights

A family of parsimonious Gaussian mixture models for the biclustering of gene expression data is introduced
Constraints can be imposed or not on g, g, and g = ψgIp to create a family of eight one-way supervised Gaussian mixture models for biclustering (Table 1), hereafter referred to as one-way supervised Gaussian biclustering (OSGaBi)
Model selection was done via the Bayesian information criterion (BIC), as previously described; the integrated completed likelihood (ICL) [33] and Akaike information criterion (AIC) [34] were used for comparison and produced the same outcomes
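
As a rough illustration of criterion-based model selection, the sketch below scores a few candidate fits with BIC and AIC and picks the highest-scoring one. It assumes the "larger is better" sign convention, 2 × log-likelihood minus a penalty, which is common in the mixture-model literature; the model names, log-likelihoods, and parameter counts are made-up placeholders and are not results from the paper.

```python
import math

def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' convention: 2*loglik - k*log(n)."""
    return 2.0 * loglik - n_params * math.log(n_obs)

def aic(loglik, n_params):
    """AIC in the same sign convention: 2*loglik - 2*k."""
    return 2.0 * loglik - 2.0 * n_params

# Hypothetical candidates: name -> (maximized log-likelihood, free parameters).
# These numbers are invented for illustration only.
candidates = {
    "model_a": (-1200.0, 12),
    "model_b": (-1150.0, 20),
    "model_c": (-1148.0, 35),
}
n_obs = 500  # hypothetical number of observations (genes)

bic_scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}

best_by_bic = max(bic_scores, key=bic_scores.get)
best_by_aic = max(aic_scores, key=aic_scores.get)
print(best_by_bic, best_by_aic)
```

Under the opposite sign convention (−2 × log-likelihood plus a penalty), the same ranking is obtained by taking the minimum instead of the maximum.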

Summary

Introduction

A family of parsimonious Gaussian mixture models for the biclustering of gene expression data is introduced. Biclustering is accommodated by adopting a mixture of factor analyzers model with a binary, row-stochastic factor loadings matrix. This particular form of factor loadings matrix results in a block-diagonal covariance matrix, which is a useful property in gene expression analyses, specifically in biomarker discovery scenarios where blood can potentially act as a surrogate tissue for other less accessible tissues. The microarray data sets illustrate the biclustering family’s performance in biomarker discovery involving peripheral blood as surrogate biopsy material. Returning to the idea of peripheral blood as surrogate material, a gene that exhibits correlated expression profiles in blood and a given tissue (but not other tissues) may be a biomarker of interest. In this scenario, the genes act as the observations and the blood and tissues (the samples) act as the variables. A data point in the microarray data set is an intensity value.
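
To make the block-diagonal property concrete, here is a minimal NumPy sketch. It assumes the standard factor-analysis covariance form ΛΛᵀ + Ψ with diagonal Ψ; the paper's exact parameterization may differ. The grouping of five sample variables into two groups is hypothetical and stands in for the prior knowledge that supplies the loadings matrix.

```python
import numpy as np

def loadings_from_groups(groups, n_groups):
    """Build a binary, row-stochastic p x q matrix.

    Row j has a single 1 in column groups[j], so every row sums to 1.
    """
    lam = np.zeros((len(groups), n_groups))
    lam[np.arange(len(groups)), groups] = 1.0
    return lam

def factor_covariance(lam, psi_diag):
    """Factor-analysis covariance Lambda @ Lambda.T + diag(psi) (assumed form)."""
    return lam @ lam.T + np.diag(psi_diag)

# Hypothetical layout: variables 0-1 form one group (e.g. blood and one tissue),
# variables 2-4 form another group.
groups = np.array([0, 0, 1, 1, 1])
lam = loadings_from_groups(groups, n_groups=2)
sigma = factor_covariance(lam, psi_diag=np.full(5, 0.5))

# Variables in different groups have zero covariance, so sigma is block-diagonal
# when the variables are ordered by group.
off_block_is_zero = bool(np.all(sigma[:2, 2:] == 0.0))
print(sigma)
print(off_block_is_zero)
```

Because each variable loads on exactly one factor, entry (j, k) of ΛΛᵀ is 1 when variables j and k share a group and 0 otherwise, which is what produces the blocks.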
