Latent Dirichlet Allocation for Classification using Gene Expression Data

Hima Bindu Yalamanchili, Michael L Raymer, Soon Jye Kho

doi:10.1109/bibe.2017.00-81

Abstract

Understanding the role of differential gene expression in the development of, and molecular response to, cancer is a complex problem that remains challenging, in part due to the sheer number of genes, gene products, and metabolites involved. In this paper, we employ an unsupervised topic model, Latent Dirichlet Allocation (LDA), to explore patterns of gene expression in healthy and cancer tissues. An important advantage of LDA compared to alternative statistical and machine learning methods is its proven ability to handle sparse inputs over an extremely large number of features in an unsupervised manner. LDA has recently been applied to clustering and exploring genomic data, but not to classification and prediction. In this paper, we try to optimize the protocol and parameters for efficient implementation of LDA. Here, messenger RNA (mRNA) sequence data from breast cancer and healthy tissue is used to determine an effective approach for applying LDA to the classification of cancer versus healthy tissue. We describe our study in two phases. First, various parameters such as the number of topics, bins, and passes were optimized for LDA. Next, we developed a novel LDA-based classification approach to classify unknown samples based on similarity of co-expression patterns. Our evaluation of this approach shows that LDA can achieve high accuracy compared to alternative approaches. Overall, our results suggest that LDA is a promising approach for classification of tissue types based on gene expression data in cancer studies.
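The abstract does not spell out how expression values become LDA input or how similarity is measured, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each sample is turned into a "document" of gene-plus-bin tokens, and that an unknown sample is assigned the label whose average topic proportions are closest by cosine similarity. The function names, bin edges, gene names, and topic vectors are all hypothetical toy values; fitting the topic model itself is left out.

```python
import math
from collections import Counter


def expression_to_document(expression, bin_edges):
    """Turn one sample's gene -> expression mapping into a bag of tokens.

    Assumption: a gene whose value falls in bin k contributes the token
    '<gene>_b<k>' once. The paper's actual binning scheme is not given.
    """
    tokens = []
    for gene, value in sorted(expression.items()):
        # The bin index is the number of edges the value meets or exceeds.
        bin_index = sum(value >= edge for edge in bin_edges)
        tokens.append(f"{gene}_b{bin_index}")
    return Counter(tokens)


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_by_topic_similarity(sample_topics, class_profiles):
    """Return the label whose topic profile is most similar to the sample's."""
    return max(
        class_profiles,
        key=lambda label: cosine_similarity(sample_topics, class_profiles[label]),
    )


# Toy expression values (hypothetical), with two edges giving three bins.
bin_edges = [1.0, 10.0]
sample = {"GENE_A": 0.2, "GENE_B": 5.0, "GENE_C": 42.0}
doc = expression_to_document(sample, bin_edges)

# Toy topic proportions (hypothetical); in practice these would come from
# a fitted topic model, averaged over the training samples of each class.
class_profiles = {"tumor": [0.7, 0.2, 0.1], "healthy": [0.1, 0.3, 0.6]}
label = classify_by_topic_similarity([0.6, 0.3, 0.1], class_profiles)
```

In a full pipeline, the token counts produced by `expression_to_document` would be passed to an LDA implementation, where the number of topics and training passes are the kind of parameters the abstract says were tuned.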
