GSNFS: Gene subnetwork biomarker identification of lung cancer expression data.

Narumol Doungpan, Asawin Meechai, Worrawat Engchuan, Jonathan H Chan

doi:10.1186/s12920-016-0231-4

Abstract

Background: Gene expression has been used to identify disease gene biomarkers, but there are ongoing challenges. Single-gene or gene-set biomarkers do not provide sufficient understanding of complex disease mechanisms or of the relationships among the genes involved. Network-based methods have thus been considered for inferring the interactions within a group of genes to further study the disease mechanism. Recently, the Gene-Network-based Feature Set (GNFS), which is capable of handling case-control and multiclass expression data for gene biomarker identification, has been proposed; it partly takes network topology into account. However, its performance relies on a greedy search for building subnetworks and thus requires further improvement. In this work, we establish a new approach named Gene Sub-Network-based Feature Selection (GSNFS) by implementing the GNFS framework with two proposed searching and scoring algorithms, namely gene-set-based (GS) search and parent-node-based (PN) search, to identify subnetworks. An additional dataset is used to validate the results.

Methods: The two proposed searching algorithms of the GSNFS method for subnetwork expansion are concerned with the degree of connectivity and the scoring scheme for building subnetworks and their topology. In each iteration of expansion, the neighbour genes of the current subnetwork whose expression data improve the overall subnetwork score are recruited. While the GS search calculates the subnetwork score using the activity score of the current subnetwork and the gene expression values of its neighbours, the PN search uses the expression value of the corresponding parent of each neighbour gene. Four lung cancer expression datasets were used for subnetwork identification. In addition, the use of pathway data and protein-protein interaction data as network data, in order to consider the interactions among significant genes, was discussed. Classification was performed to compare the performance of the identified gene subnetworks with three subnetwork identification algorithms.

Results: The two searching algorithms resulted in better classification and gene/gene-set agreement compared to the original greedy search of the GNFS method. The lung cancer subnetworks identified using the proposed searching algorithms improved the cross-dataset validation and increased the consistency of findings between two independent datasets. The homogeneity of the datasets was measured to assess dataset compatibility in cross-dataset validation. The lung cancer dataset with higher homogeneity showed a better result when using the GS search, while the dataset with low homogeneity showed a better result when using the PN search. The 10-fold cross-dataset validation on the independent lung cancer datasets showed higher classification performance for the proposed algorithms than for the greedy search in the original GNFS method.

Conclusions: The proposed searching algorithms provide a higher number of genes in the subnetwork expansion step than the greedy algorithm. As a result, the subnetworks identified by the GSNFS method improved in classification performance and gene/gene-set level agreement, depending on the homogeneity of the datasets used in the analysis. Some of the common genes obtained from the four datasets using different searching algorithms are known to play a role in lung cancer. The improvement in classification performance and gene/gene-set level agreement, together with the biological relevance, indicates the effectiveness of the GSNFS method for gene subnetwork identification using expression data.
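
The expansion step described in the Methods paragraph can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring function (an absolute Welch-style t statistic), the activity score (per-sample mean expression), the function names `class_separation`, `activity` and `expand`, and the toy network and expression values are all assumptions made for illustration. In "GS" mode a neighbour is scored together with the whole current subnetwork; in "PN" mode it is scored together with only the parent gene through which it was reached.

```python
from statistics import mean, stdev

def class_separation(values, labels):
    """Assumed score: absolute Welch-style t statistic between two classes."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0

def activity(genes, expr):
    """Assumed activity score: per-sample mean expression over the genes."""
    n_samples = len(next(iter(expr.values())))
    return [mean(expr[g][i] for g in genes) for i in range(n_samples)]

def expand(seed, network, expr, labels, mode="GS", max_iter=10):
    """Grow a subnetwork from `seed`, recruiting every neighbour that
    improves the score in the current iteration (hypothetical sketch)."""
    subnet = [seed]
    for _ in range(max_iter):
        recruited = []
        for parent in subnet:
            for nb in network.get(parent, ()):
                if nb in subnet or nb in recruited:
                    continue
                # GS: compare against the whole subnetwork; PN: against the parent only.
                base = subnet if mode == "GS" else [parent]
                before = class_separation(activity(base, expr), labels)
                after = class_separation(activity(base + [nb], expr), labels)
                if after > before:
                    recruited.append(nb)
        if not recruited:
            break  # no neighbour improved the score, so expansion stops
        subnet.extend(recruited)
    return subnet

# Toy data (invented): three control samples (0) followed by three case samples (1).
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "A": [1.0, 1.2, 0.8, 2.0, 2.2, 1.8],
    "B": [1.0, 0.9, 1.1, 3.0, 2.9, 3.1],
    "C": [2.0, 0.5, 3.5, 2.1, 0.4, 3.4],   # uninformative gene
    "D": [1.0, 1.05, 0.95, 3.0, 3.05, 2.95],
}
network = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}

print(expand("A", network, expr, labels, mode="GS"))
print(expand("A", network, expr, labels, mode="PN"))
```

Unlike a greedy search that adds only the single best neighbour per step, this sketch recruits all improving neighbours in an iteration, which is one way to read the statement that the proposed searches yield more genes per expansion step.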

Highlights

Gene expression has been used to identify disease gene biomarkers, but there are ongoing challenges
The two searching algorithms resulted in better classification and gene/gene-set agreement compared to the original greedy search of the Gene-Network-based Feature Set (GNFS) method
The performance of the subnetworks identified from the Gene Sub-Network-based Feature Selection (GSNFS) method was improved in terms of classification performance and gene/gene-set level agreement depending on the homogeneity of the datasets used in the analysis

Summary

Introduction

Gene expression has been used to identify disease gene biomarkers, but there are ongoing challenges. Identifying gene biomarkers from high-throughput gene expression data is nontrivial for complex diseases such as cancer, because different genetic mutations and dysfunctions of different biological processes are present. The altered genes are functionally linked in a common biological pathway, enabling tumour cells to activate a specific set of cellular processes known as the hallmarks of cancer [2, 3]. The alteration of these processes results in changes to cellular homeostasis and in cancer development.

Journal: BMC Medical Genomics
Publication Date: Dec 1, 2016
Citations: 11
License type: cc-by
