Abstract
Many machine learning (ML) models developed to classify phenotype from gene expression data provide interpretations for their decisions, with the aim of understanding biological processes. For many models, including neural networks, interpretations are lists of genes ranked by their importance for the predictions, with top-ranked genes likely linked to the phenotype. In this article, we discuss the limitations of such approaches using integrated gradients, an explainability method developed for neural networks, as an example. Experiments are performed on RNA sequencing data from public cancer databases. A collection of ML models, including multilayer perceptrons and graph neural networks, are trained to classify samples by cancer type. Gene rankings from integrated gradients are compared to genes highlighted by statistical feature selection methods such as DESeq2 and by other learning methods that measure global feature contribution. Experiments show that a small set of top-ranked genes is sufficient to achieve good classification. However, similar performance is possible with lower-ranked genes, although larger sets are required. Moreover, significant differences in top-ranked genes, especially between statistical and learning methods, prevent a comprehensive biological understanding. In conclusion, while these methods identify pathology-specific biomarkers, the completeness of gene sets selected by explainability techniques for understanding biological processes remains uncertain. Python code and datasets are available at https://github.com/mbonto/XAI_in_genomics.
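To make the workflow concrete, the sketch below shows one way to obtain a gene ranking from integrated-gradients attributions with PyTorch and Captum. The network, layer sizes, baseline, and random data are illustrative placeholders, not the paper's configuration; the authors' actual pipeline is in the repository linked above.

```python
# Minimal sketch (assumed setup, not the paper's code): rank genes by
# integrated-gradients attributions of a classifier on expression data.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

n_genes, n_classes, n_samples = 1000, 5, 64

# Toy multilayer perceptron standing in for a trained cancer-type classifier.
model = nn.Sequential(
    nn.Linear(n_genes, 256), nn.ReLU(),
    nn.Linear(256, n_classes),
)
model.eval()

# Placeholder expression matrix (samples x genes) and labels; in practice
# these would be normalised RNA-seq profiles and the model would be trained.
x = torch.rand(n_samples, n_genes)
targets = torch.randint(0, n_classes, (n_samples,))

# Attribute each prediction to the input genes, using an all-zero baseline.
ig = IntegratedGradients(model)
attributions = ig.attribute(
    x, baselines=torch.zeros_like(x), target=targets, n_steps=50
)

# One common convention: rank genes by mean absolute attribution across samples.
gene_scores = attributions.abs().mean(dim=0)
top_genes = torch.argsort(gene_scores, descending=True)
print(top_genes[:20])  # indices of the 20 top-ranked genes
```

A ranking like this is what the article compares against DESeq2 selections and other global feature-importance methods; the choice of baseline and of the aggregation across samples can both change which genes end up at the top.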