Tabular deep learning: a comparative study applied to multi-task genome-wide prediction

Yuhua Fan, Patrik Waldmann

doi:10.1186/s12859-024-05940-1

Abstract

Purpose

More accurate prediction of phenotype traits can increase the success of genomic selection in both plant and animal breeding studies and provide more reliable disease risk prediction in humans. Traditional approaches typically use regression models that assume a linear relationship between the genetic markers and the traits of interest. Non-linear models have been considered as an alternative tool for modeling genomic interactions (i.e. non-additive effects) and other subtle non-linear patterns between markers and phenotype. Deep learning has become a state-of-the-art non-linear prediction method for sound, image and language data. However, genomic data is better represented in a tabular format. The existing literature on deep learning for tabular data proposes a wide range of novel architectures and reports successful results on various datasets, but tabular deep learning applications in genome-wide prediction (GWP) are still rare. In this work, we provide an overview of the main families of recent deep learning architectures for tabular data and apply them to multi-trait regression and multi-class classification for GWP on real gene datasets.

Methods

The study involves an extensive overview of recent deep learning architectures for tabular data: NODE, TabNet, TabR, TabTransformer, FT-Transformer, AutoInt, GANDALF, SAINT and LassoNet. These architectures are applied to multi-trait GWP. Comprehensive benchmarks of the tabular deep learning methods are conducted to identify best practices and determine their effectiveness compared to traditional methods.

Results

Extensive experimental results on several genomic datasets (three for multi-trait regression and two for multi-class classification) highlight LassoNet as a standout performer, surpassing both the other tabular deep learning models and the highly efficient tree-based LightGBM method in both prediction accuracy and computing efficiency.

Conclusion

Through a series of evaluations on real-world genomic datasets, the study identifies LassoNet as a standout performer, surpassing decision tree methods like LightGBM and other tabular deep learning architectures in both predictive accuracy and computing efficiency. Moreover, the inherent variable selection property of LassoNet provides a systematic way to find important genetic markers that contribute to phenotype expression.
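The variable selection property mentioned in the abstract can be illustrated with a much simpler relative of LassoNet. The sketch below is not the authors' method and does not use the LassoNet library: it fits a plain L1-penalised linear regression (the lasso) by coordinate descent, purely to show how a sparsity penalty drives the weights of uninformative markers to exactly zero. The synthetic genotype matrix, the two "causal" markers, their effect sizes and the penalty value are all assumptions made up for this illustration.

```python
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def lasso_coordinate_descent(columns, y, penalty, n_sweeps=100):
    """Minimise (1/2n)||y - Xw||^2 + penalty * ||w||_1 by cyclic coordinate descent.

    `columns` is a list of centered marker columns, `y` the centered trait.
    """
    n = len(y)
    weights = [0.0] * len(columns)
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(v * v for v in col) / n for col in columns]
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            if col_sq[j] == 0.0:
                continue  # a constant marker carries no information
            # Covariance of marker j with the residual that excludes marker j's own fit.
            rho = sum(c * r for c, r in zip(col, residual)) / n + col_sq[j] * weights[j]
            new_weight = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                weights[j] = new_weight
    return weights


# Synthetic data (assumed for illustration): genotypes coded 0/1/2,
# one trait driven by markers 3 and 7 plus a little noise.
random.seed(0)
n_individuals, n_markers = 200, 20
causal_effects = {3: 1.5, 7: -2.0}
genotypes = [[random.choice((0, 1, 2)) for _ in range(n_markers)]
             for _ in range(n_individuals)]
trait = [sum(effect * row[j] for j, effect in causal_effects.items())
         + random.gauss(0.0, 0.1) for row in genotypes]

columns = [center([row[j] for row in genotypes]) for j in range(n_markers)]
weights = lasso_coordinate_descent(columns, center(trait), penalty=0.3)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("selected markers:", selected)
```

LassoNet itself couples a penalty of this kind with a neural network, so the selected markers can also feed non-linear terms; the linear version here only conveys the selection mechanism.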

Journal: BMC Bioinformatics
Publication Date: Oct 4, 2024
License type: cc-by

Similar Papers

Evaluation of deep learning for predicting rice traits using structural and single-nucleotide genomic variants
Ioanna-Theoni Vourlaki ... Raúl Castanera
Plant Methods | VOL. 20
10 Aug 2024

Genomic Selection: Status in Different Species and Challenges for Breeding
Kf Stock ... R Reents
Reproduction in Domestic Animals | VOL. 48
21 Aug 2013

Machine learning in agriculture: from silos to marketplaces.
Philipp E Bayer ... David Edwards
Plant Biotechnology Journal | VOL. 19
20 Dec 2020

Deep learning architectures for multi-label classification of intelligent health risk prediction
Andrew Maxwell ... Zhaoxian Zhou
BMC Bioinformatics | VOL. 18
01 Dec 2017
