Exploring genomic feature selection: A comparative analysis of GWAS and machine learning algorithms in a large-scale soybean dataset.

Hawlader A Al-Mamun, Monica F Danilevicz, Jacob I Marsh, Cedric Gondro, David Edwards

doi:10.1002/tpg2.20503

Open Access

https://doi.org/10.1002/tpg2.20503

Journal: The Plant Genome
Publication Date: Sep 10, 2024
License type: CC BY-NC-ND 4.0

Abstract

The surge in high-throughput technologies has made vast genomic datasets available, prompting the search for genetic markers and biomarkers associated with complex traits. However, the high dimensionality and sparsity of these datasets pose formidable hurdles: the immense number of features, many of them potentially redundant, demands efficient strategies for extracting pertinent information and identifying significant markers. Feature selection is therefore important for large genomic datasets, because it improves both interpretability and computational efficiency. This study addresses these challenges through a comprehensive investigation of genomic feature selection methods, using a rich soybean (Glycine max (L.) Merr.) dataset of 966 lines with over 5.5 million single nucleotide polymorphisms (SNPs). Focusing on the "small n, large p" problem prevalent in contemporary genomic studies, we compared the efficacy of traditional genome-wide association studies (GWAS) with two prominent machine learning methods, random forest and extreme gradient boosting (XGBoost), in identifying predictive features. Using this soybean dataset, we assessed how well each method selects features that optimize predictive modeling for various phenotypes. By building predictive models on the selected features, we compared their prediction accuracies, highlighting the strengths and limitations of each feature selection method for genomic data analysis.
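
The workflow described above has two stages: select a subset of markers, then fit a predictive model on that subset and measure its accuracy on lines it has not seen. The sketch below illustrates only the GWAS-style branch under stated assumptions, not the authors' pipeline. It uses simulated additive 0/1/2 genotypes, ranks SNPs by the |t| statistic of a single-marker regression, keeps the top k, fits ridge regression on them, and reports the Pearson correlation between predicted and observed phenotypes on held-out lines. The simulated data, the function names (`simulate`, `marginal_scan`, `fit_ridge`, `run_demo`), k = 5, the ridge penalty and the choice of ridge and Pearson correlation are all hypothetical illustrations. Random forest or XGBoost importance rankings, which the study compares against GWAS, would replace `marginal_scan` as the scoring step.

```python
import math
import random


def simulate(n_lines, n_snps, causal, effect, seed):
    """Hypothetical toy data: additive 0/1/2 genotypes; phenotype = causal SNP effects + N(0, 1) noise."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_lines)]
    y = [sum(effect * row[j] for j in causal) + rng.gauss(0.0, 1.0) for row in X]
    return X, y


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def marginal_scan(X, y):
    """GWAS-style single-marker scan: |t| of a simple linear regression of y on each SNP."""
    n = len(y)
    stats = []
    for j in range(len(X[0])):
        r = pearson([row[j] for row in X], y)
        r2 = min(r * r, 1.0 - 1e-12)  # avoid dividing by zero when |r| == 1
        stats.append(abs(r) * math.sqrt((n - 2) / (1.0 - r2)))
    return stats


def top_k(scores, k):
    """Indices of the k highest scores, i.e. the selected features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(m):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][m] / M[i][i] for i in range(m)]


def fit_ridge(X, y, cols, lam=1.0):
    """Ridge regression on the selected columns; centring replaces an explicit intercept."""
    n, k = len(y), len(cols)
    means = [sum(row[j] for row in X) / n for j in cols]
    y_mean = sum(y) / n
    Z = [[row[j] - m for j, m in zip(cols, means)] for row in X]
    yc = [v - y_mean for v in y]
    A = [[sum(z[a] * z[b] for z in Z) + (lam if a == b else 0.0) for b in range(k)] for a in range(k)]
    rhs = [sum(z[a] * v for z, v in zip(Z, yc)) for a in range(k)]
    return y_mean, solve(A, rhs), means


def predict(model, X, cols):
    """Predicted phenotype: training mean plus ridge effects of the centred selected SNPs."""
    y_mean, beta, means = model
    return [y_mean + sum(bj * (row[j] - m) for bj, j, m in zip(beta, cols, means)) for row in X]


def run_demo(seed=1, k=5):
    """Select markers on training lines only, then score Pearson accuracy on held-out lines."""
    causal = (3, 42, 101, 177, 250)  # hypothetical causal SNP indices
    X, y = simulate(n_lines=200, n_snps=300, causal=causal, effect=1.0, seed=seed)
    Xtr, ytr, Xte, yte = X[:150], y[:150], X[150:], y[150:]
    cols = top_k(marginal_scan(Xtr, ytr), k)
    model = fit_ridge(Xtr, ytr, cols)
    acc = pearson(predict(model, Xte, cols), yte)
    return cols, causal, acc


if __name__ == "__main__":
    cols, causal, acc = run_demo()
    print("selected:", sorted(cols), "causal:", list(causal), "accuracy:", round(acc, 3))
```

Markers are selected using the training lines only. Scanning all lines before splitting would leak test information into the marker set and inflate the reported accuracy, which matters most in the "small n, large p" setting where many null SNPs can correlate with the phenotype by chance.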

Similar Papers

Genomic Selection: Status in Different Species and Challenges for Breeding
Kf Stock ... R Reents
Reproduction in Domestic Animals | VOL. 48 | 21 Aug 2013

Using machine learning to realize genetic site screening and genomic prediction of productive traits in pigs.
Tao Xiang ... Jielin Li
The FASEB Journal | VOL. 37 | 13 May 2023

Combination of Feature Selection and CatBoost for Prediction: The First Application to the Estimation of Aboveground Biomass
Mi Luo ... Yujun Sun
Forests | VOL. 12 | 13 Feb 2021

Gut microbiota landscape and potential biomarker identification in female patients with systemic lupus erythematosus using machine learning.
Wenzhu Song ... Xueli Hu
Frontiers in Cellular and Infection Microbiology | VOL. 13 | 19 Dec 2023