Assessing the reproducibility of machine-learning-based biomarker discovery in Parkinson’s disease

Ali Ameli, Lourdes Peña-Castillo, Hamid Usefi

doi:10.1016/j.compbiomed.2024.108407

Abstract

Feature selection and machine learning algorithms can be used to analyze single nucleotide polymorphism (SNP) data and identify potential disease biomarkers. For identified biomarkers to be useful in clinical research, they must be reproducible; however, both the genotyping platform and the criteria used to select individuals for genotyping affect reproducibility. To assess biomarker reproducibility, we collected five SNP datasets from the database of Genotypes and Phenotypes (dbGaP) and explored several data integration strategies. Although combining datasets can reduce classification accuracy, it has the potential to improve the reproducibility of candidate biomarkers. We evaluated the agreement among the strategies in terms of the SNPs identified as potential Parkinson’s disease (PD) biomarkers. Our findings indicate that, on average, 93% of the SNPs identified in a single dataset are not identified in other datasets. Through dataset integration, this lack of replication is reduced to 62%. We discovered fifty SNPs that were identified at least twice and could potentially serve as novel PD biomarkers. These SNPs are indirectly linked to PD in the literature but have not been directly associated with PD before. These findings open up new potential avenues of investigation.
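The replication figures quoted in the abstract can be illustrated with a small sketch. The abstract does not give the exact definition used in the paper, so the code below assumes one plausible reading: for each dataset, the fraction of its selected SNPs that no other dataset selects, averaged over datasets. The function names, dataset labels, and SNP identifiers are hypothetical placeholders, not data or code from the study.

```python
from collections import Counter


def non_replication_rate(selected):
    """Mean fraction of each dataset's SNPs that no other dataset selects.

    `selected` maps a dataset label to the set of SNP IDs chosen in it.
    This is an assumed definition, not necessarily the paper's.
    """
    rates = []
    for name, snps in selected.items():
        if not snps:
            continue  # skip datasets with no selected SNPs
        # Union of SNPs selected in every other dataset.
        others = set().union(*(s for n, s in selected.items() if n != name))
        rates.append(len(snps - others) / len(snps))
    return sum(rates) / len(rates) if rates else 0.0


def replicated_snps(selected, min_count=2):
    """SNPs selected in at least `min_count` datasets."""
    counts = Counter(snp for snps in selected.values() for snp in snps)
    return {snp for snp, c in counts.items() if c >= min_count}


# Toy input with made-up SNP IDs, for illustration only.
toy_selection = {
    "dataset_A": {"rs1", "rs2", "rs3", "rs4"},
    "dataset_B": {"rs2", "rs5"},
    "dataset_C": {"rs3", "rs6", "rs7", "rs8"},
}

rate = non_replication_rate(toy_selection)
repeated = replicated_snps(toy_selection)
```

Under this reading, a lower rate after dataset integration would correspond to the improvement from 93% to 62% reported above, and `replicated_snps` mirrors the idea of keeping SNPs "identified at least twice".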

Journal: Computers in Biology and Medicine
Publication Date: Apr 5, 2024
License type: cc-by
