The use of class imbalanced learning methods on ULSAM data to predict the case–control status in genome-wide association studies

R Onur Öztornaci, Hamzah Syed, Andrew P Morris, Bahar Taşdelen

doi:10.1186/s40537-023-00853-x

Abstract

Machine learning (ML) methods are increasingly used in genetic research to uncover single nucleotide polymorphisms (SNPs) in genome-wide association study (GWAS) data that can predict disease outcomes. Two challenges in applying ML models are choosing an appropriate method for handling imbalanced data and training the models. This article compares three ML models for identifying SNPs that predict type 2 diabetes (T2D) status, combined with the resampling methods support vector machine SMOTE (SVM SMOTE), the Adaptive Synthetic Sampling Approach (ADASYN), and random undersampling (RUS), on GWAS data from elderly male participants (165 cases and 951 controls) in the Uppsala Longitudinal Study of Adult Men (ULSAM). The SMOTE, SVM SMOTE, ADASYN, and RUS methods were also applied to SNPs selected by clumping. The analysis used three ML models: (i) support vector machine (SVM), (ii) multilayer perceptron (MLP), and (iii) random forest (RF). The accuracy of case–control classification was compared across these three models. The best-performing combination was MLP with SMOTE (97% accuracy). Both RF and SVM achieved accuracies above 90%. Overall, applying methods for imbalanced data improved the prediction accuracy of all three ML algorithms.
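
To make the class imbalance concrete, the following is a minimal sketch of two of the resampling ideas named in the abstract, applied to a toy dataset with the same class counts (165 cases, 951 controls). It is not the authors' pipeline: the genotype matrix is random, the 0/1/2 allele-count coding and the five-SNP width are assumptions made for illustration, and the functions `random_undersample` and `smote_like_oversample` are hypothetical helpers written in plain Python rather than calls into any library. The oversampler shows only the core interpolation step of SMOTE, not SVM SMOTE or ADASYN.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    cases = [i for i, label in enumerate(y) if label == 1]
    controls = [i for i, label in enumerate(y) if label == 0]
    minority, majority = sorted([cases, controls], key=len)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def smote_like_oversample(X, y, k=5, seed=0):
    """Add synthetic minority rows by interpolating between a minority row
    and one of its k nearest minority neighbours (binary labels assumed)."""
    rng = random.Random(seed)
    minority_label = min(set(y), key=y.count)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    X_new, y_new = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority)
        # Nearest neighbours by squared Euclidean distance, excluding base itself.
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        X_new.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_new.append(minority_label)
    return X_new, y_new


# Toy data (assumption): random genotypes coded as 0/1/2 allele counts.
data_rng = random.Random(42)
n_cases, n_controls, n_snps = 165, 951, 5
X = [[data_rng.choice([0, 1, 2]) for _ in range(n_snps)]
     for _ in range(n_cases + n_controls)]
y = [1] * n_cases + [0] * n_controls

X_rus, y_rus = random_undersample(X, y)
X_syn, y_syn = smote_like_oversample(X, y)

print(y_rus.count(1), y_rus.count(0))  # 165 165
print(y_syn.count(1), y_syn.count(0))  # 951 951
```

Undersampling discards 786 of the 951 controls, while the oversampler adds 786 synthetic cases. Interpolated genotypes are generally non-integer, which is one reason the choice of resampling method can matter for SNP data.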

Journal: Journal of Big Data
Publication Date: Nov 30, 2023
License type: CC BY 4.0
