A Machine-Learning-Based Approach to Prediction of Biogeographic Ancestry within Europe.

Anna Kloska, Krzysztof Pałczyński, Slobodan Davidović, Magdalena Spólnicka, Marcin Woźniak, Miroslava V Derenko, Tomasz Marciniak, Tomasz Grzybowski, Rafał Płoski, Sylwester M Kloska, Boris A Malyarchuk, Agata Giełczyk, Nataša Kovačević-Grujičić, Danijela Drakulić, Magdalena Zubańska, Milena Stevanović, Urszula Rogalla-Ładniak

doi:10.3390/ijms242015095

Abstract

Data obtained using massively parallel sequencing (MPS) can be valuable in population genetics studies. In particular, such data have the potential to distinguish samples from different populations, especially adjacent populations of common origin. Machine learning (ML) techniques seem to be especially well suited for analyzing large datasets obtained using MPS. The Slavic populations constitute about a third of the population of Europe and inhabit a large area of the continent, while being relatively closely related in population genetics terms. In this proof-of-concept study, various ML techniques were used to classify DNA samples from Slavic and non-Slavic individuals. The primary objective was to empirically evaluate whether the genetic provenance of genetically similar individuals of Slavic descent can be discerned, with the overarching goal of categorizing DNA specimens from representatives of diverse Slavic populations. Raw sequencing data were pre-processed to obtain a 1200-character-long binary vector. Three classifiers were used: Random Forest, Support Vector Machine (SVM), and XGBoost. The most promising results were obtained using SVM with a linear kernel, with 99.9% accuracy and F1-scores of 0.9846 to 1.000 for all classes.
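The classification step described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and NumPy are available. It trains a linear-kernel SVM on 1200-bit binary vectors and reports accuracy and per-class F1-scores. The data here are synthetic stand-ins generated at random, and the number of classes, the sample counts, and the train/test split are all hypothetical. This is not the authors' pipeline, pre-processing, or dataset, and its output says nothing about the results reported in the paper.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): NOT the study's sequencing data.
# Each hypothetical class has its own per-position probability of a 1 bit.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 3, 100, 1200
class_profiles = rng.uniform(0.2, 0.8, size=(n_classes, n_features))

# Draw binary feature vectors of length 1200 for every class.
X = np.vstack([
    rng.binomial(1, class_profiles[c], size=(n_per_class, n_features))
    for c in range(n_classes)
])
y = np.repeat(np.arange(n_classes), n_per_class)

# Hold out a stratified test set so each class is represented equally.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Linear-kernel SVM, the classifier family the abstract reports as best.
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Overall accuracy plus one F1-score per class (average=None).
accuracy = accuracy_score(y_test, y_pred)
per_class_f1 = f1_score(y_test, y_pred, average=None)
print(f"accuracy: {accuracy:.4f}")
print("per-class F1:", np.round(per_class_f1, 4))
```

Reporting F1 per class rather than a single average mirrors the way the abstract gives a range of F1-scores across classes. Real populations that are closely related would be far harder to separate than these randomly generated profiles.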

Journal: International Journal of Molecular Sciences	Publication Date: Oct 11, 2023
Citations: 1	License type: CC BY 4.0

Similar Papers

Review of Machine and Deep Learning Techniques in Epileptic Seizure Detection using Physiological Signals and Sentiment Analysis
Deba Prasad Dash ... Mohammad R Khosravi
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 23
15 Jan 2024

COVID‐19: A systematic review of prediction and classification techniques
Om Ramakisan Varma ... Mala Kalra
International Journal of Imaging Systems and Technology | VOL. 33
11 May 2023

Prediction of oil and gas pipeline failures through machine learning approaches: A systematic review
Abdulnaser M Al-Sabaeei ... Ajayshankar Jagadeesh
Energy Reports | VOL. 10
16 Aug 2023

Computational Intelligence, Machine Learning and Deep Learning Techniques for Effective Future Predictions of COVID-19: A Review
K Aditya Shastry ... H A Sanjay
28 Jul 2021
