Abstract

In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evol. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evol. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.
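The dependence of SML on labeled training data can be illustrated with a minimal sketch. It is a toy nearest-centroid classifier, not S/HIC and not Schrider and Kern's pipeline; the class names, the assumed class means, the two-dimensional features, and the noise level are all hypothetical. The point it shows is that when the training labels come from a simulator rather than from segments whose history is known empirically, the classifier learns only what the simulator assumes, and every input is forced into one of the simulated classes.

```python
import math
import random
from statistics import mean

# Hypothetical toy "simulator": each label maps to an assumed mean feature
# vector. In genuine SML the labels would be known empirically to be true;
# here, as in simulation-based training, they are whatever the simulator says.
ASSUMED_CLASS_MEANS = {
    "neutral": (1.0, 0.0),
    "hard": (0.2, 1.0),
    "soft": (0.6, 0.5),
}


def simulate(label, n, noise, rng):
    """Draw n (features, label) pairs for `label` from the toy simulator."""
    mu = ASSUMED_CLASS_MEANS[label]
    return [(tuple(m + rng.gauss(0.0, noise) for m in mu), label) for _ in range(n)]


def train_nearest_centroid(training_set):
    """Supervised step: learn one centroid per label from labeled examples."""
    by_label = {}
    for features, label in training_set:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(col) for col in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


rng = random.Random(0)
training = [row for label in ASSUMED_CLASS_MEANS for row in simulate(label, 200, 0.1, rng)]
centroids = train_nearest_centroid(training)

print(predict(centroids, (0.55, 0.45)))  # → soft
# A feature vector unlike anything simulated still receives one of the three labels.
print(predict(centroids, (5.0, 5.0)))
```

Changing `ASSUMED_CLASS_MEANS` changes what the classifier reports for the same input, which is the sense in which simulation-derived labels substitute the simulator's assumptions for empirical ground truth.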

Highlights

  • We address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML).

  • A paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” [4] attracted a great deal of scientific and popular attention, in particular in conjunction with another paper [7] which made the headline-grabbing claim that the Neutral Theory of Molecular Evolution [8] needs to be rejected because it was based on “unreliable theoretical and empirical evidence from the beginning.”

  • The immense success that SML techniques have had in the world of e-commerce [10], as well as in some areas of medicine [11,12], genetics [13], genomics [14], and biochemistry [15], may have prompted Schrider and Kern to develop so-called supervised machine learning methodologies [3] to address evolutionary questions, specifically those aimed at clarifying the relative importance of selection and random genetic drift during the evolution of genomes [4].


Summary

Introduction

A series of articles by Daniel Schrider and Andrew Kern [3,4,5,6] was important in this metamorphosis of soft selective sweeps from a curiosity of little evolutionary importance into a ubiquitous mechanism. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” [4] attracted a great deal of scientific and popular attention, in particular in conjunction with another paper [7], which made the headline-grabbing claim that the Neutral Theory of Molecular Evolution [8] needs to be rejected because it was based on “unreliable theoretical and empirical evidence from the beginning.” Before further dissecting Schrider and Kern [4], we outline the principles of machine learning and some features of selective sweeps.

Principles and Limitations of Machine Learning
What Are Hard and Soft Selective Sweeps?
The Supervised Machine Learning Algorithm That Isn’t
Other Questionable Practices
Findings
Discussion