Hands-on training about overfitting.

Janez Demšar, Blaž Zupan, Patricia M Palagi

doi:10.1371/journal.pcbi.1008671

Open Access

Abstract

Overfitting is one of the critical problems in developing models by machine learning. With machine learning becoming an essential technology in computational biology, we must include training about overfitting in all courses that introduce this technology to students and practitioners. Here we propose hands-on training about overfitting that is suitable for introductory courses and can be carried out on its own or embedded within any data science course. We use a workflow-based design of machine learning pipelines, experimentation-based teaching, and a hands-on approach that focuses on concepts rather than the underlying mathematics. We detail the data analysis workflows we use in training and motivate them from the viewpoint of teaching goals. Our approach relies on Orange, an open-source data science toolbox that combines data visualization and machine learning and is tailored for education in machine learning and exploratory data analysis.
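The core phenomenon the training targets can also be shown in a few lines of code. The sketch below is not one of the paper's Orange workflows; it is a plain-Python illustration, assuming synthetic data in which the class labels are drawn independently of the features, so there is nothing real to learn. A 1-nearest-neighbor classifier (chosen here only for illustration) memorizes its training data and scores perfectly on it, while its accuracy on held-out data stays near chance.

```python
import random

def make_random_data(n_samples, n_features, seed):
    """Synthetic data (an assumption for illustration): labels are independent of features."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y

def predict_1nn(train_X, train_y, x):
    """Label of the training example closest to x (squared Euclidean distance)."""
    nearest = min(range(len(train_X)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[nearest]

def accuracy(train_X, train_y, X, y):
    """Fraction of examples in (X, y) that the 1-NN model fitted on the training set gets right."""
    hits = sum(predict_1nn(train_X, train_y, xi) == yi for xi, yi in zip(X, y))
    return hits / len(y)

X, y = make_random_data(n_samples=200, n_features=50, seed=0)
train_X, train_y = X[:100], y[:100]   # first half used for fitting
test_X, test_y = X[100:], y[100:]     # second half never seen during fitting

train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(train_X, train_y, test_X, test_y)
print(f"training accuracy: {train_acc:.2f}")  # 1.00: every point is its own nearest neighbor
print(f"test accuracy:     {test_acc:.2f}")   # close to chance level, since labels are random
```

Reporting the first number instead of the second is exactly the overfitting mistake; the sample sizes, feature count, and seed are arbitrary assumptions.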

Highlights

Machine learning is one of the critical bioinformatics technologies [1].
We aim to introduce machine learning to practitioners (computer scientists, molecular biologists, and students of biomedicine), that is, the end-users of the computational approaches of bioinformatics.
We propose a hands-on approach that uses an open-source, workflow-based data science toolbox that combines data visualization and machine learning.

Summary

Introduction

Applications of machine learning span the entire spectrum of molecular biology research, from genomics, proteomics, and gene expression analysis to the development of predictive models through large-scale integration [2]. Overfitting is perhaps the most serious mistake one can make in machine learning. In their excellent review, Simon et al. [4] point out that a substantial number of the most prominent early publications on gene expression analysis overfitted the data when reporting predictive or clustering models. Mistakes of this kind are rare today, yet the problem of overfitting persists [3,5]. It is essential to convey the intricacies and facets of overfitting to students who are taught data science, and we should include lectures on overfitting already in introductory machine learning courses.
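One well-known route to overfitted gene expression results is choosing genes on the complete dataset and only afterwards estimating accuracy by cross-validation, so the held-out samples have already influenced the model. The standard-library Python sketch below illustrates that pitfall; it is an illustration under stated assumptions, not a method from the paper or from Simon et al. [4]. The data are random (40 samples, 1,000 "genes", labels independent of the values), genes are ranked by the absolute difference of class means, and a nearest-centroid classifier is evaluated by leave-one-out cross-validation; all of these choices are assumptions made for the example.

```python
import random

def make_random_data(n_samples, n_features, seed):
    """Hypothetical random 'expression' data with balanced random labels: no real signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [0] * (n_samples // 2) + [1] * (n_samples - n_samples // 2)
    rng.shuffle(y)
    return X, y

def class_means(X, y, features):
    """Per-class mean of each listed feature."""
    means = {}
    for label in (0, 1):
        rows = [x for x, yi in zip(X, y) if yi == label]
        means[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    return means

def select_features(X, y, k):
    """Rank features by absolute difference of class means and keep the top k."""
    n_features = len(X[0])
    means = class_means(X, y, range(n_features))
    scores = [abs(a - b) for a, b in zip(means[0], means[1])]
    return sorted(range(n_features), key=lambda f: scores[f], reverse=True)[:k]

def nearest_centroid(train_X, train_y, features, x):
    """Predict the class whose centroid (on the selected features) is closest to x."""
    means = class_means(train_X, train_y, features)

    def dist(label):
        return sum((x[f] - m) ** 2 for f, m in zip(features, means[label]))

    return min((0, 1), key=dist)

def loo_accuracy(X, y, k, select_inside_cv):
    """Leave-one-out accuracy; feature selection happens either once on all data or per fold."""
    # Leaky variant: features chosen once using every sample, including those later held out.
    global_features = None if select_inside_cv else select_features(X, y, k)
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        # Correct variant: selection sees only the training part of this fold.
        features = select_features(train_X, train_y, k) if select_inside_cv else global_features
        hits += nearest_centroid(train_X, train_y, features, X[i]) == y[i]
    return hits / len(X)

X, y = make_random_data(n_samples=40, n_features=1000, seed=1)
leaky_acc = loo_accuracy(X, y, k=20, select_inside_cv=False)
correct_acc = loo_accuracy(X, y, k=20, select_inside_cv=True)
print(f"selection before CV: {leaky_acc:.2f}")
print(f"selection inside CV: {correct_acc:.2f}")
```

With selection done once on all samples, the estimate is typically far above chance even though the data contain no signal; repeating the selection inside every fold brings it back to roughly chance level.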

Journal: PLOS Computational Biology
Publication Date: Mar 4, 2021
Citations: 39
License type: CC BY 4.0

Similar Papers

Poster: Practical Examples of Basic Data Science Course for Junior High and High School Students in Club Activity
Satoshi Fujishima ... Akiyuki Minamide
01 Jan 2021

Interdisciplinary Computing Education: An Introductory Programming and Data Science Course for Postdoctoral Researchers in the Biosciences
Arko Barman ... Yasmin Chebaro
08 Oct 2022

Teaching Data Science Programming Skills to Diverse Student Cohorts
Leon E Burger
27 Nov 2022

Using summary tables to introduce principal component analysis in an elementary data science course
Jon‐Paul Paolino
Teaching Statistics, Vol. 46
03 Jun 2024