Information FOMO: The Unhealthy Fear of Missing Out on Information - A Method for Removing Misleading Data for Healthier Models

Ethan Pickering, Themistoklis P. Sapsis

doi:10.3390/e26100835

Abstract

Misleading or unnecessary data can have outsized impacts on the health or accuracy of Machine Learning (ML) models. We present a Bayesian sequential selection method, akin to Bayesian experimental design, that identifies critically important information within a dataset while ignoring data that are either misleading or bring unnecessary complexity to the surrogate model of choice. Our method improves sample-wise error convergence and eliminates instances where more data lead to worse performance and instabilities of the surrogate model, often termed sample-wise "double descent". We find that these instabilities result from the complexity of the underlying map and are linked to extreme events and heavy tails. Our approach has two key features. First, the selection algorithm dynamically couples the chosen model and data: data are chosen for their merit in improving the selected model, rather than being compared strictly against other data. Second, a natural convergence of the method removes the need for dividing the data into training, testing, and validation sets. Instead, the selection metric inherently assesses testing and validation error through global statistics of the model. This ensures that key information is never wasted in testing or validation. The method is applied using both Gaussian process regression and deep neural network surrogate models.
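The general idea of sequential selection from a fixed dataset can be illustrated with a minimal sketch. This is not the authors' algorithm or selection metric, which the abstract does not specify. It substitutes plain uncertainty sampling: a Gaussian process with a squared-exponential kernel greedily picks the pool point with the largest predictive variance and stops once that variance falls below a tolerance, so no separate test set is consulted. Unlike the method described above, this stand-in ignores the output values entirely. All names (`rbf`, `solve`, `posterior_variance`, `select_points`) and parameters (`length`, `noise`, `tol`) are hypothetical, and the code is pure Python only to keep the sketch dependency-free.

```python
import math


def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def posterior_variance(x_star, x_train, noise=1e-6):
    """GP predictive variance at x_star given the already selected inputs."""
    if not x_train:
        return rbf(x_star, x_star)  # prior variance before any selection
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    k = [rbf(x_star, a) for a in x_train]
    alpha = solve(K, k)  # K^{-1} k
    return rbf(x_star, x_star) - sum(ki * ai for ki, ai in zip(k, alpha))


def select_points(pool, tol=1e-2):
    """Greedily pick pool indices until the largest remaining variance < tol."""
    chosen = []
    remaining = list(range(len(pool)))
    while remaining:
        x_train = [pool[i] for i in chosen]
        scores = {i: posterior_variance(pool[i], x_train) for i in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < tol:
            break  # the model's own statistics signal convergence
        chosen.append(best)
        remaining.remove(best)
    return chosen


pool = [i / 49 for i in range(50)]  # 50 candidate inputs on [0, 1]
chosen = select_points(pool)
print(f"selected {len(chosen)} of {len(pool)} points: {chosen}")
```

In this toy setting the loop ends after a small subset of the pool has been selected, because the remaining candidates no longer reduce the model's uncertainty. A selection metric that also accounts for the outputs, as the abstract describes, would replace `posterior_variance` as the scoring function.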

Journal: Entropy (Basel, Switzerland)
Publication Date: Sep 30, 2024
License type: CC BY 4.0

Similar Papers

Genetic Programming and Gaussian Process Regression Models for Groundwater Salinity Prediction: Machine Learning for Sustainable Water Resources Management
Alvin Lal ... Bithin Datta
01 Nov 2018

Rheological modeling of marjoram fortified rice dough: Empirical and machine learning approach
Siddharth Vishwakarma ... Shubham Mandliya
Journal of Food Process Engineering | VOL. 46
23 May 2023

Comparison and evaluation of advanced machine learning methods for performance and emissions prediction of a gasoline Wankel rotary engine
Huaiyu Wang ... Shuofeng Wang
Energy | VOL. 248
28 Feb 2022

Double-machine-learning-based data-driven stochastic flow stress model for aluminium alloys at elevated temperatures
Baixi Chen ... Wei Li
Materials Today Communications | VOL. 33
01 Dec 2022