Abstract
A standard assumption in the theory of machine learning is that the data are generated from a fixed but unknown probability distribution. Although this assumption rests on the foundations of probability theory, in practice most learning pipelines randomly shuffle the original dataset, for example when randomly splitting it into training and test sets before training, so that the assumption is satisfied; the shuffled training set is then used to fit the model. In real-life applications, however, data pairs are observed batch by batch in their original order, and shuffling them in advance is not always possible or necessary. From a mathematical point of view, we test whether random shuffling has a non-negligible influence on the generalization of learning machines. We reduce the question of random shuffling to the problem of distribution-shift detection. This paper is devoted to testing the null hypothesis that random shuffling does not affect the generalization of learning machines, and it introduces a distribution-free martingale method against this hypothesis. We report experimental results on five real-life benchmarks using Support Vector Machines and a multi-layer perceptron. The results show that distribution shift within the data is an inescapable reality when machine learning algorithms are built on data in its original order.
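The martingale idea can be made concrete. Below is a minimal sketch, in Python, of a conformal test martingale in the spirit of Vovk-style exchangeability martingales: the nonconformity measure (absolute deviation from the running mean) and the betting function (a power martingale with epsilon = 0.92) are illustrative assumptions, not necessarily the paper's exact construction. Under the null hypothesis the conformal p-values are i.i.d. uniform and the martingale stays small; a distribution shift in the original data order makes it grow.

```python
# Sketch of a conformal test martingale for detecting distribution shift in
# sequentially observed data. Assumptions (not from the paper): nonconformity
# score = |z - running mean|, betting function = power martingale, eps = 0.92.
import numpy as np

def conformal_p_values(scores, rng):
    """Smoothed conformal p-values: i.i.d. U(0,1) under exchangeability."""
    p = np.empty(len(scores))
    for n in range(1, len(scores) + 1):
        past = scores[:n]          # includes the current score scores[n-1]
        gt = np.sum(past > scores[n - 1])
        eq = np.sum(past == scores[n - 1])
        p[n - 1] = (gt + rng.uniform() * eq) / n
    return p

def power_martingale(p_values, epsilon=0.92):
    """S_n = prod_i epsilon * p_i^(epsilon-1); large values reject exchangeability."""
    return np.cumprod(epsilon * p_values ** (epsilon - 1.0))

rng = np.random.default_rng(0)
# Synthetic stream in its "original order": the distribution shifts halfway.
stream = np.concatenate([rng.normal(0, 1, 500), rng.normal(3, 1, 500)])

# Nonconformity score of each point against everything seen so far.
scores = np.array([abs(z - stream[: i + 1].mean()) for i, z in enumerate(stream)])

martingale = power_martingale(conformal_p_values(scores, rng))
print(f"final martingale value: {martingale[-1]:.3g}")  # grows large after the shift
```

Shuffling the stream before computing the scores would keep the martingale near 1, which is exactly the contrast the hypothesis test exploits.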