Abstract

Supervised machine learning models pass through two phases: training and testing. When these models are trained, validated, and tested, it is usually assumed that the training and test data points follow the same distribution. Practical scenarios are often different: in the real world, the joint distribution of the model's inputs and outputs differs between training and test data, a phenomenon called dataset shift. A simpler case of dataset shift, in which only the input distribution changes while the conditional distribution of the output given the input remains unchanged, is known as covariate shift. This article primarily provides an overview of existing methods for covariate shift detection and adaptation and their real-world applications. It also gives an experimental analysis of the effect of various covariate shift adaptation techniques on the performance of classification algorithms across four datasets, comprising both synthetic and real-world data. The performance of the machine learning models shows significant improvement after covariate shift is handled using the importance reweighting and feature-dropping methods. The review, experimental analysis, and observations of this work may serve as guidelines for researchers and developers of machine learning systems to handle covariate shift problems efficiently.
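
To make the importance reweighting idea mentioned in the abstract concrete, the following is a minimal sketch, not the authors' exact method: it estimates the density ratio w(x) = p_test(x) / p_train(x) with a "domain classifier" that discriminates training from test inputs, then uses those ratios as sample weights when fitting the task model. All variable names and the synthetic data are illustrative assumptions.

```python
# A hedged sketch of covariate shift adaptation via importance reweighting,
# using discriminative density-ratio estimation (a common approach; not
# necessarily the exact procedure used in the article's experiments).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic covariate shift: training and test inputs come from shifted
# Gaussians, while p(y | x) (a fixed threshold rule) stays the same.
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 1))
X_test = rng.normal(loc=1.0, scale=1.0, size=(500, 1))
y_train = (X_train[:, 0] + rng.normal(0.0, 0.3, 500) > 0.5).astype(int)

# Step 1: estimate w(x) = p_test(x) / p_train(x). A classifier that
# separates train (domain 0) from test (domain 1) gives this ratio as
# the odds p(test | x) / p(train | x) when the two sets are equal-sized.
X_all = np.vstack([X_train, X_test])
d_all = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
domain_clf = LogisticRegression().fit(X_all, d_all)
p_test_given_x = domain_clf.predict_proba(X_train)[:, 1]
weights = p_test_given_x / (1.0 - p_test_given_x)

# Step 2: train the task model with importance weights, so the weighted
# training loss approximates the expected loss under the test distribution.
weighted_model = LogisticRegression().fit(X_train, y_train,
                                          sample_weight=weights)
```

The same domain classifier also offers a simple shift-detection heuristic: if it separates training from test inputs much better than chance, the input distributions likely differ and adaptation is worth considering.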
