Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications.

Hoseung Song, Bhavya Kailkhura, Jayaraman J Thiagarajan

doi:10.3389/frai.2021.589632

Abstract

Dataset shift refers to the problem where the input data distribution may change over time (e.g., between training and test stages). Since this can be a critical bottleneck in safety-critical applications such as healthcare and drug discovery, dataset shift detection has become an important research issue in machine learning. Though several existing efforts have focused on image/video data, applications with graph-structured data have not received sufficient attention. Therefore, in this paper, we investigate the problem of detecting shifts in graph-structured data through the lens of statistical hypothesis testing. Specifically, we propose a practical approach, based on a two-sample test, for shift detection in large-scale graph-structured data. Our approach is flexible in that it is suitable for both undirected and directed graphs, and it eliminates the need for equal sample sizes. Using empirical studies, we demonstrate the effectiveness of the proposed test in detecting dataset shifts. We also corroborate these findings using real-world datasets, characterized by directed graphs and a large number of nodes.

Highlights

Most machine learning (ML) applications, e.g., healthcare and drug discovery, encounter dataset shift when operating in the real world
Dataset shift is a frequent cause of failure of ML systems, yet very few ML systems inspect incoming data for a potential distribution shift (Bulusu et al., 2020)
We investigate the problem of detecting distribution shifts in graph-structured datasets for responsible deployment of ML in safety-critical applications

Summary

INTRODUCTION

Most machine learning (ML) applications, e.g., healthcare and drug discovery, encounter dataset shift when operating in the real world. To compare graphs, Shervashidze et al. (2009) used the earth mover's distance between the distributions of feature summaries of their constituent subgraphs. While these heuristic methods are reasonably effective for comparing real-world graphs, it was not until recently that a principled analysis of hypothesis testing with random graphs was carried out. Ghoshdastidar and von Luxburg (2018) developed a novel testing framework for random graphs, for cases with small sample sizes and a large number of nodes, and studied its optimality. This test statistic was based on the asymptotic null distributions under certain model assumptions. In order to circumvent these crucial shortcomings, we develop a novel approach based on hypothesis testing for detecting shifts in graph-structured data, which is more flexible (i.e., it accommodates 1) both undirected and directed graphs and 2) unequal sample sizes). It is highly effective even when the sample size grows. In order to demonstrate the usefulness of the proposed method in challenging real-world problems, we consider several applications (including a healthcare application) and show the effectiveness of our approach.
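To make the idea of a two-sample test on graph samples concrete, the following is a minimal sketch and not the test proposed in the paper. It assumes each graph is given as an adjacency matrix over the same node set, and it swaps in a generic permutation test whose statistic is the Frobenius norm of the difference between the two mean adjacency matrices. The function names, the Erdős–Rényi style simulator, and all parameter values are hypothetical choices made for illustration. Like the approach described above, the sketch works for directed graphs and for unequal sample sizes.

```python
import numpy as np


def frobenius_gap(sample_a, sample_b):
    """Frobenius norm of the difference between the mean adjacency matrices.

    Illustrative statistic only; it is not the statistic used in the paper.
    """
    return np.linalg.norm(sample_a.mean(axis=0) - sample_b.mean(axis=0))


def permutation_shift_test(sample_a, sample_b, n_perm=500, seed=0):
    """Permutation two-sample test on stacks of adjacency matrices.

    sample_a has shape (m, n, n) and sample_b has shape (k, n, n);
    m and k may differ, and the matrices need not be symmetric.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([sample_a, sample_b], axis=0)
    m = len(sample_a)
    observed = frobenius_gap(sample_a, sample_b)
    exceed = 0
    for _ in range(n_perm):
        # Reassign graphs to the two groups at random, keeping group sizes.
        idx = rng.permutation(len(pooled))
        stat = frobenius_gap(pooled[idx[:m]], pooled[idx[m:]])
        if stat >= observed:
            exceed += 1
    # The +1 correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


def sample_directed_graphs(n_graphs, n_nodes, edge_prob, rng):
    """Hypothetical simulator: independent directed edges, no self-loops."""
    adj = (rng.random((n_graphs, n_nodes, n_nodes)) < edge_prob).astype(float)
    diag = np.arange(n_nodes)
    adj[:, diag, diag] = 0.0
    return adj


rng = np.random.default_rng(42)
train = sample_directed_graphs(30, 20, 0.10, rng)    # reference sample
shifted = sample_directed_graphs(12, 20, 0.30, rng)  # denser graphs
same = sample_directed_graphs(12, 20, 0.10, rng)     # same distribution

stat_shift, p_shift = permutation_shift_test(train, shifted)
stat_same, p_same = permutation_shift_test(train, same)
print(f"shifted sample: statistic={stat_shift:.3f}, p-value={p_shift:.4f}")
print(f"same distribution: statistic={stat_same:.3f}, p-value={p_same:.4f}")
```

A small p-value for the shifted sample would flag a possible distribution shift before the incoming graphs reach a deployed model. A permutation test is used here because it needs no asymptotic null distribution, at the cost of recomputing the statistic for every permutation.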

PRELIMINARIES

PROPOSED TEST

Simulated Data

Real-World Applications

DATA AVAILABILITY STATEMENT

CONCLUSION


Journal: Frontiers in Artificial Intelligence
Publication Date: May 18, 2021
Citations: 2
License type: CC BY 4.0
