Detection of Outlying Cells in Contingency Tables Using Model Based Diagnostics

Thodur P Sripriya, Mamandur R Srinivasan, Michele Gallo

doi:10.17713/ajs.v49i5.938

Open Access

Abstract

Detecting outliers in a contingency table is an interesting statistical problem, and it poses additional difficulties due to the polarization of cell counts. The fundamental definition of an outlier as a 'markedly deviant' cell is exploited in this study by introducing a pivot element to capture the deviations. The present study considers a two-step confirmatory procedure to detect outliers in an I × J contingency table. The procedure consists of (i) identifying a reliable set of candidate outliers using the deviations from the pivot element, and then (ii) detecting the outlying cells among them by examining different types of residuals of a suitably fitted model. The robustness of the procedure is investigated through a simulation study, along with applications to real datasets.
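Step (ii) examines different types of residuals from a fitted model. As a minimal sketch, assuming the fitted model is the independence model (the setting named in the introduction), the following Python computes three standard residuals for every cell: Pearson, Haberman's adjusted and deviance residuals. The function name `independence_residuals` is illustrative and not from the paper, and the choice of these three residual types is an assumption; the paper may examine others.

```python
import math


def independence_residuals(table):
    """Residuals of the independence model fitted to an I x J table.

    Assumes every row and column total is positive.
    Returns a dict mapping each residual type to an I x J list of lists.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    out = {"pearson": [], "adjusted": [], "deviance": []}
    for i, row in enumerate(table):
        p_row, a_row, d_row = [], [], []
        for j, obs in enumerate(row):
            # Expected count under independence: e_ij = n_i+ * n_+j / n
            e = row_tot[i] * col_tot[j] / n
            pearson = (obs - e) / math.sqrt(e)
            # Haberman's adjusted residual: (o - e) / sqrt(e (1 - n_i+/n)(1 - n_+j/n))
            adjusted = pearson / math.sqrt((1 - row_tot[i] / n) * (1 - col_tot[j] / n))
            # Deviance residual: signed square root of the cell's contribution to G^2
            g2 = 2 * ((obs * math.log(obs / e) if obs > 0 else 0.0) - (obs - e))
            deviance = math.copysign(math.sqrt(max(g2, 0.0)), obs - e)
            p_row.append(pearson)
            a_row.append(adjusted)
            d_row.append(deviance)
        out["pearson"].append(p_row)
        out["adjusted"].append(a_row)
        out["deviance"].append(d_row)
    return out
```

Summing the squared Pearson residuals recovers the Pearson X² statistic; for the table [[10, 20], [30, 40]] this is 50/63 ≈ 0.794, and in any 2 × 2 table each squared adjusted residual equals X².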

Highlights

In recent years, a great deal of attention has been paid to the accommodation and identification of unusual observations in the data
Diagnostics in I × J contingency tables have drawn a great deal of attention from statisticians for many years, but the notion of an outlier is not well defined
A two-phase approach is devised: a pivot element is identified to examine the deviations of the cell counts, and a confirmatory step uses model-based diagnostics to identify the outliers
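The two-phase idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the pivot element is not defined in this summary, so the sketch hypothetically takes it to be the median cell count and scales deviations by the median absolute deviation (MAD). The confirmatory model is, also hypothetically, a quasi-independence model fitted by iterative proportional fitting with the candidate cells excluded, and a candidate is confirmed when its Pearson residual exceeds a cutoff `z`. The names `candidate_cells`, `fit_quasi_independence` and `detect_outliers`, and the thresholds `k=3.0` and `z=2.0`, are all illustrative.

```python
import math
import statistics


def candidate_cells(table, k=3.0):
    """Phase (i). HYPOTHETICAL pivot: the median cell count, deviations scaled by MAD."""
    counts = [x for row in table for x in row]
    pivot = statistics.median(counts)
    mad = statistics.median([abs(x - pivot) for x in counts]) or 1.0  # guard against MAD = 0
    return {(i, j) for i, row in enumerate(table) for j, x in enumerate(row)
            if abs(x - pivot) / mad > k}


def fit_quasi_independence(table, excluded, iters=200):
    """Fit m_ij = a_i * b_j by iterative proportional fitting, matching the
    row and column totals of the non-excluded cells only."""
    I, J = len(table), len(table[0])
    a, b = [1.0] * I, [1.0] * J
    keep = [[(i, j) not in excluded for j in range(J)] for i in range(I)]
    for _ in range(iters):
        for i in range(I):  # rescale row parameters
            obs = sum(table[i][j] for j in range(J) if keep[i][j])
            fit = sum(a[i] * b[j] for j in range(J) if keep[i][j])
            if fit > 0:
                a[i] *= obs / fit
        for j in range(J):  # rescale column parameters
            obs = sum(table[i][j] for i in range(I) if keep[i][j])
            fit = sum(a[i] * b[j] for i in range(I) if keep[i][j])
            if fit > 0:
                b[j] *= obs / fit
    # Excluded cells receive the value the fitted model predicts for them
    return [[a[i] * b[j] for j in range(J)] for i in range(I)]


def detect_outliers(table, k=3.0, z=2.0):
    """Phase (ii): confirm a candidate if |Pearson residual| > z under the fitted model."""
    cands = candidate_cells(table, k)
    expected = fit_quasi_independence(table, cands)
    confirmed = []
    for i, j in sorted(cands):
        e = expected[i][j]
        r = (table[i][j] - e) / math.sqrt(e) if e > 0 else math.inf
        if abs(r) > z:
            confirmed.append((i, j))
    return cands, confirmed
```

On the exactly independent table [[10, 20, 30], [20, 40, 60], [30, 60, 90]], the pivot step flags the large cell 90 as a candidate, but the fitted model reproduces it, so it is not confirmed. This shows why a large deviation from the pivot is treated only as a candidate signal. Replacing the 10 in the corner with 150 yields that corner cell as the only candidate, and it is confirmed.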

Summary

Introduction

A great deal of attention has been paid to the accommodation and identification of unusual observations (outliers) in data. Until now, research on outliers in I × J contingency tables has been restricted mainly to the study of independence. Rapallo, and Rehage (2014) detected outliers through subsets of cell counts called minimal patterns for the independence model. This study presents an alternative approach to detecting outliers under the assumption of independence. The structure and nature of the cell counts play an important role in the analysis of a contingency table, since the counts can range from zero to very high frequencies (Sangeetha, Subbiah, Srinivasan, and Nandram 2014). As the sensitivity analysis of 2 × 2 tables by Subbiah and Srinivasan (2008) suggests, the location of polarized counts in the table poses an additional challenge for the detection of outliers. Model-based diagnostics are used to obtain the results, followed by boxplots to confirm the outlying cells.
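The boxplot confirmation mentioned above can be approximated numerically: a standard boxplot draws individually any point beyond Tukey's fences, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. The sketch below, assuming Pearson residuals under the independence model and inclusive quartiles (linear interpolation between order statistics), lists the cells whose residuals fall beyond those fences. The function names `pearson_residuals` and `boxplot_outlying_cells` and the default whisker multiplier of 1.5 are illustrative choices, not taken from the paper.

```python
import math
import statistics


def pearson_residuals(table):
    """Pearson residuals (o - e) / sqrt(e) under the independence model.

    Assumes every row and column total is positive.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[(o - r * c / n) / math.sqrt(r * c / n) for o, c in zip(row, col_tot)]
            for row, r in zip(table, row_tot)]


def boxplot_outlying_cells(table, whisker=1.5):
    """Return the (row, column) indices of cells whose residual lies beyond Tukey's fences."""
    res = pearson_residuals(table)
    flat = [x for row in res for x in row]
    q1, _, q3 = statistics.quantiles(flat, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - whisker * iqr, q3 + whisker * iqr
    return [(i, j) for i, row in enumerate(res) for j, x in enumerate(row)
            if x < lo or x > hi]


table = [[20] * 5 for _ in range(5)]
table[0][0] = 100  # one inflated cell in an otherwise uniform 5 x 5 table
print(boxplot_outlying_cells(table))  # → [(0, 0)]
```

Because the residuals are pooled into a single boxplot, an inflated cell also pushes the other residuals in its row and column in the opposite direction. In small tables this can mask the outlier, which is one reason to combine the boxplot with a model-based check rather than rely on it alone.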

Proposed method

Simulation study

Student’s enrolment data

Artificial data

Findings

Conclusions