Abstract

Detecting outliers in a contingency table is an interesting statistical problem, and the polarization of cell counts poses additional difficulties. This study exploits the fundamental definition of an outlier as a 'markedly deviant' cell by introducing a pivot element to capture the deviations. It considers a two-step confirmatory procedure to detect outliers in an I × J contingency table: (i) identify a reliable set of candidate outliers using their deviations from the pivot element, and then (ii) confirm the outlying cells among those candidates by examining different types of residuals of a suitable fitted model. The robustness of the procedure is investigated through a simulation study, along with applications to real datasets.

Highlights

  • In recent years, a great deal of attention has been paid to the accommodation and identification of unusual observations in the data

  • Diagnostics in I × J contingency tables have drawn a great deal of attention from statisticians for many years, but the notion of an outlier is not well defined

  • A two-phase procedure is devised: a pivot element is identified and the deviations of cells from it are examined, and a confirmatory approach then identifies the outliers through model-based diagnostics


Summary

Introduction

A great deal of attention has been paid to the accommodation and identification of unusual observations (outliers) in data. Until now, research on outliers in I × J contingency tables has been restricted mainly to the independence model. Rapallo, and Rehage (2014) detected outliers through subsets of cell counts, called minimal patterns, for the independence model. This study presents an alternative approach to detecting outliers under the assumption of the independence model. The structure and nature of the cell counts play an important role in the analysis of a contingency table, since counts can range from zero to very high frequencies (Sangeetha, Subbiah, Srinivasan, and Nandram (2014)). Following Subbiah and Srinivasan (2008) on the sensitivity analysis of 2 × 2 tables, the location of polarized counts in the table poses an additional challenge in the detection of outliers. Model-based diagnostics are used to obtain the results, followed by boxplots to confirm the outlying cells.
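The model-based diagnostic step can be illustrated with a minimal sketch, assuming the fitted model is the independence model and the residuals examined are Pearson and adjusted (standardized) residuals. The function name `independence_residuals`, the 3 × 3 table, and the cutoff of 3 are hypothetical choices made for illustration; they are not taken from the study, and the pivot-element step is not reproduced here because this excerpt does not define it.

```python
import numpy as np


def independence_residuals(table):
    """Expected counts, Pearson residuals and adjusted residuals
    for an I x J table under the independence model.

    Assumes every row and column total is positive.
    """
    n = np.asarray(table, dtype=float)
    total = n.sum()
    row = n.sum(axis=1, keepdims=True)   # shape (I, 1)
    col = n.sum(axis=0, keepdims=True)   # shape (1, J)

    # Expected count under independence: (row total * column total) / N
    expected = row @ col / total

    # Pearson residual: (observed - expected) / sqrt(expected)
    pearson = (n - expected) / np.sqrt(expected)

    # Adjusted residual: Pearson residual scaled by its estimated
    # standard deviation under independence
    adjusted = pearson / np.sqrt((1 - row / total) * (1 - col / total))
    return expected, pearson, adjusted


# Hypothetical table with one inflated cell in the last row and column
table = np.array([
    [20, 22, 18],
    [19, 21, 20],
    [21, 20, 90],
])

expected, pearson, adjusted = independence_residuals(table)

# Illustrative cutoff (an assumption, not the study's rule)
flagged = np.argwhere(np.abs(adjusted) > 3.0)
most_extreme = np.unravel_index(np.argmax(np.abs(adjusted)), adjusted.shape)

print(np.round(adjusted, 2))
print("cells beyond cutoff:", flagged.tolist())
print("most extreme cell:", tuple(int(i) for i in most_extreme))
```

In this made-up table the inflated cell has the largest adjusted residual, but it also pulls the residuals of other cells in its row and column past the cutoff. That ambiguity is one reason a residual threshold alone may not settle which cells are outliers, and why a preliminary candidate set followed by a confirmatory check can be useful.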

Proposed method
Simulation study
Student’s enrolment data
Artificial data
Findings
Conclusions
