Abstract
Outliers are typically identified using frequentist methods. Each data value is classified as an “outlier” or “not an outlier” based on a test statistic that measures how far the value lies from the bulk of the data. The threshold at which a value is declared an outlier is typically chosen by the user, and this subjective choice increases the uncertainty associated with the outlier status of each data value. A cellwise outlier detection algorithm named FuzzyHRT is used to automate the editing process in repeated surveys. This algorithm uses the Bienaymé–Chebyshev inequality and fuzzy logic to detect four types of outliers, arising from format inconsistencies and from historical, tail, and relational anomalies. However, fuzzy logic is not suited to probabilistic reasoning about the identification of anomalous cells, whereas Bayesian methods are well suited to quantifying the uncertainty associated with identifying outliers. Although the literature offers well-developed Bayesian methods for record-level outlier detection, Bayesian methods for identifying outliers within individual records (i.e., at the cell level) remain unexplored. This paper presents two approaches from the Bayesian perspective to study the uncertainty associated with identifying outliers. First, a Bayesian bootstrap approach is explored to study the uncertainty in the output scores of the FuzzyHRT algorithm. Second, empirical likelihoods in a Bayesian setting are considered for probabilistic reasoning about the identification of anomalous cells. NASS survey data on livestock and on major crop yields (such as corn) are used to compare the performance of the two proposed approaches with that of recent cellwise outlier methods.
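To make the Bayesian bootstrap idea concrete, the following is a minimal sketch and not the FuzzyHRT algorithm or the authors' implementation. It assumes a hypothetical tail score defined as one minus the Bienaymé–Chebyshev bound 1/k², where k is the distance of a candidate value from the mean in standard deviations. The function names, the score definition, and the data values are illustrative assumptions. Each replicate draws Dirichlet(1, ..., 1) weights over the observed data, recomputes the weighted mean and standard deviation, and rescores the candidate value, which gives a set of posterior draws of the score rather than a single number.

```python
import numpy as np


def chebyshev_tail_score(value, mean, sd):
    """Hypothetical score in [0, 1): one minus the Chebyshev bound 1/k^2.

    The bound is only informative for k > 1, so the score is 0 otherwise.
    """
    k = abs(value - mean) / sd
    if k <= 1.0:
        return 0.0
    return 1.0 - 1.0 / k**2


def bayesian_bootstrap_scores(data, value, n_draws=2000, seed=0):
    """Posterior draws of the tail score for `value` via the Bayesian bootstrap.

    Each draw reweights the observed data with Dirichlet(1, ..., 1) weights
    instead of resampling it with replacement.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    # One row of weights per replicate; every row sums to 1.
    weights = rng.dirichlet(np.ones(data.size), size=n_draws)

    # Weighted mean and standard deviation under each replicate.
    means = weights @ data
    variances = weights @ data**2 - means**2
    sds = np.sqrt(np.maximum(variances, 1e-12))  # guard against round-off

    return np.array(
        [chebyshev_tail_score(value, m, s) for m, s in zip(means, sds)]
    )


if __name__ == "__main__":
    # Made-up reference values for one survey item (illustration only).
    reference = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11]
    draws = bayesian_bootstrap_scores(reference, value=30.0)
    lower, upper = np.quantile(draws, [0.025, 0.975])
    print(f"posterior mean score: {draws.mean():.4f}")
    print(f"95% credible interval: [{lower:.4f}, {upper:.4f}]")
```

The spread of the draws is what carries the uncertainty: a cell whose interval sits near 1 is flagged consistently across reweightings, while a wide interval indicates that its outlier status depends heavily on which observations dominate.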