Understanding the concept of outlier and its relevance to the assessment of data quality: Probabilistic background theory

Davaadorjin Monhor, Shuzo Takemoto

doi:10.1186/bf03351881

Abstract

In recent years, increasing interest in the study of outliers can be observed; however, there is as yet no general definition of an outlier. In the present paper we introduced a generic descriptive definition of an outlier. We observed that outlier problems had so far been treated statistically, without proper attention to their probabilistic-theoretic background. To fill this gap, we attempted to establish a probabilistic background theory. Within this framework, large deviations are considered as the probabilistic-theoretic model of outliers, and the interrelationship among the laws of large numbers, the central limit theorems and large deviations is clarified. These considerations are specialized to the case of a statistical sample, which is important from the point of view of assessing data quality. Some methodological and historical aspects of geodesy, geophysics and astronomy are also mentioned. We showed that the data analysis carried out by Kepler in the course of discovering his famous elliptic law of planetary motion is relevant to the outlier problem; this methodologically interesting fact is a new result in the history of the geosciences. We established that the accuracy of the Chebyshev inequality increases as the deviation of the random variable from its expectation increases. The possibility of applying the Chebyshev inequality to the outlier problem is pointed out.
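The abstract points to the Chebyshev inequality, P(|X − μ| ≥ kσ) ≤ 1/k², as a tool for the outlier problem. The Python sketch below shows one way such a screening rule could look; it is an illustration, not the authors' procedure. It assumes that the sample mean and sample standard deviation may stand in for the unknown expectation μ and standard deviation σ, and the names `chebyshev_bound`, `flag_chebyshev_outliers` and the level `alpha` are hypothetical.

```python
import math
import statistics


def chebyshev_bound(k: float) -> float:
    """Chebyshev upper bound for P(|X - mu| >= k * sigma), valid for any
    distribution with finite variance; informative only when k > 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / (k * k))


def flag_chebyshev_outliers(data, alpha=0.05):
    """Return the observations lying at least k sample standard deviations
    from the sample mean, with k chosen so that the Chebyshev bound 1/k^2
    equals alpha (hypothetical screening rule, illustration only)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    mean = statistics.fmean(data)   # stands in for the expectation mu
    sd = statistics.stdev(data)     # stands in for sigma (needs >= 2 points)
    k = 1.0 / math.sqrt(alpha)      # solves 1/k^2 = alpha
    return [x for x in data if abs(x - mean) >= k * sd]


if __name__ == "__main__":
    # Forty-nine readings clustered near 10 plus one gross error.
    readings = [10 + 0.1 * ((i % 5) - 2) for i in range(49)] + [30.0]
    print(flag_chebyshev_outliers(readings))
    print(chebyshev_bound(3.0))
```

Because a gross outlier also inflates the sample standard deviation, this rule can miss outliers in small samples: with n observations no point can lie more than (n − 1)/√n sample standard deviations from the sample mean, so with alpha = 0.05 (k ≈ 4.47) nothing can be flagged unless n is at least 22.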

Highlights

Large deviations are considered as the probabilistic-theoretic model of outliers.
To determine the mean ellipticity of the Earth from measurement data, Maire and Boscovich (1755) and Boscovich (1757) removed two data points on the grounds that they deviated too much from the remaining data. This means that Maire and Boscovich used an outlier rejection technique.
Instead, carefully compiled mortality tables, which are nothing other than numerical realizations of a probability distribution, are used. These three examples share a common feature: the quantities in question are ab initio “natural random variables”, and “fluctuations of considerable size”, i.e., outliers, appear to be an intrinsic feature of the randomness.
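The first highlight treats large deviations as the model of outliers, and the abstract relates them to the law of large numbers and the central limit theorem. For orientation, the three regimes can be written for the sample mean X̄ₙ = Sₙ/n of i.i.d. random variables with mean μ and variance σ². These are standard textbook statements, with Cramér's theorem chosen as the large-deviation example under the usual assumption that the moment generating function M(t) is finite near zero; they are not formulas reproduced from the paper.

```latex
\begin{aligned}
&\text{Chebyshev / weak law of large numbers:} && P\left(|\bar X_n - \mu| \ge \varepsilon\right) \le \frac{\sigma^2}{n\varepsilon^2} \xrightarrow[n\to\infty]{} 0,\\
&\text{central limit theorem:} && P\left(\frac{\sqrt{n}\,(\bar X_n - \mu)}{\sigma} \le x\right) \xrightarrow[n\to\infty]{} \Phi(x),\\
&\text{large deviations (Cram\'er):} && \lim_{n\to\infty} \frac{1}{n}\log P\left(\bar X_n \ge \mu + a\right) = -I(\mu + a), \quad a > 0,\\
& && I(y) = \sup_{t\in\mathbb{R}}\bigl(t y - \log M(t)\bigr), \quad M(t) = E\,e^{tX_1}.
\end{aligned}
```

In words: deviations of order 1/√n are governed by the central limit theorem, while the probability of a deviation of fixed size a decays exponentially in n; in the standard terminology, the latter is the large-deviation regime.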

Summary

New Findings and Lessons

The concept of outlier stemmed from the mathematical processing of geodetic and astronomical measurement data. Instead, carefully compiled mortality tables, which are nothing other than numerical realizations of a probability distribution, are used. These three examples share a common feature: the quantities in question are ab initio “natural random variables”, and “fluctuations of considerable size”, i.e., outliers, appear to be an intrinsic feature of the randomness. Through these illustrative examples we clearly see that these outliers are neither “the gross error” nor “the error arising from model fitting”.

The Asymptotic Growth Rate of Functions and Related Asymptotic Notations

The Central Limit Theorem as Quantitative

The Large Deviations as Probabilistic-Theoretic Models for Outliers
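A small numerical sketch can make the difference between these regimes concrete. Assuming fair coin tosses as the random variable (our choice, not an example from the paper), the Python code below compares the exact tail probability of the sample mean with the Chebyshev bound, the Chernoff bound built from Cramér's rate function, and the normal (CLT) approximation. All function names are hypothetical, and results are returned as natural logarithms because the probabilities are far below floating-point range for exact representation as plain ratios.

```python
import math


def log_exact_tail(n: int, k_min: int) -> float:
    """log P(S_n >= k_min) for S_n ~ Binomial(n, 1/2), computed exactly:
    the numerator is summed as a Python integer and math.log handles big ints."""
    count = sum(math.comb(n, k) for k in range(k_min, n + 1))
    return math.log(count) - n * math.log(2)


def cramer_rate(y: float) -> float:
    """Cramér rate function I(y) for a fair coin (Bernoulli(1/2)), 0 < y < 1."""
    return y * math.log(2 * y) + (1 - y) * math.log(2 * (1 - y))


def compare_tails(n: int, k_min: int) -> dict:
    """Natural logs of the exact tail P(S_n/n >= y), y = k_min/n > 1/2,
    and of three bounds/approximations for it."""
    if not n / 2 < k_min < n:
        raise ValueError("need n/2 < k_min < n")
    y = k_min / n
    eps = y - 0.5
    return {
        "exact": log_exact_tail(n, k_min),
        # Chebyshev: P(|S_n/n - 1/2| >= eps) <= Var(X1) / (n eps^2), Var(X1) = 1/4
        "chebyshev": math.log(min(1.0, 0.25 / (n * eps * eps))),
        # Chernoff bound, whose exponent is the Cramér rate: P <= exp(-n I(y))
        "chernoff": -n * cramer_rate(y),
        # CLT (normal) approximation: P ~ 1 - Phi(eps * sqrt(n) / sigma), sigma = 1/2
        "normal": math.log(0.5 * math.erfc(eps * math.sqrt(n) / 0.5 / math.sqrt(2))),
    }


if __name__ == "__main__":
    for n, k_min in [(1000, 800), (4000, 3200)]:
        logs = compare_tails(n, k_min)
        # -log P / n should approach I(0.8) as n grows (Cramér's theorem)
        print(n, {name: round(-v / n, 4) for name, v in logs.items()},
              round(cramer_rate(0.8), 4))
```

For a deviation of 0.3 from the mean, the normal approximation overstates the exact tail by several orders of magnitude at n = 1000, the Chebyshev bound is far too loose, and −log P / n moves toward the rate I(0.8) ≈ 0.193 as n grows, which is the exponential decay that large-deviation theory describes.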


Journal: Earth, Planets and Space	Publication Date: Nov 1, 2005
Citations: 14	License type: CC BY 4.0
