Abstract

Validation of probabilistic models through goodness-of-fit tests is an essential step in the frequency analysis of extreme events. The outcome of standard testing techniques, however, is mainly determined by the behavior of the hypothetical model, FX(x), in the central part of the distribution, whereas the behavior in the tails, which is highly relevant in hydrological applications, has relatively little influence on the test results. The maximum-value test, originally proposed as a technique for outlier detection, is a suitable but seldom applied technique that addresses this problem. The test is specifically designed to verify whether the maximum (or minimum) values in the sample are consistent with the hypothesis that FX(x) is the true parent distribution. Its application is hindered by the fact that the critical values must be obtained numerically when the parameters of FX(x) are estimated from the same sample used for verification, which is the standard situation in hydrological applications. We propose a simple, analytically explicit technique to account for this effect, based on censored L-moments estimators of the parameters. Using artificially generated samples, we demonstrate the superiority of this modified maximum-value test over the standard version of the test. We also show that the test has comparable or greater power than other goodness-of-fit tests (e.g., chi-squared test, Anderson-Darling test, Fung and Paul test), in particular when dealing with small samples (sample size below 20–25) and when the parent distribution is similar to the distribution being tested.

Highlights

  • An outlying observation, or outlier, is a record that appears to deviate markedly from other members of the sample to which it belongs (Grubbs, 1969)

  • We show that the test has comparable or larger power with respect to other goodness-of-fit tests, in particular when dealing with small samples and when the parent distribution is similar to the distribution being tested

  • While application of outlier detection methods may be extremely important for screening the data and recognizing gross errors, unsupervised outlier rejection may result in a remarkable loss of information, in particular when the behavior of the tails of the distribution is fundamental to the performed statistical analyses

Summary

Introduction

An outlying observation, or outlier, is a record that appears to deviate markedly from other members of the sample to which it belongs (Grubbs, 1969). A common drawback of goodness-of-fit and model selection techniques is that their outcome is mainly determined by the behavior of the hypothetical model in the central part of the distribution, whereas the behavior in the tails of the distribution is relatively unimportant for the outcome of the test: standard goodness-of-fit tests seldom reveal an ill-fitting tail without a very large amount of data (Bryson, 1974). This problem can be overcome by using the maximum-value test, which was originally proposed by Grubbs (1969) as a technique for outlier detection in a Gaussian setting, and subsequently extended to Gumbel-distributed parents by Rossi et al. (1984). Usual applications of this test to non-Gaussian distributions are complicated by the fact that the parameters of the hypothetical distribution, FX(x), are unknown and need to be estimated from the same sample used for the test, which in turn implies that the acceptance region for the test needs to be calculated through numerical simulation (see e.g. Rossi et al., 1984).
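The effect described above can be illustrated with a minimal sketch. It relies on the standard result that the maximum of n independent draws from FX(x) has distribution FX(x)^n, so the maximum-value test with known parameters rejects when 1 - FX(x_max)^n falls below the significance level. Everything else is an assumption made for illustration only: the Gumbel parent with location 0 and scale 1, the sample size of 20, the 10% level, the function names, and in particular the use of ordinary method-of-moments estimators. The censored L-moments estimators proposed in the paper are not reproduced here.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def gumbel_cdf(x, loc, scale):
    """Gumbel (EV1) cumulative distribution function."""
    return math.exp(-math.exp(-(x - loc) / scale))


def gumbel_sample(n, loc, scale, rng):
    """Draw n Gumbel variates by inverting the CDF."""
    values = []
    while len(values) < n:
        u = rng.random()
        if u > 0.0:  # skip u == 0 to avoid log(0)
            values.append(loc - scale * math.log(-math.log(u)))
    return values


def fit_gumbel_moments(sample):
    """Method-of-moments Gumbel fit (illustrative choice, not the paper's)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    scale = math.sqrt(6.0 * var) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def max_value_pvalue(sample, loc, scale):
    """Probability that the maximum of n draws exceeds the observed maximum."""
    n = len(sample)
    return 1.0 - gumbel_cdf(max(sample), loc, scale) ** n


def rejection_rate(n, alpha, estimate, trials=4000, seed=1):
    """Share of Gumbel(0, 1) samples rejected by the upper-tail test."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        sample = gumbel_sample(n, 0.0, 1.0, rng)
        # Either use the true parameters or re-estimate them on the same sample.
        loc, scale = fit_gumbel_moments(sample) if estimate else (0.0, 1.0)
        if max_value_pvalue(sample, loc, scale) < alpha:
            rejected += 1
    return rejected / trials


print("known parameters:    ", rejection_rate(20, 0.10, estimate=False))
print("estimated parameters:", rejection_rate(20, 0.10, estimate=True))
```

In this sketch the null hypothesis is true in both runs, so a correctly calibrated test should reject close to 10% of the samples. That holds when the true parameters are used. When the parameters are re-estimated from the same sample, the largest observation pulls the fitted distribution toward itself and the rejection rate drops well below the nominal level, which is why the acceptance region cannot simply be taken from FX(x)^n in that case.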

Methods
Basic definitions
Modified maximum-value test
Assessment of the power of the test through synthetic data
Conclusions