Abstract

The standard boxplot is one of the most popular nonparametric tools for detecting outliers in univariate datasets. For Gaussian or symmetric distributions, the chance of data falling outside the standard boxplot fence is only 0.7%. For skewed data, however, such as telemetric rain observations in a real-time flood forecasting system, this probability is significantly higher. To overcome this problem, the medcouple (MC), a measure that is robust to outliers and sensitive to skewness, was introduced to construct a new robust skewed boxplot fence. Three types of boxplot fences related to MC were analyzed and compared, and the exponential function boxplot fence was selected. When applied to uncontaminated as well as simulated contaminated data, the proposed method produced a lower swamping rate and higher accuracy than the standard boxplot and the semi-interquartile range (SIQR) boxplot. The outcomes of this study demonstrate that it is reasonable to use the new robust skewed boxplot method to detect outliers in skewed rain distributions.

Highlights

  • More and more real-time flood forecasting systems in China, particularly in large basins with many remote gauge stations, use telemetry to transmit signals from rainfall stations, because telemetry systems provide timely, dense, and labor-saving hydrological information from remote stations [1]

  • Telemetric rainfall information has been shown to contain inevitable outliers caused by instrument malfunction, human-related errors, and/or signal acquisition errors resulting from signal leaks, collisions, or disturbances during signal transmission, in addition to random errors that are normally distributed with zero mean and a small variance [1, 2, 3]. The outliers have an unknown distribution with a much greater variance, appear inconsistent with the remainder of the dataset, and are relatively large in magnitude [4, 5]. Therefore, outliers should be treated differently [6]; in this paper, observations containing outliers are called abnormal data

  • An observation is considered “potential” abnormal data when its value does not belong to the interval (q1 − 1.5 × IQR, q3 + 1.5 × IQR), where q1 and q3 are the first and third quartiles, respectively, and IQR is the interquartile range, i.e., IQR = q3 − q1. The standard boxplot is suited to normal or symmetric distributions in particular
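The standard fence rule in the last highlight can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function names, the "inclusive" quartile convention, and the sample rainfall values are all assumptions made for the example. The last helper checks the roughly 0.7% figure quoted for Gaussian data.

```python
import statistics
from statistics import NormalDist


def standard_fence(values, k=1.5):
    """Return (lower, upper) = (q1 - k*IQR, q3 + k*IQR)."""
    # Assumption: quartiles follow the "inclusive" convention; other
    # conventions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def potential_abnormal(values, k=1.5):
    """Return the values that do not belong to the open fence interval."""
    lower, upper = standard_fence(values, k)
    return [v for v in values if v <= lower or v >= upper]


def gaussian_outside_probability(k=1.5):
    """Probability that a Gaussian observation falls outside the fence."""
    z = NormalDist()                 # standard normal; the result is scale-free
    q3 = z.inv_cdf(0.75)             # third quartile; q1 = -q3 by symmetry
    upper = q3 + k * (2 * q3)        # IQR = q3 - q1 = 2 * q3
    return 2 * (1 - z.cdf(upper))    # both tails


# Hypothetical hourly rainfall readings (mm), not taken from the paper.
rain = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 40.0]
print(standard_fence(rain))
print(potential_abnormal(rain))
print(round(100 * gaussian_outside_probability(), 2), "% outside for Gaussian data")
```

Because the fence is symmetric around the quartiles, a right-skewed rainfall sample tends to have many legitimate large values pushed past the upper fence, which is the swamping problem the abstract describes.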


Summary

Introduction

More and more real-time flood forecasting systems in China, particularly in large basins with many remote gauge stations, use telemetry to transmit signals from rainfall stations, because telemetry systems provide timely, dense, and labor-saving hydrological information from remote stations [1]. The SIQR boxplot has been applied to the real hourly rain observations from the Wuyigong rain gauge. Compared with the standard method, the SIQR boxplot adjusts itself to the right skewness (Figure 2). The functions in these methods depend on the sample size, and the procedures require some characteristics of the uncontaminated distribution, which are often difficult to estimate for real-time hourly rainfall datasets. The new method is independent of the sample size and performs well with the rainfall distribution. It can reduce swamping and rapidly detect abnormal telemetric rainfall data before they are entered into the real-time flood forecasting model.

Figure 1: The standard boxplot of hourly rainfall datasets from Wuyigong Station.

The fences of the two boxplot methods for Wuyigong.
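To make the difference between the fence types concrete, the sketch below computes an SIQR fence and a medcouple-based exponential fence for a small sample. Several things here are assumptions rather than the paper's specification: the SIQR fence is taken in its usual form (q1 − 3(q2 − q1), q3 + 3(q3 − q2)); the exponential weights e^(−4·MC) and e^(3·MC) are those of the Hubert–Vandervieren adjusted boxplot and may differ from the constants the paper selects; the medcouple is a naive O(n²) version with a sign rule for pairs tied at the median; and all function names and data are hypothetical.

```python
import math
import statistics


def quartiles(values):
    # Assumption: "inclusive" quartile convention.
    return statistics.quantiles(values, n=4, method="inclusive")


def siqr_fence(values):
    """SIQR fence: each side is scaled by its own semi-interquartile range."""
    q1, q2, q3 = quartiles(values)
    return q1 - 3 * (q2 - q1), q3 + 3 * (q3 - q2)


def medcouple(values):
    """Naive O(n^2) medcouple; fine for small samples only."""
    xs = sorted(values)
    m = statistics.median(xs)
    lower = [x for x in xs if x <= m]    # values tied with m sit at the end
    upper = [x for x in xs if x >= m]    # values tied with m sit at the start
    k = sum(1 for x in xs if x == m)     # number of values tied with the median
    offset = len(lower) - k
    h = []
    for i, xi in enumerate(upper):
        for j, xj in enumerate(lower):
            if xi == xj:                 # only possible when both equal m
                s = i + (j - offset) + 1 - k
                h.append((s > 0) - (s < 0))   # sign rule: -1, 0, or +1
            else:
                h.append(((xi - m) - (m - xj)) / (xi - xj))
    return statistics.median(h)


def adjusted_fence(values):
    """Exponential fence with Hubert-Vandervieren weights (assumed constants)."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    mc = medcouple(values)
    if mc >= 0:                          # right-skewed or symmetric
        w_low, w_high = math.exp(-4 * mc), math.exp(3 * mc)
    else:                                # left-skewed
        w_low, w_high = math.exp(-3 * mc), math.exp(4 * mc)
    return q1 - 1.5 * w_low * iqr, q3 + 1.5 * w_high * iqr


# Hypothetical right-skewed sample, not taken from the paper.
sample = [1, 2, 3, 10, 20]
print(medcouple(sample))
print(siqr_fence(sample))
print(adjusted_fence(sample))
```

For a symmetric sample the medcouple is zero and the exponential fence reduces to the standard 1.5 × IQR fence, so the adjustment only takes effect when the data are skewed. Neither fence in this sketch depends on the sample size.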

