ABSTRACT Outlier detection is an important task in data-driven geotechnics. Many existing outlier detection methods (e.g. Bayesian learning, neural networks) can be computationally demanding for conventional geotechnical practitioners. To address this, this study proposes a simple, fast, and explainable method for detecting outliers in geotechnical databases. The principle of the method is that a data point is labelled as a potential outlier only if the probability of observing a value at least as extreme is low. To account for outliers in both the left and right tails, the skewness of the dataset is incorporated, and an indicator referred to as the outlier score (> 0) is assigned to each data point. The method also provides a second indicator, referred to as the dimensional outlier score, which identifies the dimensions (soil properties) that contribute to the outlierness; hence, the method is explainable. Because the method does not require any time-consuming learning or sampling procedures, it is fast and practitioner-friendly. Multiple numerical examples are first used to demonstrate the method's capability. Four publicly available geotechnical databases are then used to demonstrate the outlier detection task. The results suggest that the outliers identified by the proposed method can be meaningful from a geotechnical point of view.
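The principle described above can be illustrated with a minimal sketch. The abstract does not state how the tail probability is estimated or how the scores are combined, so everything below is an assumption made for illustration: tail probabilities come from empirical cumulative distributions, the sign of the sample skewness selects which tail is scored in each dimension, the dimensional score is the negative log of that tail probability, and the overall score is the sum over dimensions. The function names (`tail_probabilities`, `sample_skewness`, `outlier_scores`) and the data are hypothetical and are not taken from the study.

```python
import math


def tail_probabilities(column):
    """Empirical left- and right-tail probabilities of each value in a column."""
    n = len(column)
    left = [sum(1 for v in column if v <= x) / n for x in column]
    right = [sum(1 for v in column if v >= x) / n for x in column]
    return left, right


def sample_skewness(column):
    """Moment-based sample skewness; returns 0.0 for a constant column."""
    n = len(column)
    mean = sum(column) / n
    m2 = sum((v - mean) ** 2 for v in column) / n
    m3 = sum((v - mean) ** 3 for v in column) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def outlier_scores(rows):
    """Return (total_scores, dimensional_scores) for a list of data rows.

    Assumption: a right-skewed dimension is scored on its right tail and a
    left-skewed dimension on its left tail. Scores are non-negative; a low
    tail probability gives a high score.
    """
    n_dims = len(rows[0])
    dim_scores = [[0.0] * n_dims for _ in rows]
    for j in range(n_dims):
        column = [row[j] for row in rows]
        left, right = tail_probabilities(column)
        use_right = sample_skewness(column) >= 0
        for i in range(len(rows)):
            p = right[i] if use_right else left[i]
            dim_scores[i][j] = math.log(1.0 / p)  # p >= 1/n, so this is finite
    totals = [sum(scores) for scores in dim_scores]
    return totals, dim_scores


# Made-up two-property records; the last row has an unusually large second value.
rows = [
    [1.8, 9.0],
    [1.0, 10.0],
    [1.1, 10.5],
    [1.2, 11.0],
    [1.3, 9.5],
    [1.25, 50.0],
]
totals, dim_scores = outlier_scores(rows)
suspect = totals.index(max(totals))
print("highest total score: row", suspect)
print("dimensional scores for that row:", dim_scores[suspect])
```

In this made-up dataset the last row receives the highest total score, and its dimensional scores show that the second property drives it, which is the kind of per-property explanation the abstract describes. Because this sketch is rank-based, it reflects how rare a value is rather than how far it lies from the rest; the method in the study may differ on this point.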