
Modelling Shanghai soil properties with finite mixtures of $$S_\text {U}$$ Johnson distributions

Abstract

The presence of asymmetry in geotechnical data necessitates the use of advanced techniques to handle skewness and kurtosis. A considerable body of statistical literature has been developed over the years for such scenarios. Its tools and frameworks, which range from transformations to heavy-tailed distributions, have been adapted to model a variety of geotechnical phenomena. Soil data is inherently heterogeneous as well as asymmetric, which poses challenges from a modelling perspective. Within an unsupervised learning paradigm, mixture model-based approaches have shown great efficacy for modelling such scenarios. In particular, the use of transformations within a model-based framework has proven to be effective in dealing with skewed data. Despite the popularity of transformation techniques, there is a general paucity of literature on the $$S_\text {U}$$ Johnson distribution. An alternative to the popular power transformation, the $$S_\text {U}$$ Johnson distribution has been shown within geotechnical applications to have superior performance overall. In this work, we develop a mixture model-based approach for modelling incomplete and asymmetric soil data using finite mixtures of multivariate $$S_\text {U}$$ distributions. We also develop an imputation method to handle missing data. Using Shanghai soil data, our method proves highly robust in the presence of heterogeneity and asymmetry.
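As a rough illustration of the building block this abstract refers to, the sketch below evaluates the density of a univariate S_U Johnson variable and of a two-component mixture of such variables. It relies on the standard definition, in which z = gamma + delta * asinh((x - xi) / lam) is standard normal. The function names, the mixture weights, and the parameter values are illustrative assumptions; the paper's multivariate formulation, its estimation procedure, and its imputation method are not reproduced here.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def su_pdf(x, gamma, delta, xi, lam):
    """Density of a univariate Johnson S_U variable (delta > 0, lam > 0)."""
    u = (x - xi) / lam
    z = gamma + delta * math.asinh(u)  # normalising transformation
    jacobian = delta / (lam * math.sqrt(1.0 + u * u))  # dz/dx
    return jacobian * STD_NORMAL.pdf(z)


def su_mixture_pdf(x, weights, components):
    """Density of a finite mixture; weights are non-negative and sum to 1."""
    return sum(w * su_pdf(x, *params) for w, params in zip(weights, components))


# Hypothetical two-component example (values chosen only for illustration).
WEIGHTS = [0.6, 0.4]
COMPONENTS = [(-1.0, 1.5, 0.0, 1.0), (0.5, 2.0, 5.0, 2.0)]  # (gamma, delta, xi, lam)
```

Each component can be skewed in a different direction through the sign of its gamma, which is what lets a mixture of this kind describe data that is both heterogeneous and asymmetric.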

Similar Papers
  • Research Article
  • 10.31481/uhmj.16.2015.08
Universal families of Johnson distributions and their use for analysis of time series of surface wind speed
  • Oct 29, 2017
  • Ukrainian hydrometeorological journal
  • G.P Ivus + 3 more

Introduction. In recent decades, with the rapid development of numerical weather forecasting methods, insufficient attention has been given to physical and statistical regularities. Nevertheless, climate change and its implications for various sectors of the economy require information about the probability characteristics of meteorological variables and phenomena, including wind anomalies. This article considers the application of Johnson's distributions to fit time series of surface wind speed at the Odessa-port meteorological station in the central months of the seasons. A number of regularities were found that take into account not only the seasonal and diurnal variation of the parameters of this distribution, but also the influence of the physical and geographical conditions at the station's location on the formation of the surface wind regime.
The purpose of the publication is to substantiate the application of Johnson's law to approximate series of surface wind speed at the Odessa-port meteorological station.
Methods and results. The family of Johnson's distributions is increasingly applied to describe experimental data in various analytical models of the distribution law. Its advantage over the Pearson distribution is that, after some transformations, it leads to a normally distributed random variable. Approximation methods based on universal families of distributions provide flexibility in solving the problem of fitting distributions. The most common approaches to constructing universal families are based on the method of moments and on replacing the original sample with another whose distribution is standard. Wind statistics are represented by the following parameters: mean wind speed, standard deviation, coefficients of asymmetry and excess, coefficient of variation, and their errors. Time series of surface wind speed at the Odessa-port station for the period 1981-1990 were fitted using Johnson's distribution, with ε ranging from -0.51 to -8.00. The parameter λ, which determines the scale of change of the random variable, ranges seasonally from 63.56 in January (18 UTC) to 15.77 in October (18 UTC). Estimating the shape parameters η and γ of the wind speed curves reveals some features of the surface wind regime at the Odessa-port station during the year. The smaller γ is, the smaller the slope of the curves. The values of η and γ vary within 0.82-3.54 and 0.24-4.81, respectively. In all cases λ > 1, indicating that the family of curves belongs to SL. The values of Q, which vary from 0.01 to 0.07, confirm that the wind speed series at the Odessa-port station can be fitted with the SL family of Johnson's distributions.
Conclusion. For unimodal distributions of wind speed time series at the Odessa-port meteorological station, the universal SL family of Johnson's distributions can be used in almost all cases. The parameters of this distribution reveal regularities that take into account the influence of the physical and geographical conditions at the station's location on the formation of the surface wind regime.

  • Research Article
  • Cite Count: 13
  • 10.1002/qre.1116
The joint economic‐statistical design of X and R charts for nonnormal data
  • Mar 24, 2011
  • Quality and Reliability Engineering International
  • Huifen Chen + 1 more

We consider the joint economic‐statistical design of X and R control charts under the assumption that the quality measurement and the in‐control time have Johnson and Weibull distributions. The Johnson distribution is general in that it can be made to fit all possible values of skewness and kurtosis. The four parameters—the sample size n, time h between successive samples, and the control factors k1 and k2 for the X and R charts—are determined so that the mean hourly loss‐cost is minimized under constraints on the Type I and II error probabilities. We have generalized the Costa model to accommodate the Johnson and Weibull distributions. Sensitivity to nonnormality, shift, and Weibull scale parameter is considered in our analysis. Our sensitivity analysis shows that the optimal design parameters are sensitive to nonnormality. Comparisons of the fully economic and economic‐statistical designs are given. Copyright © 2010 John Wiley & Sons, Ltd.

  • Research Article
  • Cite Count: 6
  • 10.36001/ijphm.2015.v6i4.2291
Automatic Threshold Setting and Its Uncertainty Quantification in Wind Turbine Condition Monitoring System
  • Nov 3, 2020
  • International Journal of Prognostics and Health Management
  • Kun S Marhadi + 1 more

Setting optimal alarm thresholds in a vibration-based condition monitoring system is inherently difficult. There are no established thresholds for many vibration-based measurements. Most of the time, thresholds are set based on statistics of the available collected data. Often the underlying probability distribution that describes the data is not known. Choosing an incorrect distribution to describe the data and then setting thresholds based on it could result in sub-optimal thresholds. Moreover, in wind turbine applications the available data may not represent the full range of a turbine's operating conditions, which results in uncertainty in the parameters of the fitted probability distribution and in the calculated thresholds. This study investigates which of the Johnson, Normal, and Weibull distributions best fits vibration data collected over a period of time. The false alarm rate that results from using the threshold determined from each distribution is used as the measure of which distribution is most appropriate. The study shows that using the Johnson distribution can eliminate the need to test or fit various distributions to the data and offers a more direct approach to obtaining optimal thresholds. To quantify the uncertainty in the thresholds due to limited data, implementations with the bootstrap method and Bayesian inference are investigated.

  • Book Chapter
  • 10.1108/s1571-0386(2012)0000022018
Estimation of Non-Gaussian Returns: The Hedge Funds Case
  • Nov 19, 2012
  • Naceur Naguez + 1 more

Purpose – The purpose of this chapter is to estimate non-Gaussian distributions by means of Johnson distributions. An empirical illustration on hedge fund returns is detailed. Methodology/approach – To fit non-Gaussian distributions, the chapter introduces the family of Johnson distributions and its general extensions. We use both parametric and non-parametric approaches. In a first step, we analyze the serial correlation of our sample of hedge fund returns and unsmooth the series to correct the correlations. Then, we estimate the distribution by the standard Johnson system of laws. Finally, we search for a more general distribution of Johnson type, using a non-parametric approach. Findings – We use data from the indexes Credit Suisse/Tremont Hedge Fund (CSFB/Tremont) provided by Credit Suisse. For the parametric approach, we find that the SU Johnson distribution is the most appropriate, except for the Managed Futures. For the non-parametric approach, we determine the best polynomial approximation of the function characterizing the transformation from the initial Gaussian law to the generalized Johnson distribution. Originality/value of chapter – These findings are novel since we use an extension of the Johnson distributions to better fit non-Gaussian distributions, in particular in the case of hedge fund returns. We illustrate the power of this methodology that can be further developed in the multidimensional case.

  • Conference Article
  • Cite Count: 22
  • 10.1061/9780784412763.027
Multivariate Model for Soil Parameters Based on Johnson Distributions
  • Mar 4, 2013
  • Kok-Kwang Phoon + 1 more

The objective of this paper is to demonstrate the practical construction of a multivariate probability distribution function using an actual soil database containing su(CIUC), OCR, and four piezocone parameters. Five hundred and thirty-five multivariate data points were compiled from 40 clay sites around the world (Brazil, Canada, Hong Kong, Italy, Malaysia, Norway, Singapore, Sweden, UK, USA, and Venezuela). It was found that a multivariate probability distribution can be constructed by transforming each component of a multivariate normal distribution to a Johnson distribution. Existing bivariate regression equations focus on strong correlations. Weak correlations are typically discarded. Site investigation is a costly exercise and ideally, one should exploit all measured geotechnical data for design. The multivariate distribution is a concise model to summarize all available information. Conditional distributions can be easily derived to update the marginal distribution of any one parameter or the multivariate distribution of any group of parameters given information from other parameters. One of the objectives of site investigation is to perform cost-effective field tests and to evaluate design parameters based on these field measurements. Clearly, conditioning involving updating one or more design parameters using one or more field measurements is a natural probabilistic generalization of the current soil property evaluation methodology.
INTRODUCTION
When multivariate geotechnical data exist in sufficient amount, it is of significant practical usefulness to construct a multivariate probability distribution function. The applications include: (a) deriving the mean and coefficient of variation (COV) of any parameter given the information contained in a subset with possibly more than one parameter, and (b) evaluating if new strong pairwise correlations can be found either among the original components or some derived components.
For the former, it is likely for the COV of a design parameter, say the undrained shear strength (su), to decrease when other parameters, say normalized cone tip resistance [(qt − σv)/σ′v] and overconsolidation ratio (OCR), have been measured. This aspect is significant for reliability-based design. In fact, COV reduction can be viewed as a measure of the value of information and may eventually provide a sensible method for deciding if it is worthwhile to measure an additional parameter. For the latter, the ability to predict the existence of new correlations not included as part of model calibration provides a stronger scientific underpinning to correlation studies in geotechnical engineering. The reason is that these predictions can be falsified by taking new observations, which is the cornerstone of the scientific method. In other words, it is a lot harder to develop multivariate models, but if they do stand the test of time, they are usually more robust than bivariate models. The objective of this paper is to demonstrate the practical construction of a multivariate probability distribution function using an actual soil database containing su(CIUC), OCR, and
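The construction and the conditioning step described in this entry can be sketched for two parameters as follows. This is a minimal illustration and not the paper's model: it assumes every component belongs to the S_U family, and the parameter tuples `P1` and `P2`, the correlation `RHO`, and all function names are hypothetical. Because the S_U transformation is monotone, a quantile of the conditional normal score maps directly to a quantile on the data scale.

```python
import math
import random
from statistics import NormalDist


def to_normal(x, gamma, delta, xi, lam):
    """Johnson S_U transformation from the data scale to a standard normal score."""
    return gamma + delta * math.asinh((x - xi) / lam)


def from_normal(z, gamma, delta, xi, lam):
    """Inverse transformation from a standard normal score to the data scale."""
    return xi + lam * math.sinh((z - gamma) / delta)


def sample_pair(rho, params_1, params_2, rng):
    """Draw one (x1, x2) pair whose normal scores have correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return from_normal(z1, *params_1), from_normal(z2, *params_2)


def conditional_quantile(q, x2_observed, rho, params_1, params_2):
    """Quantile q of X1 given X2 = x2_observed (bivariate normal scores)."""
    z2 = to_normal(x2_observed, *params_2)
    conditional = NormalDist(mu=rho * z2, sigma=math.sqrt(1.0 - rho * rho))
    return from_normal(conditional.inv_cdf(q), *params_1)


# Hypothetical marginal parameters (gamma, delta, xi, lam) and correlation.
P1 = (-0.8, 1.4, 20.0, 10.0)
P2 = (0.3, 1.8, 5.0, 2.0)
RHO = 0.8
```

Comparing `conditional_quantile(0.05, ...)` and `conditional_quantile(0.95, ...)` with the corresponding marginal quantiles of X1 shows how measuring a correlated parameter narrows the plausible range of the design parameter, which is the kind of uncertainty reduction the text discusses.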

  • Research Article
  • Cite Count: 5
  • 10.1080/10629360500107642
Evolutionary estimation of parameters of Johnson distributions
  • Mar 1, 2006
  • Journal of Statistical Computation and Simulation
  • Stefan Niermann

The problem of fitting a Johnson distribution to data for situations in which the family membership is not known, a priori is considered in this article. Within each Johnson family member, the maximum likelihood estimator is determined with an evolutionary algorithm and the model with the overall highest likelihood is selected. A simulation experiment is performed to demonstrate the appropriateness of this new method.
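The selection idea in this abstract (maximise the likelihood within each family, then keep the family with the highest value) can be sketched as below. The search used here is a plain (1+1) evolution strategy, chosen only as a simple stand-in; the article's actual evolutionary algorithm, its settings, and its starting values are not described in the abstract. All function names, the starting points, and the step size are assumptions. In the S_L case the four-parameter form is over-parameterised, which does not affect the maximised likelihood.

```python
import math
import random

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def _g_and_log_dg(family, u):
    """Return (g(u), log g'(u)) for a Johnson family, or None outside its support."""
    if family == "SU":
        return math.asinh(u), -0.5 * math.log1p(u * u)
    if family == "SL":
        return (math.log(u), -math.log(u)) if u > 0 else None
    if family == "SB":
        return (math.log(u / (1 - u)), -math.log(u * (1 - u))) if 0 < u < 1 else None
    raise ValueError(family)


def log_likelihood(family, theta, data):
    """theta = (gamma, log delta, xi, log lam); logs keep delta and lam positive."""
    gamma, log_delta, xi, log_lam = theta
    if abs(log_delta) > 50 or abs(log_lam) > 50:  # guard against overflow
        return -math.inf
    delta, lam = math.exp(log_delta), math.exp(log_lam)
    total = 0.0
    for x in data:
        out = _g_and_log_dg(family, (x - xi) / lam)
        if out is None:
            return -math.inf
        g, log_dg = out
        z = gamma + delta * g
        total += log_delta - log_lam + log_dg - 0.5 * z * z - LOG_SQRT_2PI
    return total


def starting_point(family, data):
    """Feasible but otherwise arbitrary starting values (an assumption)."""
    lo, hi = min(data), max(data)
    spread = hi - lo
    if family == "SU":
        return [0.0, 0.0, 0.5 * (lo + hi), math.log(spread)]
    if family == "SL":
        return [0.0, 0.0, lo - 0.1 * spread, 0.0]
    return [0.0, 0.0, lo - 0.1 * spread, math.log(1.2 * spread)]  # SB


def fit_one_plus_one_es(family, theta0, data, rng, steps=2000, sigma=0.1):
    """Mutate the current best point; keep the mutant only if it improves."""
    best, best_ll = list(theta0), log_likelihood(family, theta0, data)
    for _ in range(steps):
        cand = [t + rng.gauss(0.0, sigma) for t in best]
        ll = log_likelihood(family, cand, data)
        if ll > best_ll:
            best, best_ll = cand, ll
    return best, best_ll


def select_family(data, seed=0, steps=2000):
    """Fit each family and return the one with the highest log-likelihood."""
    rng = random.Random(seed)
    fits = {f: fit_one_plus_one_es(f, starting_point(f, data), data, rng, steps)
            for f in ("SU", "SL", "SB")}
    return max(fits, key=lambda f: fits[f][1]), fits
```

Calling `select_family(data)` on a list of floats with at least two distinct values returns the chosen family label together with the fitted parameters and log-likelihood of every family.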

  • Conference Article
  • Cite Count: 4
  • 10.1145/21850.253108
Fitting Johnson distributions using least squares
  • Jan 1, 1985
  • James J Swain + 1 more

A weighted least squares regression method is proposed for fitting cumulative probability distributions to data. This technique is illustrated for the Johnson translation system of distributions. The least squares procedure minimizes the distance between the vector of uniformized order statistics and its corresponding expected value to identify the Johnson distribution that provides the best fit. This least squares procedure is shown to be numerically robust and to provide a good fit of the data when compared to the empirical distribution. Two examples illustrate the use of the procedure.
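The criterion described in this abstract can be illustrated with the objective function below. For a sample of size n, the i-th uniformized order statistic F(x_(i)) has expected value i / (n + 1) and variance i (n - i + 1) / ((n + 1)^2 (n + 2)). The sketch assumes the S_U family and uses reciprocal variances as weights; the weighting actually used in the paper is not stated in the abstract, and the function names are hypothetical. Minimising this objective over the parameters would be left to a general-purpose optimiser.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def su_cdf(x, gamma, delta, xi, lam):
    """Cumulative distribution function of a Johnson S_U variable."""
    return STD_NORMAL.cdf(gamma + delta * math.asinh((x - xi) / lam))


def weighted_ls_objective(params, data):
    """Weighted squared distance between F(x_(i)) and its expected value."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        expected = i / (n + 1)  # mean of the i-th uniform order statistic
        variance = i * (n - i + 1) / ((n + 1) ** 2 * (n + 2))
        total += (su_cdf(x, *params) - expected) ** 2 / variance
    return total
```

A parameter set that reproduces the data well drives the uniformized order statistics toward an evenly spaced grid on (0, 1), so the objective approaches zero.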

  • Research Article
  • Cite Count: 1
  • 10.1080/17499518.2025.2582191
Multivariate correlation analysis of mechanical parameters of marine soft soil in Jiangsu, China
  • Nov 5, 2025
  • Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards
  • Caijin Wang + 8 more

The engineering properties of marine soft soil have a significant impact on coastal projects. Accurate prediction of bearing capacity and deformation characteristics of soft soil foundation is the core task of many engineering designs. Based on the conventional laboratory test data of the foundation soils from both the expanded and existing sections of the Lianhuai Expressway, this study investigates the correlations between foundation soil parameters. The Box–Cox transformation and Johnson distribution were employed to normalise the soil parameter data, and multivariate probabilistic prediction models were established for compression modulus, cohesion, and internal friction angle. The results show that with an increasing number of input parameters, the goodness of fit, prediction accuracy, and prediction stability of the model are improved to varying degrees. The study identified the optimal prediction combination for each target parameter, and the model exhibited satisfactory prediction performance. The predictive model proposed in this study, which is based on the uncertainty of soil parameters, has practical value for the reliability design and evaluation of subgrade, slope, and foundation engineering in coastal highway projects.

  • Research Article
  • Cite Count: 24
  • 10.2139/ssrn.1706409
The Performance of Johnson Distributions for Computing Value at Risk and Expected Shortfall
  • May 20, 2019
  • SSRN Electronic Journal
  • Jean-Guy Simonato


  • Research Article
  • 10.51419/202156605.
Forecasting the operational reliability of electric motors at agro-industrial enterprises in the Tyumen region based on a probabilistic model of their technical condition
  • Dec 25, 2025
  • АгроЭкоИнфо
  • Dmitry Surinsky + 3 more

This article examines the pressing issue of ensuring the operational reliability of asynchronous electric motors (EMs) used in agro-industrial complex (AIC) enterprises in the Tyumen region. Key factors influencing the technical condition of EM insulation are identified: climatic conditions (temperature, humidity), environmental aggressiveness, operating modes, and wear level. An analysis of existing approaches demonstrates that deterministic models do not provide sufficient accuracy in predicting residual service life in a stochastic environment. A probabilistic mathematical model is proposed that takes into account the random nature of the influencing parameters and enables prediction of the remaining service life of EMs with a given failure probability. The model is based on universal Pearson and Johnson distribution families, as well as singular spectrum analysis (SSA) for time series processing. The model was experimentally validated on a sample of 214 motors with power ratings ranging from 2.2 to 7.5 kW. The average relative prediction error was 8%, confirming the model's validity. A methodology for determining residual service life and corresponding software have been developed. The cost effectiveness of implementing this methodology has been calculated: per electric motor, it amounts to 3,049.45 rubles, or 17.82% compared to traditional maintenance methods. Keywords: ELECTRIC MOTOR, OPERATIONAL RELIABILITY, RESIDUAL LIFE, PROBABILISTIC MODEL, INSULATION, APC, FORECASTING, SINGULAR SPECTRUM ANALYSIS, PEARSON AND JOHNSON DISTRIBUTIONS, GOST

  • Research Article
  • Cite Count: 1
  • 10.5139/jksas.2015.43.12.1054
Flight Technical Error Modeling Technique for Unmanned Aerial Vehicles Supported by LADGNSS Navigation
  • Dec 1, 2015
  • Journal of the Korean Society for Aeronautical & Space Sciences
  • Kiwan Kim + 3 more

Navigation accuracy, integrity, and safety of commercial Unmanned Aerial Vehicles (UAVs) are becoming crucial as the use of UAVs in commercial applications is expected to increase. Recently, the concept of a Local-Area Differential GNSS (LADGNSS), which can provide navigation accuracy and integrity for UAVs, was proposed. LADGNSS can provide differential corrections and separation distances for precise and safe operation of UAVs. To derive separation distances between UAVs, modeling of the Flight Technical Error (FTE) is required. In most cases, the FTE for civil aircraft has been assumed to follow a zero-mean normal distribution. However, this assumption can cause overconservatism, especially for UAVs, because UAVs may use control and navigation equipment over a wider performance range and follow more diverse paths than the standard airways for civil aircraft. In this research, flight experiments were carried out to understand the characteristics of the FTE distribution. This paper also proposes using the Johnson distribution, which can better describe heavy-tailed and skewed FTE data. Furthermore, Kolmogorov-Smirnov and Anderson-Darling tests were conducted to evaluate the goodness of fit of the Johnson model.

  • Research Article
  • Cite Count: 6
  • 10.6000/1929-6029.2015.04.02.8
Control Charts for Skewed Distributions: Johnson’s Distributions
  • May 21, 2015
  • International Journal of Statistics in Medical Research
  • Bachioua Lahcene

In this study, some important issues regarding process capability and performance have been highlighted, particularly in case when the distribution of a process characteristic is non-normal. The process capability and performance analysis has become an inevitable step in quality management of modern industrial processes. Determination of the performance capability of a stable process using the standard process capability indices (Cp, Cpk) requires that the quality characteristics of the underlying process data should follow a normal distribution. Statistical Process Control charts widely used in industry and services by quality professionals require that the quality characteristic being monitored is normally distributed. If, in contrast, the distribution of this characteristic is not normal, any conclusion drawn from control charts on the stability of the process may be misleading and erroneous. In this paper, an alternative approach has been suggested that is based on the identification of the best distribution that would fit the data. Specifically, the Johnson distribution was used as a model to normalize real field data that showed departure from normality. Real field data from the construction industry was used as a case study to illustrate the proposed analysis.
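One way to picture the normalization approach described in this abstract: once a Johnson distribution has been fitted, the usual ±k limits on the normal scale can be mapped back to the original measurement scale, giving asymmetric limits for skewed data. The sketch below assumes an S_U fit whose parameters are already known; the abstract does not say which Johnson family or estimation method the study used, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def su_control_limits(gamma, delta, xi, lam, k=3.0):
    """Map the normal-scale limits -k, 0, +k back to the data scale."""
    def back(z):
        return xi + lam * math.sinh((z - gamma) / delta)
    return back(-k), back(0.0), back(k)  # lower limit, centre line, upper limit


def su_cdf(x, gamma, delta, xi, lam):
    """Cumulative distribution function of a Johnson S_U variable."""
    return NormalDist().cdf(gamma + delta * math.asinh((x - xi) / lam))
```

By construction, the probability that an in-control observation falls between the mapped limits equals the normal-theory coverage for ±k, while the distances from the centre line to each limit differ whenever gamma is nonzero.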

  • Research Article
  • Cite Count: 1
  • 10.3390/axioms13110749
Maximum Penalized Likelihood Estimation of the Skew–t Link Model for Binomial Response Data
  • Oct 30, 2024
  • Axioms
  • Omar Chocotea-Poca + 2 more

A critical aspect of modeling binomial response data is selecting an appropriate link function, as an improper choice can significantly affect model precision. This paper introduces the skew–t link model, an extension of the skew–probit model, offering increased flexibility by incorporating both asymmetry and heavy tails, making it suitable for asymmetric and complex data structures. A penalized likelihood-based estimation method is proposed to stabilize parameter estimation, particularly for the asymmetry parameter. Extensive simulation studies demonstrate the model’s superior performance in terms of lower bias, root mean squared error (RMSE), and robustness compared to traditional symmetric models like probit and logit. Furthermore, the model is applied to two real-world datasets: one concerning women’s labor participation and another related to cardiovascular disease outcomes, both showing superior fitting capabilities compared to more traditional models (with probit and the skew–probit links). These findings highlight the model’s applicability to socioeconomic and medical research, characterized by skew and asymmetric data. Moreover, the proposed model could be applied in various domains where data exhibit asymmetry and complex structures.

  • Research Article
  • Cite Count: 16
  • 10.2136/sssaj2014.09.0384
Relationship between 1:5 Soil/Water and Saturated Paste Extract Sodium Adsorption Ratios by Three Extraction Methods
  • Feb 12, 2015
  • Soil Science Society of America Journal
  • Yangbo He + 4 more

Cations extracted from soil using non-standard techniques are used to calculate the sodium adsorption ratio (SAR). To interpret these values, analytical approaches for converting alternative approaches to standard approaches are needed. The objective of this research was to develop the relationship between the standard approach (saturated paste extract, SARe) and alternative approaches where the cations are in 1:5 soil/water ratios and are mixed by shaking, stirring, or an NRCS method (allowed to reach equilibrium). One hundred soils sampled from glacial parent materials in North Dakota were selected for this study. The SAR values from the three methods were highly correlated to each other. Simple linear regression (Model 1), robust regression, and Model 2 were analyzed for the relationship between SAR1:5 and SARe. Outlier analysis and calcite content distribution indicated the possible influence of calcite content. The soil data were classified into high and low calcite by a 4.2% criterion. In addition, Ca in the 1:5 and saturated paste extracts showed poor relationships, also indicating that Ca had an influence. Model 1 and robust regression were similar in expressing the relationship between soil SAR1:5 and SARe with a normal residual distribution, but Model 2 had high left skewness in the residual distribution. Model prediction improved when soil data were classified by calcite content. Predicting the SARe of soils from SAR1:5 data is possible but soil-calcite concentrations should be considered; 1:5 methods can confidently be compared.

  • Research Article
  • Cite Count: 8
  • 10.13023/ktc.rr.2005.03
Kentucky Geotechnical Database
  • Jun 29, 2013
  • UKnowledge (University of Kentucky)
  • Tommy C Hopkins + 3 more

Development of a comprehensive, dynamic, geotechnical database is described. Computer software selected to program the client/server application in a Windows environment, components and structure of the geotechnical database, and primary factors considered in constructing the database are discussed. Oracle® 8i, PowerBuilder® 8, and Map Object® software were used to construct the database, build graphical user interfaces, and embed roadway maps, respectively. Any number of users may use the database simultaneously. Twelve highway district offices and several central offices of the Kentucky Transportation Cabinet are connected to the database. Data may be entered and retrieved dynamically in the client/server structure. This report is the fourth of four recently completed research studies. It summarizes all studies and describes the integration of major components of the database. Components include rock slope, landslide, and soil and rock engineering data. The first two studies, conducted in the mid 1990s, focused on potential rock slope hazards and the development of a rock slope management system. The third research study and report, which was published in 2003, focused on landslides. The focus of the fourth, and current, study is on soil and rock engineering data generated during geotechnical investigations and testing. This report deals more with developing specific database features, simplifying data entry schemes, and expanding retrieval capabilities and flexibilities. A large amount of additional soil and rock geotechnical engineering data was entered during the current study. Information in this report is presented in three parts: rock slopes, landslides, and soil and rock engineering data, which reflects the historical accumulation of these components under separate studies. Several schemes for retrieving data and generating reports are described.
Secondary components of the database include statistical analyzers and engineering applications for performing on-line analysis of data, developing correlations between different soil parameters, and performing engineering analysis and designs. Procedures for entering historical soil and rock engineering data have been developed and programmed. Issues concerning database security, engineering units, and storing and displaying maps, graphics, and photographs are discussed. The database contains procedures for dynamically overlaying the locations of landslides, rock slopes, and borings onto embedded roadway and digitized geological maps. Strategies and illustrations of graphical user interfaces for data entry and retrieval are described.
