Articles published on Multivariate normal distribution
3230 Search results
- Research Article
- 10.3758/s13428-025-02909-7
- Dec 26, 2025
- Behavior Research Methods
- Anja F Ernst + 1 more
Time-series data have become ubiquitous in psychological research, allowing us to study detailed within-person dynamics and their heterogeneity across persons. Vector autoregressive (VAR) models have become a popular choice as a first approximation of these dynamics. The VAR model for each person and the heterogeneity across persons can be jointly modeled using a hierarchical model that treats heterogeneity as a latent distribution. Currently, the most popular choice for this is the multilevel VAR model, which models heterogeneity across persons as quantitative variation through a multivariate Gaussian distribution. Here, we discuss an alternative, the latent class VAR model, which models heterogeneity as qualitative variation using a number of discrete clusters. While this model has been introduced before, it has not been readily accessible to researchers. We address this issue by providing an accessible introduction to latent class VAR models; a simulation evaluating how well the model can be estimated in situations resembling applied research; a new R package, ClusterVAR, with easy-to-use functions for estimating the model; and a fully reproducible tutorial on modeling emotion dynamics that walks the reader through all steps of estimating, analyzing, and interpreting latent class VAR models.
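As an illustration of the building block the abstract above describes, the sketch below simulates one person's VAR(1) process and recovers its coefficient matrix by least squares. The dimensions, coefficients, and noise level are illustrative assumptions; this is not the ClusterVAR package.

```python
# Minimal sketch: simulate a 2-D VAR(1) process for one person and recover
# its coefficient matrix by ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])               # true VAR(1) coefficient matrix
T = 5000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.1, size=2)

# least-squares estimate: regress y_t on y_{t-1}
X, Y = y[:-1], y[1:]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
print(np.round(A_hat, 2))
```

A latent class VAR model would fit one such coefficient matrix per cluster rather than per person; the least-squares step here is only the per-series ingredient.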
- Research Article
- 10.1177/17407745251385582
- Dec 26, 2025
- Clinical trials (London, England)
- Shiyu Shu + 3 more
Desirability of outcome ranking (DOOR) is a paradigm for the design, monitoring, analysis, interpretation, and reporting of clinical trials based on patient-centric benefit-risk evaluation, developed to address limitations of existing approaches and advance clinical trial science. The first step in implementing DOOR is defining an ordinal DOOR outcome representing a global patient-centric response, a cumulative summary of the benefits and harms for an individual patient. This article aims to develop an analysis methodology for the setting where the DOOR outcome is a progressive time-varying state, and there is interest in event times and the times that patients spend in more and less desirable states. We develop methods to estimate and make inferences about the temporal treatment effects. If the k levels of the DOOR outcome are monotone, then k - 1 non-overlapping Kaplan-Meier survival curves can be estimated and plotted. The areas under the curves asymptotically follow a multivariate Gaussian distribution. We apply restricted mean survival time (RMST) concepts to the ordinal Kaplan-Meier curves and provide steps for estimating the covariance structure. Simulation studies demonstrate that the proposed methods perform well in practical settings. We generate censoring times under a uniform distribution and event times under a multi-state structure. The proposed estimators have small biases, the 95% confidence intervals have correct coverage probabilities, and the proposed tests accurately control the type I error rate under the null hypothesis. We illustrate the methods using data from the Adaptive COVID-19 Treatment Trial (ACTT-1), a clinical trial that compared remdesivir vs placebo for the treatment of COVID-19 infection.
Ordinal DOOR outcomes, which incorporate benefits and harms and represent an overall patient response, have recently been recommended by the Council for International Organizations of Medical Sciences (CIOMS) as a standard approach to benefit-risk analysis. Such endpoints recognize the cumulative nature of outcomes on patients, account for correlations between efficacy and safety, incorporate multivariate survival outcomes, offer generalizability to inform clinical practice, and recognize finer gradations of patient response than binary outcomes do. Robust and interpretable analysis methodologies for ordinal outcomes are needed. Restricted mean survival time is a useful nonparametric approach for robust treatment effect estimation. We provide a framework for inference using multiple RMSTs to analyze DOOR and other ordinal outcomes using an interpretable time metric.
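A hedged sketch of the RMST machinery the abstract relies on: a numpy-only Kaplan-Meier estimator and the area under its step function up to a horizon tau. The data are synthetic, and the DOOR-specific ordinal construction is not reproduced.

```python
# Minimal sketch: Kaplan-Meier estimate for right-censored data, then the
# restricted mean survival time (RMST) as the area under the KM curve.
import numpy as np

def kaplan_meier(time, event):
    """Event-time grid and the KM survival estimate just after each point."""
    s, times, surv = 1.0, [], []
    for t in np.unique(time[event == 1]):       # only event times change S(t)
        d = np.sum((time == t) & (event == 1))  # events at t
        n = np.sum(time >= t)                   # number at risk at t
        s *= 1.0 - d / n
        times.append(t)
        surv.append(s)
    return np.array(times), np.array(surv)

def rmst(times, surv, tau):
    """Area under the KM step function on [0, tau]."""
    mask = times <= tau
    grid = np.concatenate(([0.0], times[mask], [tau]))
    step = np.concatenate(([1.0], surv[mask]))  # S at the start of each interval
    return float(np.sum(np.diff(grid) * step))

rng = np.random.default_rng(1)
t_event = rng.exponential(scale=10.0, size=2000)
t_cens = rng.uniform(0.0, 30.0, size=2000)      # uniform censoring, as in the paper
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)
times, surv = kaplan_meier(time, event)
print(round(rmst(times, surv, tau=10.0), 2))    # true RMST = 10*(1 - 1/e) ≈ 6.32
```

The paper's method would apply this to each of the k - 1 ordinal survival curves and then use the joint asymptotic normality of the resulting areas.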
- Research Article
- 10.1088/1361-6501/ae1aab
- Nov 25, 2025
- Measurement Science and Technology
- Yunxiao Wang + 3 more
For complex and changing environments, mobile sensing robots have been increasingly deployed in place of human beings in all-weather environmental monitoring tasks. Mobile sensing robots can conduct autonomous sampling so as to estimate the environmental spatial field from sampled observations. Active sampling planning is one of the most significant tasks of mobile robots conducting environmental monitoring. It seeks to determine the critical sampling locations by characterizing the key information of the environmental field, generally formulated as an objective function based on mathematical statistics. In this paper, a novel active sampling planning method is proposed, built upon a multivariate Gaussian distribution (MGD) and a global-local uncertainty-driven sampling planner. Specifically, a kernel function set is proposed to estimate the specific form of the MGD, while an uncertainty-driven sampling planner is designed by considering estimation uncertainty both globally and locally over the target monitored field. To systematically validate the performance of the proposed method, experiments are carried out in simulation, on real-world data, and in a physical field test. In all experiments, our method achieves the lowest estimation error and uncertainty compared to traditional methods, demonstrating its superior performance for robotic active sampling in multivariate Gaussian field estimation.
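The uncertainty-driven idea can be sketched with a plain Gaussian-process model: pick the next sampling location where the posterior variance is largest. The kernel, length scale, and 1-D field below are illustrative assumptions, not the paper's kernel-set construction or global-local planner.

```python
# Minimal sketch: GP posterior over a 1-D field from two samples, then choose
# the next sampling location by maximizing the posterior variance.
import numpy as np

def rbf(a, b, ell=0.5):
    """Squared-exponential kernel between two 1-D location arrays."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

f = lambda x: np.sin(3 * x)                 # stand-in environmental field
X = np.array([0.1, 0.9])                    # locations sampled so far
y = f(X)
grid = np.linspace(0.0, 2.0, 201)           # candidate sampling locations

K = rbf(X, X) + 1e-6 * np.eye(len(X))       # jitter for numerical stability
Ks = rbf(grid, X)
alpha = np.linalg.solve(K, y)
mean = Ks @ alpha                           # GP posterior mean on the grid
var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))

x_next = grid[np.argmax(var)]               # most uncertain location
print(round(float(x_next), 2))
```

With only two samples near one end of the domain, the variance (and hence the next sample) is driven toward the unexplored region, which is the essence of uncertainty-driven planning.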
- Research Article
- 10.1007/s42081-025-00320-2
- Nov 25, 2025
- Japanese Journal of Statistics and Data Science
- Shuhei Muroya + 2 more
In statistical approaches to materials science, it is often necessary to predict multiple material properties simultaneously based on a limited number of experimental observations. However, due to the cost and difficulty in collecting comprehensive data, datasets often have small sample sizes and frequently contain missing values in both predictor and response variables. These issues are further compounded when the dimensionality of the variables is moderate or high, making the analysis even more challenging. To address these challenges, we propose a novel sparse multivariate regression framework that simultaneously handles missing values and performs model estimation. The joint distribution of predictors and responses is assumed to follow a multivariate normal distribution, and the lasso penalties are applied to both the regression coefficients and the inverse covariance matrix. We demonstrate the effectiveness of our method through numerical experiments on both simulated and real datasets.
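A minimal sketch of the key identity behind the joint-normal formulation (the paper's lasso penalties are omitted): if (X, Y) is multivariate normal with joint precision matrix L partitioned into predictor (x) and response (y) blocks, the regression coefficients of Y on X are B = -L_yy^{-1} L_yx.

```python
# Minimal sketch: recover regression coefficients from the empirical joint
# precision matrix of a multivariate normal sample of (X, Y).
import numpy as np

rng = np.random.default_rng(3)
B_true = np.array([[1.0, -2.0]])            # 1 response, 2 predictors
n = 100_000
X = rng.normal(size=(n, 2))
Y = X @ B_true.T + rng.normal(scale=0.5, size=(n, 1))
Z = np.hstack([X, Y])                       # joint sample of (X, Y)

L = np.linalg.inv(np.cov(Z.T))              # empirical joint precision matrix
L_yx, L_yy = L[2:, :2], L[2:, 2:]
B_hat = -np.linalg.solve(L_yy, L_yx)        # coefficients recovered from L
print(np.round(B_hat, 2))
```

Penalizing the inverse covariance matrix, as the paper does, therefore sparsifies exactly the object from which the regression coefficients are read off.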
- Research Article
- 10.3390/sym17112005
- Nov 19, 2025
- Symmetry
- Yifan Fan + 2 more
Affiliation networks, with their bipartite structure and non-binary features, pose unique challenges due to their complex relationships and diverse node attributes. These challenges differ from those in symmetric one-mode networks. To address them, we propose a generalized logistic affiliation network model. Despite the structural asymmetry, the model incorporates node attributes and includes parameters for actor activeness, event popularity, and symmetric patterns in actor–event interactions. We study the theoretical properties of this model under an asymptotic framework, where the number of actors and events grows to infinity. Using maximum likelihood estimation, we show that the estimators for degree heterogeneity and node homophily converge to multivariate normal distributions under mild conditions. To validate the model and our theory, we conduct experiments on both simulated data and a movie-rating dataset.
- Research Article
- 10.3390/modelling6040144
- Nov 6, 2025
- Modelling
- Guillermo Fernández + 5 more
This work presents a time-domain approach for characterizing the Ground Reaction Forces (GRFs) exerted by a pedestrian during running. It is focused on the vertical component, but the methodology is adaptable to other components or activities. The approach is developed from a statistical perspective. It relies on experimentally measured force-time series obtained from a healthy male pedestrian at eight step frequencies ranging from 130 to 200 steps/min. These data are subsequently used to build a stochastic data-driven model. The model is composed of multivariate normal distributions which represent the step patterns of each foot independently, capturing potential disparities between them. Additional univariate normal distributions represent the step scaling and the aerial phase, the latter with both feet off the ground. A dimensionality reduction procedure is also implemented to retain the essential geometric features of the steps using a sufficient set of random variables. This approach accounts for the intrinsic variability of running gait by assuming normality in the variables, validated through state-of-the-art statistical tests (Henze-Zirkler and Shapiro-Wilk) and the Box-Cox transformation. It enables the generation of virtual GRFs using pseudo-random numbers from the normal distributions. Results demonstrate strong agreement between virtual and experimental data. The virtual time signals reproduce the stochastic behavior, and their frequency content is also captured with deviations below 4.5%, most of them below 2%. This confirms that the method effectively models the inherent stochastic nature of running human gait.
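The core data-driven step can be sketched as fitting a multivariate normal to a few step-shape features and sampling virtual steps from it. The two features and all numbers below are synthetic placeholders, not the paper's measured GRF data.

```python
# Minimal sketch: fit a multivariate normal to step features and generate
# "virtual" steps by sampling from the fitted distribution.
import numpy as np

rng = np.random.default_rng(4)
# synthetic "measured" steps: columns = (peak force [N], contact time [s])
real_steps = rng.multivariate_normal(mean=[1600.0, 0.30],
                                     cov=[[900.0, 0.3], [0.3, 0.0004]],
                                     size=500)

mu = real_steps.mean(axis=0)
Sigma = np.cov(real_steps.T)                # fitted mean and covariance
virtual = rng.multivariate_normal(mu, Sigma, size=1000)   # virtual steps
print(np.round(virtual.mean(axis=0), 1))
```

The paper's model uses many more variables per step (after dimensionality reduction) and separate distributions per foot, but the generate-from-fitted-normal mechanism is the same.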
- Research Article
- 10.3390/risks13110220
- Nov 5, 2025
- Risks
- Mostafa S Aminzadeh + 1 more
Modeling loss data is a crucial aspect of actuarial science. In the insurance industry, small claims occur frequently, while large claims are rare. Traditional heavy-tail distributions, such as Weibull, Log-Normal, and Inverse Gaussian distributions, are not suitable for describing insurance data, which often exhibit skewness and fat tails. The literature has explored classical and Bayesian inference methods for the parameters of composite distributions, such as the Exponential–Pareto, Weibull–Pareto, and Inverse Gamma–Pareto distributions. These models effectively separate small to moderate losses from significant losses using a threshold parameter. This research aims to introduce a new composite distribution, the Gamma–Pareto distribution with two parameters, and employ a numerical computational approach to find the maximum likelihood estimates (MLEs) of its parameters. A novel computational approach for a nonlinear regression model where the loss variable is distributed as the Gamma–Pareto and depends on multiple covariates is proposed. The maximum likelihood (ML) and Approximate Bayesian Computation (ABC) methods are used to estimate the regression parameters. The Fisher information matrix, along with a multivariate normal distribution as the prior distribution, is utilized through the ABC method. Simulation studies indicate that the ABC method outperforms the ML method in terms of accuracy.
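A hedged sketch of Approximate Bayesian Computation by rejection, the estimation idea named above; for brevity, the Gamma-Pareto regression model is replaced by a plain exponential with an unknown rate and a uniform prior.

```python
# Minimal sketch of ABC rejection: draw parameters from a prior, simulate
# data, and keep draws whose summary statistic is close to the observed one.
import numpy as np

rng = np.random.default_rng(5)
rate_true = 2.0
observed = rng.exponential(1 / rate_true, size=500)
s_obs = observed.mean()                     # summary statistic

accepted = []
for _ in range(20_000):
    rate = rng.uniform(0.1, 5.0)            # prior draw (illustrative prior)
    sim = rng.exponential(1 / rate, size=500)
    if abs(sim.mean() - s_obs) < 0.02:      # tolerance on the summary
        accepted.append(rate)

posterior_mean = float(np.mean(accepted))
print(round(posterior_mean, 1))
```

The paper instead draws regression parameters from a multivariate normal prior informed by the Fisher information matrix; the accept/reject mechanism on simulated data is the shared idea.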
- Research Article
- 10.1002/cnm.70117
- Nov 1, 2025
- International Journal for Numerical Methods in Biomedical Engineering
- Kazuyoshi Jin + 6 more
Recently, the concept of a virtual population (Vpop) has attracted attention as a way to provide large-scale, diverse datasets without compromising individual privacy. A Vpop modelling method for cerebrovascular shape that requires only simple parameter tuning and post-processing has yet to be established. This study introduces a multivariate normal distribution (MVND) method to generate a Vpop for the cerebrovasculature shape. We defined an MVND using the position and inner radius, which represent the vascular shape (centerline), as variables. Patient-specific arteries (basilar artery and internal carotid artery) obtained from MR images were used as a real population (Rpop) to generate an MVND. Virtual arteries were then sampled from this MVND to generate a Vpop. To evaluate the validity of this method for reproducing shape diversity, we calculated the geometrical features of the centerline in each population. The centerline shows qualitatively similar characteristics between Vpop and Rpop. Geometrical features such as average length calculated from Vpop are in the same range as those of Rpop. Moreover, the distribution of geometrical features exhibits a good degree of fit between Vpop and Rpop. Since the MVND considers the correlation among all position and inner radius variables, centerline continuity and anatomical characteristics of the cerebrovasculature are automatically included. Hence, geometric features and their distribution can be reproduced without any parameter tuning. The consistency in geometric parameters between the two populations supports the validity of the MVND method and indicates the potential for generating a Vpop for the cerebrovasculature in a more straightforward and simplified manner.
- Research Article
- 10.1016/j.ejpb.2025.114839
- Nov 1, 2025
- European journal of pharmaceutics and biopharmaceutics : official journal of Arbeitsgemeinschaft fur Pharmazeutische Verfahrenstechnik e.V
- Zhengguo Xu + 5 more
Comparison of dissolution profiles: 90 % confidence intervals of different f2 estimators using bootstrap methodology versus the Euclidean Distance of the Non-standardized Expected (EDNE) values.
- Research Article
- 10.1088/1402-4896/adfb07
- Nov 1, 2025
- Physica Scripta
- Takashi Arai
We propose a multivariate probability distribution that models a linear correlation between continuous and binary variables. The proposed distribution is a natural extension of the previously developed multivariate binary distribution known as the Grassmann distribution. The Grassmann distribution has desirable theoretical properties similar to those of the multivariate normal distribution, and is parametrized by a $P_0$-matrix, which is necessary to ensure that the model probabilities are nonnegative. By using the property of the $P_0$-matrix, we successfully introduce interactions between continuous and binary variables while ensuring that all joint probabilities are nonnegative. We refer to the proposed distribution as canonical in the sense that it is mathematically simple and natural. Using artificial data, we numerically validate the representational capabilities of the proposed model. We further investigate the sampling distribution of the maximum likelihood estimator and empirically observe its consistency. We also construct statistical machine learning methods for classification and clustering using the proposed distribution and demonstrate its usefulness.
- Research Article
- 10.1080/10618600.2025.2576165
- Oct 17, 2025
- Journal of Computational and Graphical Statistics
- Heeju Lim + 3 more
The Heckman selection model is one of the best-known econometric models in the analysis of data with sample selection. This model is designed to rectify sample selection biases based on the assumption of bivariate normal error terms. However, real data diverge from this assumption in the presence of heavy tails and/or atypical observations. Recently, this assumption has been relaxed via the more flexible Student's t-distribution, which has appealing statistical properties. This paper introduces a novel Heckman selection model using a bivariate contaminated normal distribution for the error terms. We present an efficient Expectation Conditional Maximization algorithm for parameter estimation with closed-form expressions at the E-step based on truncated multivariate normal distribution formulas. The point identifiability of the proposed model is also discussed, and its properties are examined. Through simulation studies, we compare our proposed model with its normal and Student's t counterparts and investigate the finite-sample properties and the variation in missing rate. Results obtained from two real data analyses showcase the usefulness and effectiveness of our model. The proposed algorithms are implemented in the R package HeckmanEM.
- Research Article
- 10.1080/10618600.2025.2551271
- Oct 15, 2025
- Journal of Computational and Graphical Statistics
- Beniamino Hadj-Amar + 3 more
We propose a flexible Bayesian approach for sparse Gaussian graphical modeling of multivariate time series. We account for temporal correlation in the data by assuming that observations are characterized by an underlying and unobserved hidden discrete autoregressive process. We assume multivariate Gaussian emission distributions and capture spatial dependencies by modeling the state-specific precision matrices via graphical horseshoe priors. We characterize the mixing probabilities of the hidden process via a cumulative shrinkage prior that accommodates zero-inflated parameters for non-active components, and further incorporate a sparsity-inducing Dirichlet prior to estimate the effective number of states from the data. For posterior inference, we develop a sampling procedure that allows estimation of the number of discrete autoregressive lags and the number of states, and that cleverly avoids having to deal with the changing dimensions of the parameter space. We thoroughly investigate performance of our proposed methodology through several simulation studies. We further illustrate the use of our approach for the estimation of dynamic brain connectivity based on fMRI data collected on a subject performing a task-based experiment on latent learning. Supplementary materials for this article are available online.
- Research Article
- 10.1007/s10985-025-09673-y
- Oct 9, 2025
- Lifetime data analysis
- Emily M Damone + 2 more
Many methods exist to jointly model either recurrent and related terminal survival events or longitudinal outcome measures and a related terminal survival event. However, few methods can account for the dependency among all three outcomes of interest, and none allow all three to be modeled without strong correlation assumptions. We propose a joint model that uses subject-specific random effects to connect the survival model (terminal and recurrent events) with a longitudinal outcome model. In the proposed method, proportional hazards models with shared frailties model the dependence between the recurrent and terminal events, while a separate (but correlated) set of random effects is utilized in a generalized linear mixed model to capture dependence with the longitudinal outcome measures. All random effects are related through an assumed multivariate normal distribution. The proposed joint modeling approach allows for flexible models, particularly for unique longitudinal trajectories, that can be utilized in a wide range of health applications. We evaluate the model through simulation studies as well as through an application to data from the Atherosclerosis Risk in Communities (ARIC) study.
- Research Article
- 10.21869/22231560-2025-29-2-92-108
- Oct 1, 2025
- Proceedings of the Southwest State University
- K D Rusakov
Purpose of research. Development and experimental evaluation of a Bayesian classification algorithm for the person re-identification task using images from multiple surveillance cameras. The study aims to improve identification accuracy by integrating features derived from facial and silhouette images. Methods. The proposed algorithm utilizes a Bayesian classification model based on multivariate normal distributions of features. These features are extracted by neural encoders built on the Vision Transformer architecture and trained using the ArcFace loss function. Integration of modality-specific features is performed by computing logarithmic posterior probabilities of class membership. The effectiveness of the method was evaluated using the open CUHK03 dataset, quantitative analysis via ROC curves, and feature-space visualization using the t-SNE method. Results. The algorithm demonstrated high classification performance: precision of 95.65% on CUHK03, up to 97.7% on Market-1501, and 89.2% on MARS. ROC analysis confirmed strong class separability, while t-SNE visualizations showed compact and well-defined clusters. The algorithm is deterministic, robust to noise, and scalable to larger datasets. Conclusion. The developed Bayesian classification algorithm has proven its effectiveness and feasibility for person re-identification tasks in intelligent video surveillance systems. Its advantages include high accuracy, interpretability, and potential for integrating additional features. Future research should focus on incorporating extra attributes and evaluating algorithm performance on significantly larger and more diverse datasets.
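The Bayesian classification step can be sketched as fitting a class-conditional multivariate normal per identity and assigning by log posterior. The ViT/ArcFace feature encoders are replaced here by toy 2-D features.

```python
# Minimal sketch: Gaussian class-conditional classifier that assigns a new
# feature vector to the class with the largest log posterior.
import numpy as np

rng = np.random.default_rng(6)
# toy 2-D features for two identities
X0 = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=200)
X1 = rng.multivariate_normal([3.0, 3.0], np.eye(2), size=200)

def log_posterior(x, mu, Sigma, log_prior=np.log(0.5)):
    """Unnormalized log posterior of class membership under a Gaussian model."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return log_prior - 0.5 * (logdet + d @ np.linalg.solve(Sigma, d))

params = [(X.mean(axis=0), np.cov(X.T)) for X in (X0, X1)]
x_new = np.array([2.5, 2.8])
scores = [log_posterior(x_new, mu, S) for mu, S in params]
print(int(np.argmax(scores)))               # predicted identity
```

Fusing face and silhouette modalities, as the paper does, amounts to summing such log posterior terms across modality-specific feature spaces.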
- Research Article
- 10.3390/e27090947
- Sep 11, 2025
- Entropy
- Frank Nielsen
The geometric Jensen–Shannon divergence (G-JSD) has gained popularity in machine learning and information sciences thanks to its closed-form expression between Gaussian distributions. In this work, we introduce an alternative definition of the geometric Jensen–Shannon divergence tailored to positive densities, which does not normalize geometric mixtures. This novel divergence is termed the extended G-JSD, as it applies to the more general case of positive measures. We explicitly report the gap between the extended G-JSD and the G-JSD when considering probability densities, and show how to express the G-JSD and extended G-JSD using the Jeffreys divergence and the Bhattacharyya distance or Bhattacharyya coefficient. The extended G-JSD is proven to be an f-divergence, which is a separable divergence satisfying information monotonicity and invariance in information geometry. We derive corresponding closed-form formulas for the two types of G-JSD in the case of multivariate Gaussian distributions, a case often met in applications. We consider Monte Carlo stochastic estimations and approximations of the two types of G-JSD using projective γ-divergences. Although the square root of the JSD yields a metric distance, we show that this is no longer the case for the two types of G-JSD. Finally, we explain how these two types of geometric JSDs can be interpreted as regularizations of the ordinary JSD.
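The closed form that makes the G-JSD attractive for Gaussians can be sketched in the univariate, equal-weight case: the normalized geometric mixture of two normals is itself normal (average the precisions and the precision-weighted means), so the divergence reduces to a mean of two closed-form KL terms.

```python
# Minimal sketch: geometric Jensen-Shannon divergence between two univariate
# normals, using the fact that their normalized geometric mean is normal.
import numpy as np

def kl_normal(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) in nats."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def gjsd_normal(m0, v0, m1, v1):
    l0, l1 = 1 / v0, 1 / v1                 # precisions
    lg = 0.5 * (l0 + l1)                    # geometric-mean precision
    mg = (l0 * m0 + l1 * m1) / (l0 + l1)    # geometric-mean mean
    vg = 1 / lg
    return 0.5 * (kl_normal(m0, v0, mg, vg) + kl_normal(m1, v1, mg, vg))

print(gjsd_normal(0.0, 1.0, 1.0, 1.0))      # prints 0.125
```

For equal unit variances and means 0 and 1, the geometric mean is N(0.5, 1) and each KL term equals 1/8, so the G-JSD is exactly 0.125. The extended G-JSD of the paper differs by keeping the geometric mixture unnormalized.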
- Research Article
- 10.1016/j.aap.2025.108123
- Sep 1, 2025
- Accident; analysis and prevention
- Heye Huang + 6 more
Understanding driver cognition and decision-making behaviors in high-risk scenarios: A drift diffusion perspective.
- Research Article
- 10.1080/03610926.2025.2541838
- Aug 19, 2025
- Communications in Statistics - Theory and Methods
- Konstantinos Mamis
We prove a formula for the evaluation of expectations containing a scalar function of a Gaussian random vector multiplied by a product of the random vector's components, each raised to a nonnegative integer power. Some of the powers may be of zeroth order, and, for expectations containing only one vector component to the first power, the formula reduces to Stein's lemma for the multivariate normal distribution. Furthermore, by setting the function inside the expectation equal to one, we easily re-derive Isserlis' theorem and its generalizations regarding higher-order moments of a Gaussian random vector. We provide two proofs of the formula, the first being a rigorous proof via mathematical induction. The second proof is a formal, constructive derivation based on treating the expectation not as an integral, but as the consecutive actions of pseudodifferential operators defined via the moment-generating function of the Gaussian random vector.
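Isserlis' theorem, which the abstract re-derives, is easy to check numerically; the covariance matrix below is an arbitrary illustrative choice.

```python
# Numerical check of Isserlis' theorem for a fourth moment of a zero-mean
# Gaussian vector: E[x1 x2 x3 x4] = S12*S34 + S13*S24 + S14*S23.
import numpy as np

rng = np.random.default_rng(7)
S = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.5, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.4],
              [0.1, 0.2, 0.4, 1.0]])        # covariance matrix
x = rng.multivariate_normal(np.zeros(4), S, size=1_000_000)

mc = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])   # Monte Carlo moment
isserlis = S[0, 1] * S[2, 3] + S[0, 2] * S[1, 3] + S[0, 3] * S[1, 2]
print(round(isserlis, 3), round(float(mc), 3))
```

The three terms correspond to the three pairings of the four indices; higher even moments sum over all perfect matchings in the same way.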
- Research Article
- 10.1080/19466315.2025.2521110
- Aug 1, 2025
- Statistics in Biopharmaceutical Research
- Rory Samuels + 2 more
We derive a statistical test for the dissolution-profile equivalence of two batches using samples from multivariate normal distributions whose variables correspond to time points. We refer to this test as the general-t equivalence test. Using Monte Carlo simulation results, we determine that, for many realistic covariance matrices and realistic small-sample-size configurations, our general-t equivalence test for two dissolution profiles is more powerful than the conditional-t equivalence test derived by Saranadasa and Krishnamoorthy and therefore yields better statistically based dissolution-equivalence decisions. Also, unlike the T2EQ procedure proposed by Hoffelder, the general-t equivalence test maintains the nominal Type I error rate regardless of the data dimension and the sample sizes within each batch. In addition, we derive a general-t equivalence confidence interval for evaluating dissolution-profile equivalence that, for realistic small-sample-size configurations, is generally more precise than the conditional-t equivalence confidence interval derived by Saranadasa and Krishnamoorthy. We demonstrate the efficacy of the general-t equivalence test and confidence interval on two real datasets. An R implementation of the proposed method is offered in the supplementary material.
- Research Article
- 10.1175/jcli-d-24-0591.1
- Aug 1, 2025
- Journal of Climate
- Xu Zhang + 4 more
Emergent constraints reduce uncertainties in future climate projections by exploiting relationships between projected changes and observable features of the current climate. However, previous methods for emergent constraints are limited to variables following normal or multivariate normal distributions. Here, we devise a copula-based emergent constraint (CEC) framework that enables the flexible selection of marginal distribution functions and the combination of multiple constraints. The Markov chain Monte Carlo (MCMC) algorithm is applied to numerically estimate the posterior distribution derived from Bayes' theorem. This new framework achieves narrower uncertainties in the projections of future global warming than previous approaches that assume normal distributions. Combining two constraints in the Northern and Southern Hemispheres further reduces uncertainties by integrating different sources of information. Owing to its flexibility in distribution functions and constraint size, the CEC framework is applicable to more variables and interactions across various spheres of the Earth system.
- Research Article
- 10.1002/sim.70189
- Aug 1, 2025
- Statistics in medicine
- Ao Sun + 2 more
In clinical practice, multiple biomarkers are often measured on the same subject for disease diagnosis, and combining them can improve diagnostic accuracy. Existing studies typically combine multiple biomarkers by maximizing the area under the ROC curve (AUC), assuming a gold standard exists or that biomarkers follow a multivariate normal distribution. However, practical diagnostic settings require both optimal combination coefficients and an effective cutoff value, and the reference test may be imperfect. In this article, we propose a two-stage method for identifying the optimal linear combination and cutoff value based on the Youden index. First, it maximizes an approximation of the empirical AUC to estimate the optimal linear coefficients for combining multiple biomarkers. Then, it maximizes the empirical Youden index to determine the optimal cutoff point for disease classification. Under the semiparametric single index model and regularity conditions, the estimators for the linear coefficients, cutoff point, and Youden index are consistent. This method is also applicable when the reference standard is imperfect. We demonstrate the performance of our method through simulations and apply it to construct a diagnostic scale for Chinese medicine.
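The second stage described above can be sketched directly: fix a linear combination of biomarkers, scan candidate cutoffs, and keep the one maximizing the empirical Youden index J = sensitivity + specificity - 1. The data and the combination weights are synthetic assumptions.

```python
# Minimal sketch: choose a classification cutoff for a fixed biomarker
# combination by maximizing the empirical Youden index.
import numpy as np

rng = np.random.default_rng(8)
n = 2000
healthy = rng.normal([0.0, 0.0], 1.0, size=(n, 2))
diseased = rng.normal([1.5, 1.0], 1.0, size=(n, 2))
w = np.array([0.8, 0.6])                    # assumed combination coefficients

s_h, s_d = healthy @ w, diseased @ w        # combined scores per group
cutoffs = np.linspace(s_h.min(), s_d.max(), 500)
sens = np.array([(s_d >= c).mean() for c in cutoffs])
spec = np.array([(s_h < c).mean() for c in cutoffs])
youden = sens + spec - 1.0

best = float(cutoffs[np.argmax(youden)])
print(round(best, 2), round(float(youden.max()), 2))
```

The paper's first stage estimates the weights w themselves by maximizing a smooth approximation of the empirical AUC; here the weights are simply assumed so the cutoff search stands on its own.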