Factor augmented CUB model for multivariate ordinal data
- Research Article
1
- 10.11648/j.ajtas.20231201.11
- Mar 28, 2023
- American Journal of Theoretical and Applied Statistics
Multivariate longitudinal ordinal data arise in longitudinal studies in which each individual has more than one longitudinal ordinal measure. However, because of complicated correlation structures within each individual and the lack of explicit likelihood functions, analyzing multivariate longitudinal ordinal data is quite challenging. In this paper, Markov chain Monte Carlo (MCMC) sampling methods are developed to analyze multivariate longitudinal ordinal data by extending multivariate probit (MVP) models for univariate longitudinal ordinal data to multiple multivariate probit (MMVP) models for multivariate longitudinal ordinal data. The identifiable MVP models require the covariance matrix of the latent multivariate normal variables underlying the longitudinal ordinal variables to be a correlation matrix, so a Metropolis-Hastings (MH) algorithm is usually needed, which makes it demanding to develop efficient MCMC sampling methods. In contrast, the non-identifiable MVP models can be constructed so that the MH step for sampling a correlation matrix is replaced by a Gibbs step that samples a covariance matrix, which improves the mixing and convergence of the MCMC components. Therefore, both the identifiable MMVP models and the non-identifiable MMVP models for multivariate longitudinal ordinal data are presented, and their corresponding MCMC sampling methods are developed. The performance of these methods is illustrated through simulation studies and an application using data from the Russia Longitudinal Monitoring Survey-Higher School of Economics (RLMS-HSE).
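The latent-variable construction behind a multivariate probit model can be illustrated with a minimal sketch. It is not the paper's MCMC sampler: it only simulates ordinal responses by thresholding correlated latent normal variables. The function name `simulate_ordinal`, the correlation matrix, and the cutpoints are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_ordinal(n, corr, cutpoints, seed=0):
    """Draw latent normals with correlation matrix `corr` and cut them into categories.

    Hypothetical helper for illustration only; it simulates data and does no estimation.
    """
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    # Latent variables: zero mean, unit variances, so the covariance is a correlation matrix.
    z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n)
    # Category k means the latent value fell between cutpoints k-1 and k.
    y = np.digitize(z, cutpoints)
    return z, y

# Assumed values: three repeated measures per individual, four ordinal categories.
corr = np.array([[1.0, 0.6, 0.3],
                 [0.6, 1.0, 0.6],
                 [0.3, 0.6, 1.0]])
cutpoints = np.array([-1.0, 0.0, 1.0])
z, y = simulate_ordinal(500, corr, cutpoints)
```

Fixing the latent variances at one is what makes the covariance a correlation matrix, which is the identifiability restriction the abstract refers to.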
- Research Article
35
- 10.1007/s13253-013-0136-z
- Apr 11, 2013
- Journal of Agricultural, Biological, and Environmental Statistics
We propose a Bayesian model for mixed ordinal and continuous multivariate data to evaluate a latent spatial Gaussian process. Our proposed model can be used in many contexts where mixed continuous and discrete multivariate responses are observed in an effort to quantify an unobservable continuous measurement. In our example, the latent, or unobservable measurement is wetland condition. While predicted values of the latent wetland condition variable produced by the model at each location do not hold any intrinsic value, the relative magnitudes of the wetland condition values are of interest. In addition, by including point-referenced covariates in the model, we are able to make predictions at new locations for both the latent random variable and the multivariate response. Lastly, the model produces ranks of the multivariate responses in relation to the unobserved latent random field. This is an important result as it allows us to determine which response variables are most closely correlated with the latent variable. Our approach offers an alternative to traditional indices based on best professional judgment that are frequently used in ecology. We apply our model to assess wetland condition in the North Platte and Rio Grande River Basins in Colorado. The model facilitates a comparison of wetland condition at multiple locations and ranks the importance of in-field measurements.
- Preprint Article
1
- 10.1007/s13253-013-0136
- Mar 23, 2013
We propose a Bayesian model for mixed ordinal and continuous multivariate data to evaluate a latent spatial Gaussian process. Our proposed model can be used in many contexts where mixed continuous and discrete multivariate responses are observed in an effort to quantify an unobservable continuous measurement. In our example, the latent, or unobservable measurement is wetland condition. While predicted values of the latent wetland condition variable produced by the model at each location do not hold any intrinsic value, the relative magnitudes of the wetland condition values are of interest. In addition, by including point-referenced covariates in the model, we are able to make predictions at new locations for both the latent random variable and the multivariate response. Lastly, the model produces ranks of the multivariate responses in relation to the unobserved latent random field. This is an important result as it allows us to determine which response variables are most closely correlated with the latent variable. Our approach offers an alternative to traditional indices based on best professional judgment that are frequently used in ecology. We apply our model to assess wetland condition in the North Platte and Rio Grande River Basins in Colorado. The model facilitates a comparison of wetland condition at multiple locations and ranks the importance of in-field measurements.
- Research Article
43
- 10.2307/2532888
- Jun 1, 1996
- Biometrics
The mixed effects model for binary responses due to Conaway (1990, A Random Effects Model for Binary Data) is extended to accommodate ordinal responses in general and discrete time survival data with ordinal responses in particular. Given a multinomial likelihood, cumulative complementary log-log link function, and log-gamma random effects distribution, the resulting marginal likelihood has a closed form. As a result, a Newton-Raphson estimation procedure is feasible without resorting to numerical, approximation-based, or Monte Carlo integration techniques. The parameters in the model have a proportional hazards interpretation in terms of multivariate discrete time data with ordinal responses. Using data from a psychological example, the proposed method is compared with other mixed effects approaches as well as population-averaged models.
- Research Article
4
- 10.1088/1742-6596/1751/1/012014
- Jan 1, 2021
- Journal of Physics: Conference Series
The Vector Autoregressive (VAR) model is a statistical model for multivariate time series data that is commonly applied in finance, management, business and economics. However, economic data, especially return values, fluctuate strongly, so the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model needs to be added to the analysis to obtain efficient results. This study discusses the formation of the best model for multivariate time series data, namely return data of PT. Indofarma Tbk. (INAF) and PT. Kimia Farma Tbk. (KAEF) from June 2015 to July 2020. The returns of the two variables tended to show high volatility shocks at some times and low volatility at others, which characterizes the data as having an ARCH effect, so a GARCH model, namely the BEKK-GARCH model, is used in this analysis. This model proposes a new parameterization on which a restriction is easily imposed, namely the requirement that H_t must be positive for all values of ε_t and x_t in the sample space. Based on the selection of the best model using the AICC, HQC, AIC and SBC criteria, the VAR(1)-GARCH(1,1) model is found to be the best model for the data used. This research also examines the behavior of and relationship between INAF and KAEF based on Granger causality and impulse response analysis. In addition, the forecasting results of the VAR(1)-GARCH(1,1) model show that this model is good for short-term forecasting.
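The conditional-variance recursion that GARCH models rely on can be sketched in the univariate case. This is a simplification chosen for brevity, not the BEKK parameterization used in the study. The function name `simulate_garch11` and the parameter values are assumptions.

```python
import numpy as np

def simulate_garch11(n, omega=0.1, alpha=0.1, beta=0.8, seed=0):
    """Simulate a univariate GARCH(1,1) series (illustrative, assumed parameters).

    h[t] = omega + alpha * eps[t-1]**2 + beta * h[t-1]
    """
    rng = np.random.default_rng(seed)
    h = np.empty(n)    # conditional variances
    eps = np.empty(n)  # innovations (e.g. demeaned returns)
    # Start at the unconditional variance, which exists when alpha + beta < 1.
    h[0] = omega / (1.0 - alpha - beta)
    eps[0] = np.sqrt(h[0]) * rng.standard_normal()
    for t in range(1, n):
        h[t] = omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1]
        eps[t] = np.sqrt(h[t]) * rng.standard_normal()
    return eps, h

eps, h = simulate_garch11(500)
```

With omega > 0 and alpha, beta >= 0, every h[t] stays positive, which is the scalar analogue of the positivity requirement on H_t mentioned above.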
- Research Article
- 10.1002/sim.70442
- Feb 1, 2026
- Statistics in Medicine
We develop an integrative joint model for multivariate sparse functional and survival data to analyze Alzheimer's disease (AD) across multiple studies. To address missing-by-design outcomes in multi-cohort studies, our approach extends the multivariate functional mixed model (MFMM), which integrates longitudinal outcomes to extract shared disease progression trajectories and links these outcomes to time-to-event data through a parsimonious survival model. This framework balances flexibility and interpretability by modeling shared progression trajectories while accommodating cohort-specific mean functions and survival parameters. For efficient estimation, we incorporate penalized splines into an EM algorithm. Application to three AD cohorts demonstrates the model's ability to capture disease trajectories and account for inter-cohort variability. Simulation studies confirm its robustness and accuracy, highlighting its value in advancing the understanding of AD progression and supporting clinical decision-making in multi-cohort settings.
- Research Article
37
- 10.1111/j.0006-341x.2004.00254.x
- Dec 1, 2004
- Biometrics
We consider measurement error in covariates in the marginal hazards model for multivariate failure time data. We explore the bias implications of normal additive measurement error without assuming a distribution for the underlying true covariate. To correct measurement-error-induced bias in the regression coefficient of the marginal model, we propose to apply the SIMEX procedure and demonstrate its large and small sample properties for both known and estimated measurement error variance. We illustrate this method using the Lipid Research Clinics Coronary Primary Prevention Trial data with total cholesterol as the covariate measured with error and time until angina and time until nonfatal myocardial infarction as the correlated outcomes of interest.
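The SIMEX idea can be sketched under simplifying assumptions: a simple linear regression stands in for the marginal hazards model, the measurement error variance is treated as known, and a quadratic extrapolant is used. The function name `simex_slope`, the lambda grid, and the simulated data are hypothetical.

```python
import numpy as np

def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_rep=200, seed=0):
    """SIMEX for the slope of y on an error-prone covariate w (illustrative sketch).

    Simulation step: add extra noise with variance lam * sigma_u**2 and refit.
    Extrapolation step: fit a quadratic in lam and evaluate it at lam = -1.
    """
    rng = np.random.default_rng(seed)

    def ols_slope(x, resp):
        return np.polyfit(x, resp, 1)[0]

    lam_grid = [0.0]
    slopes = [ols_slope(w, y)]  # naive estimate at lam = 0
    for lam in lambdas:
        reps = [
            ols_slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.shape), y)
            for _ in range(n_rep)
        ]
        lam_grid.append(lam)
        slopes.append(np.mean(reps))
    coeffs = np.polyfit(lam_grid, slopes, 2)
    return slopes[0], np.polyval(coeffs, -1.0)

# Simulated data with an assumed true slope of 2 and known error SD of 0.5.
rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n)
sigma_u = 0.5
w = x + sigma_u * rng.standard_normal(n)
y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(n)
naive, corrected = simex_slope(w, y, sigma_u)
```

The quadratic extrapolant is only an approximation, so the corrected slope moves toward the true value without recovering it exactly.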
- Research Article
6
- 10.1080/00949655.2015.1007983
- Feb 6, 2015
- Journal of Statistical Computation and Simulation
In this paper we propose a new lifetime model for multivariate survival data with a surviving fraction. We develop this model assuming that there are m types of unobservable competing risks, where each risk is related to the time of occurrence of an event of interest. We explore the use of Markov chain Monte Carlo methods to develop a Bayesian analysis for the proposed model. We also perform a simulation study to analyse the frequentist coverage probabilities of credible intervals derived from the posteriors. Our modelling is illustrated through a real data set.
- Research Article
97
- 10.1002/sim.1249
- Oct 7, 2002
- Statistics in Medicine
A new model for multivariate non-normal longitudinal data is proposed. In a first step, each longitudinal series of data corresponding to a given response is modelled separately using a copula to relate the marginal distributions of the response at each time of observation. In a second step, at each observation time, the conditional (on the past) distributions of each response are related using another copula describing the relationship between the corresponding variables. Note that there is no need to consider the same family of distributions for these response variables. The technique is illustrated in a dose titration safety study on a new antidepressant. The haemodynamic effect on diastolic blood pressure, systolic blood pressure and heart rate is studied. These three responses are measured repeatedly over time on ten healthy volunteers during the dose escalation. The available covariates are sex and the concentration of drug in the plasma at time of measurement.
- Book Chapter
5
- 10.1007/978-3-319-42972-4_46
- Jul 30, 2016
The aim of the work is to propose a new flexible way of modeling the dependence between the components of non-normal multivariate longitudinal data by using the copula approach. Longitudinal data are increasingly common in scientific areas where several variables are measured over a sample of statistical units at different times, showing two types of dependence: between variables and across time. We propose to model jointly the dependence structure between the responses and the temporal structure of each process by pair copula construction (PCC). The use of the copula allows the relaxation of the assumption of multinormality that is typical of the usual model for multivariate longitudinal data. The use of PCC allows us to overcome the problem of the multivariate copulae used in the literature, which suffer from rather inflexible structures in high dimensions. The result is a new, extremely flexible model for multivariate longitudinal data, which overcomes the problem of modeling simultaneous dependence between two or more non-normal responses over time. The explanation of the methodology is accompanied by an example.
- Research Article
18
- 10.1016/j.csda.2009.03.024
- Apr 5, 2009
- Computational Statistics & Data Analysis
Bayesian model checking for multivariate outcome data
- Research Article
4
- 10.1007/s10651-016-0360-0
- Nov 1, 2016
- Environmental and Ecological Statistics
In many environmental and ecological studies, it is of interest to model compositional data. One approach is to consider positive random vectors that are subject to a unit-sum constraint. In landscape ecological studies, it is common that compositional data are also sampled in space with some elements of the composition absent at certain sampling sites. In this paper, we first propose a practical spatial multivariate ordered probit model for multivariate ordinal data, where the response variables can be viewed as the discretized non-negative compositions without the unit-sum constraint. We then propose a novel two-stage spatial mixture Dirichlet regression model. The first stage models the spatial dependence and the presence of exact zero values, and the second stage models all the non-zero compositional data. A maximum composite likelihood approach is developed for parameter estimation and inference in both the spatial multivariate ordered probit model and the two-stage spatial mixture Dirichlet regression model. The standard errors of the parameter estimates are computed by an estimate of the Godambe information matrix. A simulation study is conducted to evaluate the performance of the proposed models and methods. A land cover data example in landscape ecology further illustrates that accounting for spatial dependence can improve the accuracy in the prediction of presence/absence of different land covers as well as the magnitude of land cover compositions.
- Research Article
- 10.1177/09622802251412838
- Mar 1, 2026
- Statistical Methods in Medical Research
In the context of longitudinal data regression modeling, individuals often have two or more response indicators, and these response indicators are typically correlated to some extent. Additionally, in the field of clinical medicine, the response indicators of longitudinal data are often ordinal. For the joint modeling of multivariate ordinal longitudinal data, methods based on mean regression (MR) are commonly used to study latent variables. However, for data with non-normal errors, MR methods often perform poorly. As an alternative to MR methods, composite quantile regression (CQR) can overcome the limitations of MR methods and provide more robust estimates. This article proposes a joint relative composite quantile regression method (joint relative CQR) for multivariate ordinal longitudinal data and investigates its application to a set of longitudinal medical datasets on dementia. Firstly, the joint relative CQR method for multivariate ordinal longitudinal data is constructed based on the pseudo composite asymmetric Laplace distribution (PCALD) and latent variable models. Secondly, the parameter estimation problem of the model is studied using MCMC algorithms. Finally, Monte Carlo simulations and a set of longitudinal medical datasets on dementia validate the effectiveness of the proposed model and method.
- Research Article
1
- 10.22237/jmasm/1099268460
- Nov 1, 2004
- Journal of Modern Applied Statistical Methods
A modification of the Andersen-Gill gamma shared frailty model is presented. The variance of the frailty is directly modeled by means of a generalized linear model, and the EM algorithm is modified in order to simultaneously estimate a semiparametric model for the failure times and a model for the variance of the frailty. A simulation study is conducted to evaluate the performance of the proposed algorithm (EMB algorithm) and to compare it with other methods, namely a marginal model and a conditional model. Multivariate data from a nosocomial infection study are used to illustrate the methods. The EMB fit turned out to be better than the fit obtained from a marginal model or from a conditional model. The EMB provided the best fit (being the least over-dispersed and having the highest AIC and the highest pseudo-R square) and estimated the parameters most efficiently. The proposed method is able to capture and to take into account unobservable random effects in semiparametric models.
- Research Article
49
- 10.1016/j.csda.2005.04.005
- May 3, 2005
- Computational Statistics & Data Analysis
Evaluation of transfer evidence for three-level multivariate data with the use of graphical models