Latent Factor Regression Research Articles

Regression models, in which the observed features X∈Rp and the response Y∈R depend, jointly, on a lower dimensional, unobserved, latent vector Z∈RK, with K≪p, are popular in a large array of applications, and mainly used for predicting a response from correlated features. In contrast, methodology and theory for inference on the regression coefficient β∈RK relating Y to Z are scarce, since typically the un-observable factor Z is hard to interpret. Furthermore, the determination of the asymptotic variance of an estimator of β is a long-standing problem, with solutions known only in a few particular cases. To address some of these outstanding questions, we develop inferential tools for β in a class of factor regression models in which the observed features are signed mixtures of the latent factors. The model specifications are both practically desirable, in a large array of applications, render interpretability to the components of Z, and are sufficient for parameter identifiability. Without assuming that the number of latent factors K or the structure of the mixture is known in advance, we construct computationally efficient estimators of β, along with estimators of other important model parameters. We benchmark the rate of convergence of β by first establishing its ℓ2-norm minimax lower bound, and show that our proposed estimator βˆ is minimax-rate adaptive. Our main contribution is the provision of a unified analysis of the component-wise Gaussian asymptotic distribution of βˆ and, especially, the derivation of a closed form expression of its asymptotic variance, together with consistent variance estimators. The resulting inferential tools can be used when both K and p are independent of the sample size n, and also when both, or either, p and K vary with n, while allowing for p>n. This complements the only asymptotic normality results obtained for a particular case of the model under consideration, in the regime K=O(1) and p→∞, but without a variance estimate. As an application, we provide, within our model specifications, a statistical platform for inference in regression on latent cluster centers, thereby increasing the scope of our theoretical results. We benchmark the newly developed methodology on a recently collected data set for the study of the effectiveness of a new SIV vaccine. Our analysis enables the determination of the top latent antibody-centric mechanisms associated with the vaccine response.

Read full abstract

Multi-item surveys are frequently used to study scores on latent factors, like human values, attitudes, and behavior. Such studies often include a comparison, between specific groups of individuals or residents of different countries, either at one or multiple points in time (i.e., a cross-sectional or a longitudinal comparison or both). If latent factor means are to be meaningfully compared, the measurement structures of the latent factor and their survey items should be stable, that is “invariant.” As proposed by Mellenbergh (1989), “measurement invariance” (MI) requires that the association between the items (or test scores) and the latent factors (or latent traits) of individuals should not depend on group membership or measurement occasion (i.e., time). In other words, if item scores are (approximately) multivariate normally distributed, conditional on the latent factor scores, the expected values, the covariances between items, and the unexplained variance unrelated to the latent factors should be equal across groups. Many studies examining MI of survey scales have shown that the MI assumption is very hard to meet. In particular, strict forms of MI rarely hold. With “strict” we refer to a situation in which measurement parameters are exactly the same across groups or measurement occasions, that is an enforcement of zero tolerance with respect to deviations between groups or measurement occasions. Often, researchers just ignore MI issues and compare latent factor means across groups or measurement occasions even though the psychometric basis for such a practice does not hold. However, when a strict form of MI is not established and one must conclude that respondents attach different meanings to survey items, this makes it impossible to make valid comparisons between latent factor means. As such, the potential bias caused by measurement non-invariance obstructs the comparison of latent factor means (if strict MI does not hold) or regression coefficients (if less strict forms of MI do not hold). Traditionally, MI is tested for in a multiple group confirmatory factor analysis (MGCFA) with groups defined by unordered categorical (i.e., nominal) between-subject variables. In MGCFA, MI is tested at each constraint of the latent factor model using a series of nested (latent) factor models. This traditional way of testing for MI originated with Joreskog (1971), who was the first scholar to thoroughly discuss the invariance of latent factor (or measurement) structures. Additionally, Sorbom (1974, 1978) pioneered the specification and estimation of latent factor means using a multi-group SEM approach in LISREL (Joreskog and Sorbom, 1996). Following these contributions the multi-group specification of latent factor structures has become widespread in all major SEM software programs (e.g., AMOS Arbuckle, 2006, EQS Bender and Wu, 1995, LAVAAN Rosseel, 2012, Mplus Muthen and Muthen, 2013, STATA STATA, 2015, and OpenMx Boker et al., 2011). Shortly thereafter, Byrne et al. (1989) introduced the distinction between full and partial MI. Although their introduction was of great value, the first formal treatment of different forms of MI and their consequences for the validity of multi-group/multi-time comparisons is attributable to Meredith (1993). So far, a tremendous amount of papers dealing with MI have been published. The literature on MI published in the 20th century is nicely summarized by Vandenberg and Lance (2000). Noteworthy is also the overview of applications in cross-cultural studies provided by Davidov et al. (2014), as well as a recent book by Millsap (2011) containing a general systematic treatment of the topic of MI. The traditional MGCFA approach to MI-testing is described by, for example, Byrne (2004), Chen et al. (2005), Gregorich (2006), van de Schoot et al. (2012), Vandenberg (2002) and Wicherts and Dolan (2010). Researchers entering the field of MI are recommended to first consult Meredith (1993) and Millsap (2011) before reading other valuable academic works. Recent developments in statistics have provided new analytical tools for assessing MI. The aim of this special issue is to provide a forum for a discussion of MI, covering some crucial “themes”: (1) ways to assess and deal with measurement non-invariance; (2) Bayesian and IRT methods employing the concept of approximate MI; and (3) new or adjusted approaches for testing MI to fit increasingly complex statistical models and specific characteristics of survey data.

Read full abstract

Latent Factor Regression Research Articles

Articles published on Latent Factor Regression

An integrative latent class model of heterogeneous data modalities for diagnosing kidney obstruction.

High-Dimensional Mediation Analysis: A New Method Applied to Maternal Smoking, Placental DNA Methylation, and Birth Outcomes.

Are Latent Factor Regression and Sparse Regression Adequate?

A two-stage latent factor regression method to model the common and unique effects of multiple highly correlated exposure variables

HDMAX2: A framework for High Dimensional Mediation Analysis with application to maternal smoking, placental DNA methylation and birth outcomes

High Dimensional Multiomics Reveals Unique Characteristics of Early Plasma Administration in Polytrauma Patients With TBI.

Inference in latent factor regression with clusterable features

Sparse latent factor regression models for genome-wide and epigenome-wide association studies.

Relationship between high-risk behavior and mental health problems using latent factor regression for grouped outcomes model

Genetic Parameters and Genotype-by-Environment Interactions in Regional Progeny Tests of Pinus taeda L. in the Southern USA

Heterogeneous Large Datasets Integration Using Bayesian Factor Regression

A Bayesian Latent Class Model to Predict Kidney Obstruction in the Absence of Gold Standard

Bayesian latent factor regression for multivariate functional data with variable selection

Hierarchical infinite factor models for improving the prediction of surgical complications for geriatric patients

Corrigendum to "Association of Stressful Life Events with Psychological Problems: A Large-Scale Community-Based Study Using Grouped Outcomes Latent Factor Regression with Latent Predictors".

Primary Factors Statistically Associated with Diarrheal Occurrences

Spatial Bayesian latent factor regression modeling of coordinate-based meta-analysis data.

Association of Stressful Life Events with Psychological Problems: A Large-Scale Community-Based Study Using Grouped Outcomes Latent Factor Regression with Latent Predictors.

Generalized reduced rank latent factor regression for high dimensional tensor fields, and neuroimaging-genetic applications

Editorial: Measurement Invariance.

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Latent Factor Regression Research Articles

Articles published on Latent Factor Regression

An integrative latent class model of heterogeneous data modalities for diagnosing kidney obstruction.

High-Dimensional Mediation Analysis: A New Method Applied to Maternal Smoking, Placental DNA Methylation, and Birth Outcomes.

Are Latent Factor Regression and Sparse Regression Adequate?

A two-stage latent factor regression method to model the common and unique effects of multiple highly correlated exposure variables

HDMAX2: A framework for High Dimensional Mediation Analysis with application to maternal smoking, placental DNA methylation and birth outcomes

High Dimensional Multiomics Reveals Unique Characteristics of Early Plasma Administration in Polytrauma Patients With TBI.

Inference in latent factor regression with clusterable features

Sparse latent factor regression models for genome-wide and epigenome-wide association studies.

Relationship between high-risk behavior and mental health problems using latent factor regression for grouped outcomes model

Genetic Parameters and Genotype-by-Environment Interactions in Regional Progeny Tests of Pinus taeda L. in the Southern USA

Heterogeneous Large Datasets Integration Using Bayesian Factor Regression

A Bayesian Latent Class Model to Predict Kidney Obstruction in the Absence of Gold Standard

Bayesian latent factor regression for multivariate functional data with variable selection

Hierarchical infinite factor models for improving the prediction of surgical complications for geriatric patients

Corrigendum to "Association of Stressful Life Events with Psychological Problems: A Large-Scale Community-Based Study Using Grouped Outcomes Latent Factor Regression with Latent Predictors".

Primary Factors Statistically Associated with Diarrheal Occurrences

Spatial Bayesian latent factor regression modeling of coordinate-based meta-analysis data.

Association of Stressful Life Events with Psychological Problems: A Large-Scale Community-Based Study Using Grouped Outcomes Latent Factor Regression with Latent Predictors.

Generalized reduced rank latent factor regression for high dimensional tensor fields, and neuroimaging-genetic applications

Editorial: Measurement Invariance.