  • Open Access
  • Research Article
  • Citations: 14
  • DOI: 10.1093/biostatistics/kxad019
Fast and flexible inference for joint models of multivariate longitudinal and survival data using integrated nested Laplace approximations.
  • Aug 2, 2023
  • Biostatistics
  • Denis Rustand + 4 more

Modeling longitudinal and survival data jointly offers many advantages, such as addressing measurement error and missing data in the longitudinal processes, understanding and quantifying the association between the longitudinal markers and the survival events, and predicting the risk of events based on the longitudinal markers. A joint model involves multiple submodels (one for each longitudinal or survival outcome), usually linked through correlated or shared random effects. Estimating these models is computationally expensive, largely because the likelihood must be integrated over the multidimensional distribution of the random effects, so inference quickly becomes intractable; this restricts applications of joint models to a small number of longitudinal markers and/or random effects. We introduce a Bayesian approximation based on the integrated nested Laplace approximation algorithm, implemented in the R package R-INLA, to alleviate the computational burden and allow the estimation of multivariate joint models with fewer restrictions. Our simulation studies show that R-INLA substantially reduces the computation time and the variability of the parameter estimates compared with alternative estimation strategies. We further apply the methodology to analyze five longitudinal markers (three continuous, one count, one binary; 16 random effects in total) and competing risks of death and transplantation in a clinical trial on primary biliary cholangitis. R-INLA provides a fast and reliable inference technique for applying joint models to the complex multivariate data encountered in health research.
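
As a point of reference, a schematic univariate version of such a shared-random-effects joint model (notation chosen here for illustration, not taken from the paper) links a mixed-effects longitudinal submodel to a proportional hazards submodel through the subject-specific terms:

$$ y_i(t) = \underbrace{x_i(t)^\top \beta + z_i(t)^\top b_i}_{m_i(t)} + \varepsilon_i(t), \qquad \lambda_i(t) = \lambda_0(t)\exp\{ w_i^\top \gamma + \alpha\, m_i(t) \}, \qquad b_i \sim N(0, \Sigma). $$

The multivariate setting of the paper adds one longitudinal submodel per marker and one survival submodel per competing risk, all sharing correlated random effects, which is what makes the likelihood integral over $b_i$ high-dimensional.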

  • Open Access
  • Research Article
  • Citations: 10
  • DOI: 10.1093/biostatistics/kxad006
Systematically missing data in causally interpretable meta-analysis.
  • Mar 28, 2023
  • Biostatistics
  • Jon A Steingrimsson + 3 more

Causally interpretable meta-analysis combines information from a collection of randomized controlled trials to estimate treatment effects in a target population in which experimentation may not be possible but from which covariate information can be obtained. In such analyses, a key practical challenge is the presence of systematically missing data when some trials have collected data on one or more baseline covariates, but other trials have not, such that the covariate information is missing for all participants in the latter. In this article, we provide identification results for potential (counterfactual) outcome means and average treatment effects in the target population when covariate data are systematically missing from some of the trials in the meta-analysis. We propose three estimators for the average treatment effect in the target population, examine their asymptotic properties, and show that they have good finite-sample performance in simulation studies. We use the estimators to analyze data from two large lung cancer screening trials and target population data from the National Health and Nutrition Examination Survey (NHANES). To accommodate the complex survey design of the NHANES, we modify the methods to incorporate survey sampling weights and allow for clustering.
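
For orientation, when every baseline covariate $X$ is observed in all trials, a standard identification result in this literature writes the potential outcome mean in the target population (labeled $S = 0$ here; notation mine) in terms of quantities estimable from the trial data ($S = 1$):

$$ E[Y^{a} \mid S = 0] \;=\; E\big[\, E[Y \mid X, S = 1, A = a] \;\big|\; S = 0 \,\big], $$

under the usual exchangeability, positivity, and consistency conditions. The article's contribution is to extend identification and estimation of quantities of this kind to the case where some covariates are systematically missing in a subset of trials.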

  • Open Access
  • Research Article
  • Citations: 2
  • DOI: 10.1093/biostatistics/kxad005
Cohort-based smoothing methods for age-specific contact rates.
  • Mar 20, 2023
  • Biostatistics
  • Yannick Vandendijck + 4 more

The use of social contact rates is widespread in infectious disease modeling, since they have been shown to be key driving forces of important epidemiological parameters. Quantification of contact patterns is crucial to parameterize dynamic transmission models and to provide insights on the (basic) reproduction number. Information on social interactions can be obtained from population-based contact surveys, such as the European Commission project POLYMOD. Estimation of age-specific contact rates from these studies is often done using a piecewise constant approach or bivariate smoothing techniques. For the latter, smoothness is typically introduced in the dimensions of the respondent's and contact's age (i.e., the rows and columns of the social contact matrix). We propose a constrained smoothing approach that takes into account the reciprocal nature of contacts and introduces smoothness over the diagonals (including all subdiagonals) of the social contact matrix. This modeling approach is justified by assuming that people's contact behavior changes smoothly as they age. We call this smoothing from a cohort perspective. Two approaches that allow for smoothing over the social contact matrix diagonals are proposed, namely (i) reordering the diagonal components of the contact matrix and (ii) reordering the penalty matrix to ensure smoothness over the contact matrix diagonals. Parameter estimation is done in the likelihood framework using constrained penalized iteratively reweighted least squares. A simulation study underlines the benefits of cohort-based smoothing. Finally, the proposed methods are illustrated on the Belgian POLYMOD data of 2006. Code to reproduce the results of the article is available from the GitHub repository https://github.com/oswaldogressani/Cohort_smoothing.
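
As a concrete illustration of approach (ii), the sketch below is a simplified numpy version, under indexing conventions chosen here rather than the authors' (their code is in the linked GitHub repository), of a difference penalty acting along every diagonal of the vectorized contact matrix; the reciprocity constraint and the likelihood-based IRLS fit are omitted, and a ridge-type penalized least-squares smooth stands in for them.

```python
import numpy as np

def diagonal_penalty(A, order=2):
    """Penalty matrix P for vec(C) (column-major, C is A x A) that penalizes
    'order'-th differences along each diagonal of C, i.e., smoothness from a
    cohort perspective (fixed age gap between respondent and contact)."""
    idx = np.arange(A * A).reshape(A, A, order="F")   # position of C[i, j] in vec(C)
    rows = []
    for k in range(-(A - 1), A):                      # all sub- and superdiagonals
        diag_pos = np.diagonal(idx, offset=k)         # vec-indices along this diagonal
        n = len(diag_pos)
        if n <= order:
            continue
        D = np.diff(np.eye(n), n=order, axis=0)       # (n - order) x n difference matrix
        for d_row in D:
            row = np.zeros(A * A)
            row[diag_pos] = d_row
            rows.append(row)
    D_full = np.vstack(rows)
    return D_full.T @ D_full                          # P = D'D

# Toy usage: ridge-type smooth of a noisy 30 x 30 contact matrix along its diagonals.
A = 30
rng = np.random.default_rng(1)
C_noisy = rng.poisson(5.0, size=(A, A)).astype(float)
P = diagonal_penalty(A)
lam = 10.0
c_hat = np.linalg.solve(np.eye(A * A) + lam * P, C_noisy.flatten(order="F"))
C_smooth = c_hat.reshape(A, A, order="F")
```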

  • Research Article
  • Citations: 1
  • DOI: 10.1093/biostatistics/kxad004
Multi-trait analysis of gene-by-environment interactions in large-scale genetic studies.
  • Mar 10, 2023
  • Biostatistics
  • Lan Luo + 3 more

Identifying genotype-by-environment interaction (GEI) is challenging because the GEI analysis generally has low power. Large-scale consortium-based studies are ultimately needed to achieve adequate power for identifying GEI. We introduce Multi-Trait Analysis of Gene-Environment Interactions (MTAGEI), a powerful, robust, and computationally efficient framework to test gene-environment interactions on multiple traits in large data sets, such as the UK Biobank (UKB). To facilitate the meta-analysis of GEI studies in a consortium, MTAGEI efficiently generates summary statistics of genetic associations for multiple traits under different environmental conditions and integrates the summary statistics for GEI analysis. MTAGEI enhances the power of GEI analysis by aggregating GEI signals across multiple traits and variants that would otherwise be difficult to detect individually. MTAGEI achieves robustness by combining complementary tests under a wide spectrum of genetic architectures. We demonstrate the advantages of MTAGEI over existing single-trait-based GEI tests through extensive simulation studies and the analysis of the whole exome sequencing data from the UKB.

  • Open Access
  • Research Article
  • Citations: 2
  • DOI: 10.1093/biostatistics/kxad003
A Bayesian approach to estimating COVID-19 incidence and infection fatality rates.
  • Mar 6, 2023
  • Biostatistics
  • Justin J Slater + 5 more

Naive estimates of incidence and infection fatality rates (IFR) of coronavirus disease 2019 suffer from a variety of biases, many of which relate to preferential testing. This has motivated epidemiologists from around the globe to conduct serosurveys that measure the immunity of individuals by testing for the presence of SARS-CoV-2 antibodies in the blood. These quantitative measures (titer values) are then used as a proxy for previous or current infection. However, statistical methods that use these data to their full potential have yet to be developed. Previous researchers have discretized these continuous values, discarding potentially useful information. In this article, we demonstrate how multivariate mixture models can be used in combination with post-stratification to estimate cumulative incidence and IFR in an approximate Bayesian framework without discretization. In doing so, we account for uncertainty from both the estimated number of infections and incomplete deaths data to provide estimates of IFR. This method is demonstrated using data from the Action to Beat Coronavirus serosurvey in Canada.
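
The basic building block is a mixture model for the continuous titer values, with the mixture weight playing the role of the infection probability; in a simplified univariate form (notation mine, not the paper's), with $g$ indexing post-stratification cells and $w_g$ the corresponding population weights,

$$ T_{gi} \sim \pi_g\, f_{\mathrm{inf}}(t) + (1 - \pi_g)\, f_{\mathrm{uninf}}(t), \qquad \widehat{\mathrm{incidence}} = \sum_g w_g\, \pi_g. $$

The paper's multivariate, approximate-Bayesian treatment additionally propagates uncertainty from the incomplete deaths data into the IFR estimates.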

  • Open Access
  • Research Article
  • Citations: 8
  • DOI: 10.1093/biostatistics/kxac051
DeLIVR: a deep learning approach to IV regression for testing nonlinear causal effects in transcriptome-wide association studies.
  • Jan 4, 2023
  • Biostatistics
  • Ruoyu He + 5 more

Transcriptome-wide association studies (TWAS) have been increasingly applied to identify (putative) causal genes for complex traits and diseases. TWAS can be regarded as a two-sample two-stage least squares method for instrumental variable (IV) regression for causal inference. The standard TWAS (called TWAS-L) only considers a linear relationship between a gene's expression and a trait in stage 2, which may lose statistical power when this assumption does not hold. Recently, an extension of TWAS (called TWAS-LQ) considers both the linear and quadratic effects of a gene on a trait; however, it is not flexible enough due to its parametric nature and may be underpowered for nonquadratic nonlinear effects. On the other hand, a deep learning (DL) approach, called DeepIV, has been proposed to nonparametrically model a nonlinear effect in IV regression. However, it is both slow and unstable due to the ill-posed inverse problem of solving an integral equation with Monte Carlo approximations. Furthermore, in the original DeepIV approach, statistical inference, that is, hypothesis testing, was not studied. Here, we propose a novel DL approach, called DeLIVR, to overcome the major drawbacks of DeepIV, by estimating a related but different target function and including a hypothesis testing framework. We show through simulations that DeLIVR was both faster and more stable than DeepIV. We applied both the parametric and DL approaches to the GTEx and UK Biobank data, showing that DeLIVR detected an additional 8 and 7 genes nonlinearly associated with high-density lipoprotein (HDL) cholesterol and low-density lipoprotein (LDL) cholesterol, respectively, all of which would be missed by TWAS-L, TWAS-LQ, and DeepIV; these genes include BUD13, associated with HDL, and SLC44A2 and GMIP, associated with LDL, all supported by previous studies.
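
To make the two-sample, two-stage structure concrete, the toy sketch below uses simulated data, scikit-learn, and a generic ridge/neural-network pair chosen purely for illustration; it is not the DeLIVR estimator itself, only the shared skeleton of TWAS-type IV regressions: impute expression from genotypes in one sample, then fit a flexible regression of the trait on imputed expression in the other.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n1, n2, p = 500, 2000, 50

# Stage 1 (reference transcriptome sample): impute expression from cis-SNP genotypes.
G1 = rng.binomial(2, 0.3, size=(n1, p)).astype(float)
expr = G1 @ rng.normal(0.0, 0.2, p) + rng.normal(0.0, 1.0, n1)
stage1 = Ridge(alpha=1.0).fit(G1, expr)

# Stage 2 (GWAS sample): regress the trait on imputed expression with a flexible
# (here, small neural-network) regression so a nonlinear effect can be captured.
G2 = rng.binomial(2, 0.3, size=(n2, p)).astype(float)
expr_hat = stage1.predict(G2)
trait = np.sin(expr_hat) + rng.normal(0.0, 1.0, n2)    # toy nonlinear gene-trait effect
stage2 = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
stage2.fit(expr_hat.reshape(-1, 1), trait)
print(stage2.score(expr_hat.reshape(-1, 1), trait))    # in-sample R^2 of the nonlinear fit
```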

  • Research Article
  • Citations: 1
  • DOI: 10.1093/biostatistics/kxac046
Spatiotemporal varying coefficient model for respiratory disease mapping in Taiwan.
  • Dec 9, 2022
  • Biostatistics
  • Feifei Wang + 4 more

Respiratory diseases have long been a global public health problem. In recent years, air pollutants have drawn considerable attention as important risk factors. In this study, we investigate the influence of PM$_{2.5}$ (particulate matter with a diameter of less than 2.5 $\mu$m) on hospital visit rates for respiratory diseases in Taiwan. To reveal the spatiotemporal pattern of the data, we propose a Bayesian disease mapping model with spatially varying coefficients and a parametric temporal trend. Model fitting is conducted using the integrated nested Laplace approximation, a technique widely applied to large-scale data sets due to its high computational efficiency. The finite-sample performance of the proposed method is studied through a series of simulations. As demonstrated by the simulations, the proposed model improves both parameter estimation and prediction performance. We apply the proposed model to the respiratory disease data from 328 third-level administrative regions in Taiwan and find significant associations between hospital visit rates and PM$_{2.5}$.
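
A schematic version of such a disease mapping model (notation mine, with the spatial priors left unspecified) is

$$ y_{it} \sim \mathrm{Poisson}(E_{it}\, \rho_{it}), \qquad \log \rho_{it} = \alpha + (\beta + b_i)\, x_{it} + f(t) + \phi_i, $$

where $x_{it}$ is the PM$_{2.5}$ exposure in region $i$ at time $t$, $\beta + b_i$ is a spatially varying coefficient, $f(t)$ is the parametric temporal trend, and $\phi_i$ is a spatially structured intercept; all unknowns receive priors and the posterior is approximated with INLA.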

  • Research Article
  • Citations: 3
  • DOI: 10.1093/biostatistics/kxac044
Time-to-event surrogate endpoint validation using mediation analysis and meta-analytic data.
  • Nov 18, 2022
  • Biostatistics
  • Quentin Le Coënt + 2 more

With the ongoing development of treatments and the resulting increase in survival in oncology, clinical trials based on endpoints such as overall survival may require long follow-up periods to observe sufficient events and ensure adequate statistical power. This increase in follow-up time may compromise the feasibility of the study. The use of surrogate endpoints instead of final endpoints may be attractive for these studies. However, before a surrogate can be used in a clinical trial, it must be statistically validated. In this article, we propose an approach to validate surrogates when both the surrogate and final endpoints are censored event times. This approach is developed for meta-analytic data and uses a mediation analysis to decompose the total effect of the treatment on the final endpoint into a direct effect and an indirect effect through the surrogate. The meta-analytic nature of the data is accounted for in a joint model with random effects at the trial level. The proportion of the indirect effect over the total effect of the treatment on the final endpoint can be computed from the parameters of the model and used as a measure of surrogacy. We applied this method to investigate time-to-relapse as a surrogate endpoint for overall survival in resectable gastric cancer.
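
The surrogacy measure referred to at the end of the abstract is the proportion of the treatment effect that passes through the surrogate, schematically

$$ \frac{\text{indirect effect through the surrogate}}{\text{direct effect} + \text{indirect effect through the surrogate}}, $$

computed here from the parameters of the meta-analytic joint model rather than from a single-trial analysis.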

  • Open Access
  • Research Article
  • Citations: 3
  • DOI: 10.1093/biostatistics/kxac043
Joint modeling of longitudinal and competing-risk data using cumulative incidence functions for the failure submodels accounting for potential failure cause misclassification through double sampling.
  • Nov 4, 2022
  • Biostatistics
  • Christos Thomadakis + 3 more

Most of the literature on joint modeling of longitudinal and competing-risk data is based on cause-specific hazards, although modeling the cumulative incidence function (CIF) is an easier and more direct approach to evaluating the prognosis of an event. We propose a flexible class of shared parameter models to jointly model a normally distributed marker over time and multiple causes of failure using CIFs for the survival submodels, with the CIFs depending on the "true" marker value over time (i.e., removing the measurement error). The generalized odds rate transformation is applied, so that a proportional subdistribution hazards model arises as a special case. The requirement that the all-cause CIF should be bounded by 1 is formally considered. The proposed models are extended to account for potential failure cause misclassification, where the true failure causes are available in a small random sample of individuals. We also provide a multistate representation of the whole population by defining mutually exclusive states based on the marker values and the competing risks. Based solely on the assumed joint model, we derive fully Bayesian posterior samples for state occupation and transition probabilities. The proposed approach is evaluated in a simulation study and, as an illustration, it is fitted to real data from people with HIV.
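
The boundedness requirement mentioned above simply states that, conditional on the random effects $b_i$, the cause-specific cumulative incidence functions cannot sum to more than one:

$$ \sum_{k=1}^{K} F_k(t \mid b_i) \;\le\; 1 \quad \text{for all } t, $$

a constraint that must be imposed explicitly when the CIFs, rather than the cause-specific hazards, are modeled directly.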

  • Research Article
  • Citations: 3
  • DOI: 10.1093/biostatistics/kxac039
An online framework for survival analysis: reframing Cox proportional hazards model for large data sets and neural networks.
  • Oct 26, 2022
  • Biostatistics
  • Aliasghar Tarkhan + 1 more

In many biomedical applications, the outcome is measured as a "time-to-event" (e.g., disease progression or death). To assess the connection between features of a patient and this outcome, it is common to assume a proportional hazards model and fit a proportional hazards regression (or Cox regression). To fit this model, a log-concave objective function known as the "partial likelihood" is maximized. For moderate-sized data sets, an efficient Newton-Raphson algorithm that leverages the structure of the objective function can be employed. However, in large data sets this approach has two issues: (i) the computational tricks that leverage structure can also lead to computational instability; (ii) the objective function does not naturally decouple, so if the data set does not fit in memory, the model can be computationally expensive to fit. This additionally means that the objective is not directly amenable to stochastic gradient-based optimization methods. To overcome these issues, we propose a simple new framing of proportional hazards regression that results in an objective function amenable to stochastic gradient descent. We show that this simple modification allows us to efficiently fit survival models with very large data sets. It also facilitates training complex models, for example neural-network-based models, with survival data.
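
For context, the log partial likelihood being maximized is

$$ \ell(\beta) \;=\; \sum_{i:\, \delta_i = 1} \Big[ x_i^\top \beta \;-\; \log \sum_{j:\, t_j \ge t_i} \exp\big( x_j^\top \beta \big) \Big], $$

where the inner sum runs over the risk set at each event time. It is this risk-set sum that couples observations and prevents the objective from splitting into independent per-sample terms, which is the obstacle the proposed reframing is designed to remove (the reframed objective itself is not given in the abstract).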