Quantifying uncertainty of individualized treatment effects in right-censored survival data: a comparison of Bayesian additive regression trees and causal survival forest
- Research Article
1
- 10.1161/jaha.118.002294
- May 4, 2019
- Journal of the American Heart Association
Additive Regression Trees (BART) for Normalized Left Ventricular Mass (LVM), LVM and Left Ventricular Hypertrophy (LVH). https://www.mcw.
- Research Article
61
- 10.1177/0962280220921909
- May 25, 2020
- Statistical Methods in Medical Research
There is a dearth of robust methods to estimate the causal effects of multiple treatments when the outcome is binary. This paper uses two unique sets of simulations to propose and evaluate the use of Bayesian additive regression trees in such settings. First, we compare Bayesian additive regression trees to several approaches that have been proposed for continuous outcomes, including inverse probability of treatment weighting, targeted maximum likelihood estimator, vector matching, and regression adjustment. Results suggest that under conditions of non-linearity and non-additivity of both the treatment assignment and outcome generating mechanisms, Bayesian additive regression trees, targeted maximum likelihood estimator, and inverse probability of treatment weighting using generalized boosted models provide better bias reduction and smaller root mean squared error. Bayesian additive regression trees and targeted maximum likelihood estimator provide more consistent 95% confidence interval coverage and better large-sample convergence properties. Second, we supply Bayesian additive regression trees with a strategy to identify a common support region for retaining inferential units and for avoiding extrapolating over areas of the covariate space where common support does not exist. Bayesian additive regression trees retain more inferential units than the generalized propensity score-based strategy, and show lower bias, compared to targeted maximum likelihood estimator or generalized boosted model, in a variety of scenarios differing by the degree of covariate overlap. A case study examining the effects of three surgical approaches for non-small cell lung cancer demonstrates the methods.
- Research Article
19
- 10.1111/biom.13478
- Apr 29, 2021
- Biometrics
Popular parametric and semiparametric hazards regression models for clustered survival data are inappropriate and inadequate when the unknown effects of different covariates and clustering are complex. This calls for a flexible modeling framework to yield efficient survival prediction. Moreover, for some survival studies involving time to occurrence of some asymptomatic events, survival times are typically interval censored between consecutive clinical inspections. In this article, we propose a robust semiparametric model for clustered interval-censored survival data under a paradigm of Bayesian ensemble learning, called soft Bayesian additive regression trees or SBART (Linero and Yang, 2018), which combines multiple sparse (soft) decision trees to attain excellent predictive accuracy. We develop a novel semiparametric hazards regression model by modeling the hazard function as a product of a parametric baseline hazard function and a nonparametric component that uses SBART to incorporate clustering, unknown functional forms of the main effects, and interaction effects of various covariates. In addition to being applicable for left-censored, right-censored, and interval-censored survival data, our methodology is implemented using a data augmentation scheme which allows for existing Bayesian backfitting algorithms to be used. We illustrate the practical implementation and advantages of our method via simulation studies and an analysis of a prostate cancer surgery study where dependence on the experience and skill level of the physicians leads to clustering of survival times. We conclude by discussing our method's applicability in studies involving high-dimensional data with complex underlying associations.
- Research Article
53
- 10.1177/0962280217746191
- Dec 18, 2017
- Statistical Methods in Medical Research
Individualized treatment rules can improve health outcomes by recognizing that patients may respond differently to treatment and assigning therapy with the most desirable predicted outcome for each individual. Flexible and efficient prediction models are desired as a basis for such individualized treatment rules to handle potentially complex interactions between patient factors and treatment. Modern Bayesian semiparametric and nonparametric regression models provide an attractive avenue in this regard as these allow natural posterior uncertainty quantification of patient specific treatment decisions as well as the population wide value of the prediction-based individualized treatment rule. In addition, via the use of such models, inference is also available for the value of the optimal individualized treatment rules. We propose such an approach and implement it using Bayesian Additive Regression Trees as this model has been shown to perform well in fitting nonparametric regression functions to continuous and binary responses, even with many covariates. It is also computationally efficient for use in practice. With Bayesian Additive Regression Trees, we investigate a treatment strategy which utilizes individualized predictions of patient outcomes from Bayesian Additive Regression Trees models. Posterior distributions of patient outcomes under each treatment are used to assign the treatment that maximizes the expected posterior utility. We also describe how to approximate such a treatment policy with a clinically interpretable individualized treatment rule, and quantify its expected outcome. The proposed method performs very well in extensive simulation studies in comparison with several existing methods. 
We illustrate the usage of the proposed method to identify an individualized choice of conditioning regimen for patients undergoing hematopoietic cell transplantation and quantify the value of this method of choice in relation to the optimal individualized treatment rule as well as non-individualized treatment strategies.
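The assignment step described in this abstract (choose the treatment that maximizes the expected posterior utility) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `assign_treatment`, the input format, the regimen labels, and the draws are all hypothetical, and the default utility is simply the outcome itself.

```python
from statistics import mean


def assign_treatment(posterior_draws, utility=lambda y: y):
    """Pick the treatment whose posterior expected utility is largest.

    posterior_draws maps a treatment label to a list of posterior draws of
    one patient's outcome under that treatment (hypothetical input format).
    """
    expected = {
        arm: mean(utility(y) for y in draws)
        for arm, draws in posterior_draws.items()
    }
    best = max(expected, key=expected.get)
    return best, expected


# Made-up posterior draws for a single patient under two hypothetical regimens.
draws = {
    "regimen_A": [0.62, 0.58, 0.66, 0.60],
    "regimen_B": [0.55, 0.71, 0.52, 0.54],
}
best_arm, expected_utility = assign_treatment(draws)
print(best_arm, expected_utility)
```

In practice the draws would come from a fitted model's posterior predictive distribution, and the utility function would encode the clinical trade-offs of interest.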
- Research Article
1
- 10.30598/barekengvol17iss1pp0135-0146
- Apr 15, 2023
- BAREKENG: Jurnal Ilmu Matematika dan Terapan
Bayesian Additive Regression Tree (BART) is a sum-of-trees model used to approximate classification or regression cases. The main idea of this method is to use a prior distribution to keep the tree size small and a likelihood from data to get the posterior. Because each tree is kept as small as possible, the approximation from any single tree has only a small effect on the posterior, which is the sum of all output from all the trees used. The Bayesian additive regression tree method is used to predict the maternity recovery rate of group long-term disability insurance data from the Society of Actuaries (SOA). Decision tree-based models such as Gradient Boosting Machine, Random Forest, Decision Tree, and Bayesian Additive Regression Tree are compared to find the best model by comparing mean squared error and program runtime. After comparing these models, the Bayesian Additive Regression Tree model gives the best prediction based on smaller root mean squared error values and relatively short runtime.
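The sum-of-trees idea can be illustrated with a toy example in which each "tree" is a one-split stump with a small leaf value, so that no single tree dominates the total. This is only a sketch of how one draw of such a model produces a prediction: the helper names `stump` and `sum_of_trees`, the thresholds, and the leaf values are invented for illustration, and no prior or posterior sampling is shown.

```python
def stump(feature, threshold, left_value, right_value):
    """Return a one-split tree: x[feature] <= threshold goes left, else right."""
    def predict(x):
        return left_value if x[feature] <= threshold else right_value
    return predict


def sum_of_trees(trees, x):
    """Prediction of the ensemble is the sum of every tree's leaf value."""
    return sum(tree(x) for tree in trees)


# Three small trees with deliberately small leaf values (made-up numbers).
trees = [
    stump(0, 0.5, -0.1, 0.2),
    stump(1, 2.0, 0.05, -0.05),
    stump(0, 1.5, 0.0, 0.3),
]
prediction = sum_of_trees(trees, [1.0, 3.0])
print(prediction)
```

A full BART fit would average such predictions over many posterior draws of the trees rather than rely on a single set.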
- Research Article
2
- 10.1097/tp.0000000000002274
- Aug 1, 2018
- Transplantation
Optimizing organ yield (number of organs transplanted per donor) is a modifiable way to increase the number of organs available for transplant. Historically, models to predict donor organ yield have been developed based on ordinary least squares regression and ordinal logistic regression; however, alternative modeling methodology may be superior to conventional approaches [1,2]. In this preliminary analysis, rather than treating organ yield as a continuous outcome, we modeled the number of organs transplanted per donor as counts. We aimed to compare different linear and nonlinear statistical models for count responses to predict deceased donor organ yield. We used data from the OPTN database from 2000 to 2016 to parameterize our exploratory models. The initial set of predictors for deceased donor organ yield was derived from published studies [1-3]. We included adult deceased donors between 18 and 84 years of age that had at least 1 organ procured for transplantation. 75 350 records met inclusion criteria. We used 80% of the data for derivation in a cross-validation analysis and the remainder of the data as a validation set. The cross-validation analysis was replicated 50 times, and the random holdouts consisted of 20% of the derivation cohort. The following models were evaluated: ordinary least squares regression [1], ordinal logistic regression [2], Poisson regression, negative binomial regression, general additive models, classification and regression trees, random forests, bootstrap aggregated classification and regression trees, boosted classification and regression trees, Bayesian additive regression trees (BART), multivariate adaptive regression splines, artificial neural networks, and mean-only models. Among the models, BART resulted in the lowest error on predicting the number of organs transplanted per deceased donor. Two-sample t tests showed that the BART had significantly lower mean absolute error (MAE) when predicting deceased donor organ yield (all P < 0.001).
On average, this model presented a MAE of 0.867 throughout the cross-validation analysis, and a MAE of 0.856 when tested in the validation set. The BART showed that deceased donor organ yield had a negative nonlinear relationship with age, body mass index, terminal blood urea nitrogen, terminal laboratory creatinine, aspartate aminotransferase, terminal laboratory total bilirubin; a positive nonlinear relationship with organ recovery time, partial pressure of oxygen levels, and last serum sodium; and more complex nonlinear relationships with alanine aminotransferase and the ratio of partial pressure arterial oxygen and fraction of inspired oxygen. Bayesian additive regression trees would improve prediction from at least 63 organs per 1000 donors (compared with an ordinary least squares regression [1]) to at most 120 organs per 1000 donors (compared with an ordinal logistic regression [2]). Through the use of BART, we were able to obtain higher predictive accuracy for organ yield. This model allows for nonlinear relationships among the predictors and the number of organs transplanted per deceased donor, which likely explains the superior performance compared with conventional models. In conclusion, our preliminary analysis shows that the BART methodology is superior in predicting deceased donor organ yield and can potentially serve as an aid to assess organ procurement organization performance, reduce geographic disparities, and forecast future organ availability. A forthcoming article will include the finalized analysis.
- Research Article
- 10.3390/math13132195
- Jul 4, 2025
- Mathematics
In causal inference research, accurate estimation of individualized treatment effects (ITEs) is at the core of effective intervention. This paper proposes a dual-structure ITE-estimation model based on Bayesian Additive Regression Trees (BART), which constructs independent BART sub-models for the treatment and control groups, estimates ITEs using the potential outcome framework and enhances posterior stability and estimation reliability through Markov Chain Monte Carlo (MCMC) sampling. Based on psychological stress questionnaire data from graduate students, the study first integrates BART with the Shapley value method to identify employment pressure as a key driving factor and reveals substantial heterogeneity in ITEs across subgroups. Furthermore, the study constructs an ITE model using a dual-structured BART framework (BART-ITE), where employment pressure is defined as the treatment variable. Experimental results show that the model performs well in terms of credible interval width and ranking ability, demonstrating superior heterogeneity detection and individual-level sorting. External validation using both the Bootstrap method and matching-based pseudo-ITE estimation confirms the robustness of the proposed model. Compared with mainstream meta-learning methods such as S-Learner, X-Learner and Bayesian Causal Forest, the dual-structure BART-ITE model achieves a favorable balance between root mean square error and bias. In summary, it offers clear advantages in capturing ITE heterogeneity and enhancing estimation reliability and individualized decision-making.
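The dual-structure idea (one model per arm, with the ITE taken as the difference of the two predicted potential outcomes) can be sketched as follows. This is an illustrative outline under stated assumptions rather than the paper's BART-ITE code: the function `ite_summary` is hypothetical, the draws are made up, pairing the k-th draw of one model with the k-th draw of the other is an assumption, and the interval uses a simple nearest-rank rule.

```python
import math
from statistics import mean


def ite_summary(draws_treated, draws_control, level=0.95):
    """Summarise one individual's ITE from paired posterior draws.

    draws_treated[k] and draws_control[k] are the k-th posterior draws of the
    predicted outcome from the treatment-arm and control-arm models.
    """
    ite = sorted(t - c for t, c in zip(draws_treated, draws_control))
    n = len(ite)
    alpha = (1 - level) / 2
    lower = ite[math.floor(alpha * (n - 1))]       # nearest-rank lower bound
    upper = ite[math.ceil((1 - alpha) * (n - 1))]  # nearest-rank upper bound
    return {"mean": mean(ite), "lower": lower, "upper": upper}


# Made-up posterior draws for a single individual.
treated = [3.0, 2.5, 3.5, 2.0, 3.0]
control = [1.0, 1.5, 1.0, 1.0, 2.0]
summary = ite_summary(treated, control)
print(summary)
```

The width `upper - lower` is the kind of credible interval width the abstract refers to; with real MCMC output there would be hundreds or thousands of draws per individual rather than five.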
- Research Article
1
- 10.9734/ajpas/2023/v23i1494
- Jun 10, 2023
- Asian Journal of Probability and Statistics
Aims: This study aims to determine the classification results of the Bayesian Additive Regression Trees (BART) method on bank credit collectability data in which there is a class imbalance.
Study Design: Quantitative Design.
Place and Duration of Study: The data are secondary data on bank debtors' credit collectability, with nine predictor variables and one response variable (credit collectability). They were collected from banks in East Java, Indonesia, from 01 May 1986 to 31 May 2018.
Methodology: The Bayesian approach is an estimation method in statistics that is currently popular, because rapid advances in technology mean its computational demands are no longer a problem. Bayesian estimation continues to develop and can be used in various statistical methods, for both regression and classification. The Classification and Regression Trees (CART) method is one of the most popular classification methods. In a bank, debtors with delinquent credit make up a small proportion compared with debtors with current credit. Standard classifiers such as CART are not suitable for this case, as CART is sensitive to the class with the larger share of observations. Hence, additional methods such as the ensemble method BART (Bayesian Additive Regression Trees) are needed to increase classification accuracy in cases of class imbalance.
Results: Cross-validation of BART shows a consistently high classification accuracy of 83.49%. This indicates that the BART method can work consistently even though there is a class imbalance. The classification accuracy is 84.53% on the training data and 85.48% on the testing data. These results also show that the BART method has the ability to overcome overfitting, which often occurs in classification methods with very good classification abilities.
Conclusion: The accuracy on the testing data is similar to that on the training data, which indicates that the BART method has been able to capture patterns in the data.
- Research Article
2
- 10.1016/j.clinimag.2023.110047
- Nov 28, 2023
- Clinical Imaging
Developing radiology diagnostic tools for pulmonary fibrosis using machine learning methods
- Book Chapter
3
- 10.1016/bs.host.2016.07.007
- Jan 1, 2016
Bayesian Additive Regression Tree for Seemingly Unrelated Regression with Automatic Tree Selection
- Research Article
- 10.5772/6554
- Jan 1, 2009
The changeable structures and variability of email attacks render current email filtering solutions useless. Consequently, new techniques to harden the protection of users' security and privacy have become a necessity. The variety of email attacks, namely spam, damages networks' infrastructure and exposes users to new attack vectors daily. Spam is unsolicited email which targets users with different types of commercial messages or advertisements. Porn-related content that contains explicit material or commercials of exploited children is a major trend in these messages as well. The waste of network bandwidth due to the large number of spam messages sent and the requirement of complex hardware, software, network resources, and human power are other problems associated with these attacks. Recently, security researchers have noticed an increase in malicious content delivered by these messages, which raises security concerns due to their attack potential. More seriously, phishing attacks have been on the rise for the past couple of years. Phishing is the act of sending a forged e-mail to a recipient, falsely mimicking a legitimate establishment in an attempt to scam the recipient into divulging private information such as credit card numbers or bank account passwords (James, 2005). Recently phishing attacks have become a major concern to financial institutions and law enforcement due to the heavy monetary losses involved. According to a survey by Gartner group, in 2006 approximately 3.25 million victims were spoofed by phishing attacks and in 2007 the number increased by almost 1.3 million victims. Furthermore, in 2007, monetary losses related to phishing attacks were estimated at $3.2 billion. All the aforementioned concerns raise the need for new detection mechanisms to subvert email attacks in their various forms.
Despite the abundance of applications available for phishing detection, unlike spam classification, there are only a few studies that compare machine learning techniques in predicting phishing emails (Abu-Nimeh et al., 2007). We describe a new version of Bayesian Additive Regression Trees (BART) and apply it to phishing detection. A phishing dataset is constructed from 1409 raw phishing emails and 5152 legitimate emails, where 71 features (variables) are used in classifiers' training and testing. The variables consist of both textual and structural features that are extracted from raw emails. The performance of six classifiers, on this dataset, is compared using the area under the curve (AUC) (Huang & Ling, 2005). The classifiers include Logistic Regression (LR), Classification and Regression Trees (CART), Bayesian Additive Regression Trees (BART), Support Vector Machines (SVM), Random
- Research Article
4
- 10.1016/j.csda.2023.107858
- Sep 25, 2023
- Computational Statistics & Data Analysis
The Bayesian additive regression trees (BART) model is an ensemble method extensively and successfully used in regression tasks due to its consistently strong predictive performance and its ability to quantify uncertainty. BART combines “weak” tree models through a set of shrinkage priors, whereby each tree explains a small portion of the variability in the data. However, the lack of smoothness and the absence of an explicit covariance structure over the observations in standard BART can yield poor performance in cases where such assumptions would be necessary. The Gaussian processes Bayesian additive regression trees (GP-BART) model is an extension of BART which addresses this limitation by assuming Gaussian process (GP) priors for the predictions of each terminal node among all trees. The model's effectiveness is demonstrated through applications to simulated and real-world data, surpassing the performance of traditional modelling approaches in various scenarios.
- Research Article
9
- 10.1016/j.ecoinf.2020.101198
- Nov 12, 2020
- Ecological Informatics
Statistical comparison of additive regression tree methods on ecological grassland data
- Research Article
80
- 10.1111/2041-210x.13389
- Apr 16, 2020
- Methods in Ecology and Evolution
embarcadero is an R package of convenience tools for species distribution modelling (SDM) with Bayesian additive regression trees (BART), a powerful machine learning approach that has been rarely applied to ecological problems. Like other classification and regression tree methods, BART estimates the probability of a binary outcome based on a set of decision trees. Unlike other methods, BART iteratively generates sets of trees based on a set of priors about tree structure and nodes, and builds a posterior distribution of estimated classification probabilities. So far, BARTs have yet to be applied to SDM. embarcadero is a workflow wrapper for BART species distribution models, and includes functionality for easy spatial prediction, an automated variable selection procedure, several types of partial dependence visualization and other tools for ecological application. The embarcadero package is open source and available on GitHub. To show how embarcadero can be used by ecologists, I illustrate a BART workflow for a virtual species distribution model. The supplement includes a more advanced vignette showing how BART can be used for mapping disease transmission risk, using the example of Crimean–Congo haemorrhagic fever in Africa.
- Research Article
46
- 10.1186/s12711-016-0219-8
- Jun 10, 2016
- Genetics, Selection, Evolution : GSE
Background: The goal of genome-wide prediction (GWP) is to predict phenotypes based on marker genotypes, often obtained through single nucleotide polymorphism (SNP) chips. The major problem with GWP is high-dimensional data from many thousands of SNPs scored on several thousands of individuals. A large number of methods have been developed for GWP, which are mostly parametric methods that assume statistical linearity and only additive genetic effects. The Bayesian additive regression trees (BART) method was recently proposed and is based on the sum of nonparametric regression trees with the priors being used to regularize the parameters. Each regression tree is based on a recursive binary partitioning of the predictor space that approximates an unknown function, which will automatically model nonlinearities within SNPs (dominance) and interactions between SNPs (epistasis). In this study, we introduced BART and compared its predictive performance with that of the LASSO, Bayesian LASSO (BLASSO), genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space (RKHS) regression and random forest (RF) methods.
Results: Tests on the QTLMAS2010 simulated data, which are mainly based on additive genetic effects, show that cross-validated optimization of BART provides a smaller prediction error than the RF, BLASSO, GBLUP and RKHS methods, and is almost as accurate as the LASSO method. If dominance and epistasis effects are added to the QTLMAS2010 data, the accuracy of BART relative to the other methods was increased. We also showed that BART can produce importance measures on the SNPs through variable inclusion proportions. In evaluations using real data on pigs, the prediction error was smaller with BART than with the other methods.
Conclusions: BART was shown to be an accurate method for GWP, in which the regression trees guarantee a very sparse representation of additive and complex non-additive genetic effects. Moreover, the Markov chain Monte Carlo algorithm with Bayesian back-fitting provides a computationally efficient procedure that is suitable for high-dimensional genomic data.
Electronic supplementary material: The online version of this article (doi:10.1186/s12711-016-0219-8) contains supplementary material, which is available to authorized users.
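The variable inclusion proportions mentioned in this abstract can be computed from the splitting rules recorded in each posterior draw. The sketch below uses one common definition (each variable's share of splitting rules within a draw, averaged over draws); whether the paper uses exactly this definition is an assumption, and the function name, input format, and SNP labels are hypothetical.

```python
from collections import Counter, defaultdict


def inclusion_proportions(split_vars_per_draw):
    """Average, over posterior draws, each variable's share of splitting rules.

    split_vars_per_draw[k] lists the variable used by every splitting rule in
    the k-th posterior draw of the sum-of-trees model (hypothetical format).
    Draws with no splitting rules are skipped.
    """
    totals = defaultdict(float)
    used_draws = 0
    for split_vars in split_vars_per_draw:
        if not split_vars:
            continue
        used_draws += 1
        for name, count in Counter(split_vars).items():
            totals[name] += count / len(split_vars)
    return {name: total / used_draws for name, total in totals.items()}


# Two made-up posterior draws with three and two splitting rules respectively.
draws = [["snp1", "snp2", "snp1"], ["snp1", "snp3"]]
proportions = inclusion_proportions(draws)
print(proportions)
```

Variables with larger proportions are used more often by the trees, which is why the quantity can serve as an importance measure for SNPs.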