Abstract

Background: An important issue in prediction modeling of multivariate data is the measure of dependence structure. The use of Pearson's correlation as a dependence measure has several pitfalls, and hence regression prediction models based on this correlation may not be an appropriate methodology. As an alternative, a copula-based methodology for prediction modeling and an algorithm to simulate data are proposed.

Methods: The method consists of introducing copulas as an alternative to the correlation coefficient commonly used as a measure of dependence. An algorithm based on the marginal distributions of the random variables is applied to construct the Archimedean copulas. Monte Carlo simulations are carried out to replicate datasets, estimate prediction model parameters and validate them using Lin's concordance measure.

Results: We carried out a correlation-based regression analysis on pre-operative and post-operative ejection fractions from 20 patients aged 17–82 years and estimated the prediction model: Post-operative ejection fraction = -0.0658 + 0.8403 × (Pre-operative ejection fraction); p = 0.0008; 95% confidence interval of the slope coefficient (0.3998, 1.2808). The exploratory data analysis shows that both the pre-operative and post-operative ejection fraction measurements depart slightly from symmetry and are skewed to the left. The measurements also tend to be widely spread and have shorter tails than a normal distribution. Therefore, predictions made from the correlation-based model for pre-operative ejection fraction measurements in the lower range may not be accurate. Further, the best approximating marginal distributions of the pre-operative and post-operative ejection fractions (judged by q-q plots) are gamma distributions. The copula-based prediction model is estimated as: Post-operative ejection fraction = -0.0933 + 0.8907 × (Pre-operative ejection fraction); p = 0.00008; 95% confidence interval of the slope coefficient (0.4810, 1.3003). The two models differ considerably in their predicted post-operative ejection fractions in the lower range of pre-operative ejection fraction measurements, and the prediction errors of the copula model are smaller. To validate the copula methodology, we drew fifty independent bootstrap samples with replacement and estimated concordance statistics of 0.7722 (p = 0.0224) for the copula model and 0.7237 (p = 0.0604) for the correlation model. The predicted and observed measurements are concordant for both models. The estimates of the accuracy components are 0.9233 and 0.8654 for the copula and correlation models, respectively.

Conclusion: Copula-based prediction modeling is demonstrated to be an appropriate alternative to conventional correlation-based prediction modeling, since correlation-based prediction models are not appropriate for modeling dependence in populations with asymmetrical tails. The proposed copula-based prediction model has been validated using independent bootstrap samples.
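The validation step above rests on Lin's concordance measure and its accuracy component. As a rough illustration only (this is not the authors' code), the sketch below computes Lin's concordance correlation coefficient, the Pearson correlation (the precision component) and the accuracy component for paired observed and predicted values. The function name `lin_concordance`, the use of population (1/n) moments and the sample values are assumptions made for this example; the numbers are not taken from the study data.

```python
from math import sqrt


def lin_concordance(observed, predicted):
    """Return (ccc, pearson, accuracy) for paired measurements.

    ccc      : Lin's concordance correlation coefficient
    pearson  : Pearson correlation (precision component)
    accuracy : bias-correction factor C_b, so that ccc = pearson * accuracy
    Assumption: moments use the 1/n (population) convention.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n

    if var_o == 0 or var_p == 0:
        raise ValueError("concordance is undefined for a constant sample")

    # The denominator penalises both scale differences and location shift.
    denom = var_o + var_p + (mean_o - mean_p) ** 2
    ccc = 2 * cov / denom
    pearson = cov / sqrt(var_o * var_p)
    accuracy = 2 * sqrt(var_o * var_p) / denom
    return ccc, pearson, accuracy


# Hypothetical ejection-fraction style values, for illustration only.
observed_ef = [0.35, 0.42, 0.50, 0.55, 0.61]
predicted_ef = [0.33, 0.45, 0.48, 0.58, 0.60]
ccc, pearson, accuracy = lin_concordance(observed_ef, predicted_ef)
print(f"CCC={ccc:.4f}  Pearson={pearson:.4f}  accuracy={accuracy:.4f}")
```

An accuracy component close to 1 indicates that the fitted line lies close to the 45-degree line of perfect agreement, while the Pearson term measures how tightly the points scatter around the fitted line.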

Highlights

  • An important issue in prediction modeling of multivariate data is the measure of dependence structure

  • This paper describes copula-based prediction modeling, which can be employed as an alternative to conventional correlation-based modeling in any multivariate clinical application, including risk prediction

  • Data were collected on each patient's age, gender, NYHA class, heart rate, systolic blood pressure, ejection fraction, and three left-ventricular volumes adjusted for body surface area (BSA): EDVI, the volume of the left ventricle after the heart relaxes; ESVI, the volume of the left ventricle after the blood is pumped out; and SVI, the volume pumped out of the left ventricle during one cycle; ESVI = EDVI - SVI

Summary

Introduction

An important issue in prediction modeling of multivariate data is the measure of dependence structure. The use of Pearson's correlation as a dependence measure has several pitfalls, and regression prediction models based on this correlation may not be an appropriate methodology. Experts associated with developing, evaluating, or using risk prediction models met to identify the strengths and limitations of cancer and genetic susceptibility prediction models currently in use and under development, to explore the methodological issues related to their development, evaluation and validation, and to identify the research priorities and resources needed to advance the field [1]. Independence of two random variables implies that they are uncorrelated, but zero correlation does not, in general, imply independence unless the joint distribution is multivariate normal. For an excellent review of dependence measures and their desirable properties, we refer the reader to [2,3,4].
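The point that zero correlation does not imply independence can be checked with a small worked example. The sketch below is an illustration added here, not part of the original analysis. It takes X uniform on five symmetric points and sets Y = X², so Y is completely determined by X and yet the covariance between them is zero. The variable names and the chosen support are assumptions for this example.

```python
# X takes each of these values with probability 1/5; Y is a function of X.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Population covariance: the positive and negative products cancel exactly
# because the support of X is symmetric around zero.
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Dependence check: compare P(Y = 4) with P(Y = 4 | X = 2).
p_y4 = sum(1 for y in ys if y == 4) / n
y_when_x2 = [y for x, y in zip(xs, ys) if x == 2]
p_y4_given_x2 = sum(1 for y in y_when_x2 if y == 4) / len(y_when_x2)

print("covariance:", covariance)
print("P(Y=4):", p_y4, " P(Y=4 | X=2):", p_y4_given_x2)
```

Because the conditional probability differs from the unconditional one, X and Y are dependent even though their Pearson correlation is zero. A dependence measure built on the joint distribution, such as a copula, is able to capture this kind of relationship.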
