Abstract

Randomized clinical trials (RCTs) are the gold standard for informing treatment decisions. Observational studies are often plagued by selection bias, and expert-selected covariates may insufficiently adjust for confounding. We explore how unstructured clinical text can be used to reduce selection bias and improve medical practice. We develop a framework based on natural language processing to uncover interpretable potential confounders from text. We validate our method by comparing the hazard ratio (HR) estimated with and without the uncovered confounders against the results of established RCTs. We apply our method to four cohorts built from localized prostate and lung cancer datasets from the Stanford Cancer Institute and show that our method shifts the HR estimate towards the RCT results. The uncovered terms can also be interpreted by oncologists for clinical insights. We present this proof-of-concept study to enable more credible causal inference using observational data, uncover meaningful insights from clinical text, and inform high-stakes medical decisions.

Highlights

  • Randomized clinical trials (RCTs) are the gold standard for informing treatment decisions

  • In 2016, Wallis et al.[7] showed through population-based studies that surgery is superior to radiation for early-stage prostate cancer in terms of overall and prostate cancer-specific survival; a few months later, the finding was refuted by Hamdy et al.[8], who showed that surgery and radiation are equivalent in terms of overall and prostate cancer-specific survival

  • We apply our methods to localized prostate and stage I non-small cell lung cancer (NSCLC) patients and compare the results against established RCTs


Summary

Introduction

Randomized clinical trials (RCTs) are the gold standard for informing treatment decisions. The uncovered terms can be interpreted by oncologists for clinical insights. We present this proof-of-concept study to enable more credible causal inference using observational data, uncover meaningful insights from clinical text, and inform high-stakes medical decisions. [...] National Cancer Data Base (NCDB) to perform CER. Such studies may be unreliable due to the systemic bias present in observational data and the presence of unmeasured confounders[1,2,4]. There has been a growing interest in using observational data for clinical decision-making and causal inference in oncology[2]. Such studies are often unreliable, and many observational studies have been refuted by RCTs soon after[2,4]. Our paper contributes to this literature by addressing obstacles in using NLP methods to remove confounding.
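To make concrete how an unadjusted confounder can distort an observational estimate, the following is a minimal sketch, not the authors' method. It assumes constant hazards within each group, so that an event-rate ratio stands in for the hazard ratio, and it uses a Mantel-Haenszel stratified rate ratio rather than a survival model. All counts are invented for illustration, and the "frailty mention" stratum label is a hypothetical stand-in for a term uncovered from clinical text.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Event counts and follow-up time within one level of a confounder."""
    events_treated: int
    person_years_treated: float
    events_control: int
    person_years_control: float


def crude_rate_ratio(strata):
    """Rate ratio that ignores the confounder by pooling all strata."""
    events_t = sum(s.events_treated for s in strata)
    time_t = sum(s.person_years_treated for s in strata)
    events_c = sum(s.events_control for s in strata)
    time_c = sum(s.person_years_control for s in strata)
    return (events_t / time_t) / (events_c / time_c)


def mantel_haenszel_rate_ratio(strata):
    """Mantel-Haenszel rate ratio, adjusted by stratifying on the confounder."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        total_time = s.person_years_treated + s.person_years_control
        numerator += s.events_treated * s.person_years_control / total_time
        denominator += s.events_control * s.person_years_treated / total_time
    return numerator / denominator


# Invented toy data: treated = surgery, control = radiation.
# Within each stratum both arms have the same event rate (true ratio 1.0),
# but frail patients are far more likely to receive radiation.
strata = [
    Stratum(20, 2000.0, 5, 500.0),    # no frailty mention in notes (rate 0.01)
    Stratum(10, 200.0, 50, 1000.0),   # frailty mention in notes (rate 0.05)
]

crude = crude_rate_ratio(strata)
adjusted = mantel_haenszel_rate_ratio(strata)
print(f"crude rate ratio:    {crude:.3f}")
print(f"adjusted rate ratio: {adjusted:.3f}")
```

In this toy data the pooled estimate makes surgery look strongly protective, while stratifying on the confounder recovers the null effect built into each stratum. This mirrors, in miniature, the kind of shift towards the RCT result that the paper reports after adding text-derived confounders.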

Methods
Results
Conclusion
