Abstract

Background: Lung cancer is the leading cause of cancer death worldwide. Early detection of individuals at risk of lung cancer is critical to reduce the mortality rate.

Objective: The aim of this study was to develop and validate a prospective risk prediction model to identify patients at risk of new incident lung cancer within the next 1 year in the general population.

Methods: Data from individual patient electronic health records (EHRs) were extracted from the Maine Health Information Exchange network. The study population consisted of patients with at least one EHR between April 1, 2016, and March 31, 2018, who had no history of lung cancer. A retrospective cohort (N=873,598) and a prospective cohort (N=836,659) were formed for model construction and validation. An Extreme Gradient Boosting (XGBoost) algorithm was adopted to build the model. It assigned a score to each individual to quantify the probability of a new incident lung cancer diagnosis from October 1, 2016, to September 30, 2017. The model was trained with the clinical profile in the retrospective cohort from the preceding 6 months and validated with the prospective cohort to predict the risk of incident lung cancer from April 1, 2017, to March 31, 2018.

Results: The model had an area under the curve (AUC) of 0.881 (95% CI 0.873-0.889) in the prospective cohort. Two thresholds of 0.0045 and 0.01 were applied to the predictive scores to stratify the population into low-, medium-, and high-risk categories. The incidence of lung cancer in the high-risk category (579/53,922, 1.07%) was 7.7 times higher than that in the overall cohort (1167/836,659, 0.14%). Age, a history of pulmonary diseases and other chronic diseases, medications for mental disorders, and social disparities were found to be associated with new incident lung cancer.

Conclusions: We retrospectively developed and prospectively validated an accurate risk prediction model of new incident lung cancer occurring in the next 1 year. Through statistical learning from the statewide EHR data in the preceding 6 months, our model was able to identify statewide high-risk patients, which will benefit population health through the establishment of preventive interventions or more intensive surveillance.
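The stratification and incidence figures in the Results can be reproduced with simple arithmetic. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the handling of scores that fall exactly on a threshold is an assumption, since the abstract does not state it. Only the two thresholds and the case counts come from the abstract.

```python
# Thresholds reported in the abstract.
LOW_MEDIUM_CUTOFF = 0.0045
MEDIUM_HIGH_CUTOFF = 0.01


def risk_category(score: float) -> str:
    """Map a predictive score to a risk category.

    Assumption: a score exactly at a cutoff falls into the higher
    category; the abstract does not say how boundaries were handled.
    """
    if score >= MEDIUM_HIGH_CUTOFF:
        return "high"
    if score >= LOW_MEDIUM_CUTOFF:
        return "medium"
    return "low"


def incidence(cases: int, population: int) -> float:
    """Fraction of a group with a new lung cancer diagnosis."""
    return cases / population


# Counts reported in the abstract for the prospective cohort.
high_risk_incidence = incidence(579, 53_922)
overall_incidence = incidence(1_167, 836_659)
relative_incidence = high_risk_incidence / overall_incidence

print(f"high-risk incidence: {high_risk_incidence:.2%}")  # 1.07%
print(f"overall incidence:   {overall_incidence:.2%}")    # 0.14%
print(f"ratio:               {relative_incidence:.1f}")   # 7.7
```

The ratio is computed from the unrounded incidences; dividing the rounded percentages (1.07 / 0.14) gives about 7.6 instead.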

Highlights

  • Lung cancer is the most common cancer and the leading cause of cancer death worldwide [1,2]

  • A history of pulmonary diseases and other chronic diseases, medications for mental disorders, and social disparities were found to be associated with new incident lung cancer

  • By applying the XGBoost algorithm to electronic health record (EHR)-based data, the prediction model reached an area under the curve (AUC) of 0.881 in the prospective cohort (Figure 2)
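For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen patient who developed lung cancer received a higher score than a randomly chosen patient who did not. The sketch below illustrates that definition on a toy example. The `auc` helper and the scores and labels are invented for illustration; they are not study data, and this is not how the authors computed their figure.

```python
def auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    A tie between a case and a non-case counts as half a correct pair.
    This pairwise form is quadratic and meant for illustration only.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    correct = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                correct += 1.0
            elif case_score == control_score:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Hypothetical scores and outcomes (1 = incident lung cancer).
toy_scores = [0.001, 0.002, 0.004, 0.006, 0.012, 0.020]
toy_labels = [0, 0, 1, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)
print(f"toy AUC: {toy_auc:.3f}")
```

In this toy example, 8 of the 9 case and non-case pairs are ordered correctly, so the AUC is 8/9. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.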


Summary

Introduction

Lung cancer is the most common cancer and the leading cause of cancer death worldwide [1,2]. In the United States alone, an estimated 234,030 new cases of lung and bronchus cancer were expected in 2018 (13.5% of all new cancer cases), along with an estimated 154,050 deaths from the disease (25.3% of all cancer-related deaths) [3]. Survival statistics for people with lung cancer vary depending on the stage of the cancer at diagnosis. Early detection and timely disease intervention play an important role in reducing the mortality rate of lung cancer.

