Cancer Risk Prediction Research Articles

Background: Early detection and diagnosis of cancer are vital to improving outcomes for patients. Artificial intelligence (AI) models have shown promise in the early detection and diagnosis of cancer, but there is limited evidence on methods that fully exploit the longitudinal data stored within electronic health records (EHRs). This review aims to summarise methods currently utilised for prediction of cancer from longitudinal data and provides recommendations on how such models should be developed.
Methods: The review was conducted following PRISMA-ScR guidance. Six databases (MEDLINE, EMBASE, Web of Science, IEEE Xplore, PubMed and SCOPUS) were searched for relevant records published before 2 February 2024. Search terms related to the concepts “artificial intelligence”, “prediction”, “health records”, “longitudinal”, and “cancer”. Data were extracted relating to several areas of the articles: (1) publication details, (2) study characteristics, (3) input data, (4) model characteristics, (5) reproducibility, and (6) quality assessment using the PROBAST tool. Models were evaluated against a framework for terminology relating to reporting of cancer detection and risk prediction models.
Results: Of 653 records screened, 33 were included in the review; 10 predicted risk of cancer, 18 performed either cancer detection or early detection, 4 predicted recurrence, and 1 predicted metastasis. The most common cancers predicted in the studies were colorectal (n = 9) and pancreatic cancer (n = 9). Sixteen studies used feature engineering to represent temporal data, with the most common features representing trends. Eighteen used deep learning models which take a direct sequential input, most commonly recurrent neural networks, but also including convolutional neural networks and transformers. Prediction windows and lead times varied greatly between studies, even for models predicting the same cancer. High risk of bias was found in 90% of the studies. This risk was often introduced due to inappropriate study design (n = 26) and sample size (n = 26).
Conclusion: This review highlights the breadth of approaches to cancer prediction from longitudinal data. We identify areas where reporting of methods could be improved, particularly regarding where in a patient's trajectory the model is applied. The review shows opportunities for further work, including comparison of these approaches and their applications in other cancers.
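
As an illustration of what feature engineering for trends can look like, the following minimal Python sketch summarises one patient's repeated measurements as a count, last value, mean, and least-squares slope. It is not taken from any of the reviewed studies. The function name `trend_features`, the parameters `index_day`, `lead_time_days` and `window_days`, and the example values are all hypothetical, and the window definition used here is only one of several possible conventions.

```python
from statistics import mean


def trend_features(measurements, index_day, lead_time_days, window_days):
    """Summarise (day, value) measurements as simple trend features.

    Assumption: only observations inside the observation window
    [index_day - lead_time_days - window_days, index_day - lead_time_days]
    are used, so nothing recorded during the lead time enters the features.
    """
    end = index_day - lead_time_days
    start = end - window_days
    obs = sorted((d, v) for d, v in measurements if start <= d <= end)
    if not obs:
        return {"n_obs": 0, "last": None, "mean": None, "slope_per_day": None}

    days = [d for d, _ in obs]
    values = [v for _, v in obs]

    slope = None
    if len(set(days)) >= 2:  # a slope needs at least two distinct time points
        d_bar, v_bar = mean(days), mean(values)
        sxx = sum((d - d_bar) ** 2 for d in days)
        sxy = sum((d - d_bar) * (v - v_bar) for d, v in obs)
        slope = sxy / sxx  # ordinary least-squares slope, units per day

    return {
        "n_obs": len(obs),
        "last": values[-1],
        "mean": mean(values),
        "slope_per_day": slope,
    }


# Hypothetical blood-test values; the day-350 result falls inside the
# 90-day lead time and is therefore excluded from the features.
example = trend_features(
    [(0, 140.0), (100, 135.0), (200, 130.0), (350, 110.0)],
    index_day=365,
    lead_time_days=90,
    window_days=365,
)
```

Making the lead time and window explicit arguments keeps the point in the patient's trajectory at which the model is applied visible in the code, which is the reporting detail the review asks authors to state.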

Incorporation of mammographic density into breast cancer risk models could improve risk stratification to tailor screening and prevention strategies according to risk. Robust evaluation of the value of adding mammographic density to models with comprehensive information on questionnaire-based risk factors and polygenic risk score is needed to determine its effectiveness in improving risk stratification of such models. We used the Individualized Coherent Absolute Risk Estimator (iCARE) tool for risk model building and validation to incorporate density into a previously validated literature-based model with questionnaire-based risk factors and a 313-variant polygenic risk score (PRS). The model was evaluated for calibration and discrimination in three prospective cohorts of European-ancestry women (1,468 cases, 19,104 controls): US-based Nurses' Health Study (NHS I and II) and Mayo Mammography Health Study (MMHS); and Sweden-based Karolinska Mammography Project for Risk Prediction of Breast Cancer (KARMA) study. Analyses were done separately for women younger (NHS II, KARMA) and older than 50 years (NHS I, MMHS, KARMA). Improvements in terms of risk stratification and reclassification proportions were assessed among European-ancestry women aged 50-70 years in the US and Sweden. For women younger and older than 50 years, the model with questionnaire-based risk factors, PRS and density was generally well calibrated across risk, with some evidence of miscalibration at the extremes of the risk distribution. Incorporation of density led to modest improvements in risk discrimination beyond the model with questionnaire-based risk factors and PRS: the area under the curve (AUC) among younger women was 67.0% (95% CI: 63.5-70.6%) vs. 65.6% (95% CI: 61.9-69.3%) for models with and without density; and 66.1% (95% CI: 64.4-67.8%) vs. 65.5% (95% CI: 63.8-67.2%) among older women. The model with density identified 18.4% of US women aged 50-70 years as having ≥ 3% 5-year predicted risk (threshold used for recommending risk-reducing medication in the US), with 42.4% of future cases expected to occur in this group. At this threshold, 7.9% of US women were reclassified by adding density to the model, resulting in the identification of 2.8% of additional future cases. The model with density identified 10.3% of Swedish women as having ≥ 3% 5-year predicted risk, with 29.4% of future cases expected to occur in this group. At this threshold, 5.3% of women were reclassified with the addition of density, leading to the identification of an additional 4.4% of future cases. Integrating density with questionnaire-based risk factors and PRS could potentially identify more women of European ancestry with elevated risk of breast cancer in the United States and Sweden. Further investigations of the integrated model in non-European ancestry populations are needed prior to considering clinical applications.
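
The threshold-based quantities reported above can be illustrated with a short Python sketch. This is a generic calculation under stated assumptions, not the iCARE procedure used in the study: the function name `reclassification_summary` is hypothetical, the toy risks are invented for illustration, and the share of future cases in the high-risk group is approximated as the sum of predicted risks in that group divided by the sum over everyone, which presumes a well-calibrated model.

```python
def reclassification_summary(risk_base, risk_new, threshold=0.03):
    """Compare two sets of predicted risks for the same people at one threshold.

    risk_base: predicted 5-year risks from the model without the new factor.
    risk_new:  predicted 5-year risks from the model with the new factor.
    """
    if not risk_base or len(risk_base) != len(risk_new):
        raise ValueError("risk lists must be non-empty and of equal length")

    n = len(risk_base)
    high_base = [r >= threshold for r in risk_base]
    high_new = [r >= threshold for r in risk_new]

    # People who cross the threshold in either direction are reclassified.
    moved_up = sum(1 for b, a in zip(high_base, high_new) if a and not b)
    moved_down = sum(1 for b, a in zip(high_base, high_new) if b and not a)

    # Assumption: under good calibration, expected cases equal summed risks.
    expected_in_high = sum(r for r, h in zip(risk_new, high_new) if h)

    return {
        "prop_high_new": sum(high_new) / n,
        "prop_reclassified": (moved_up + moved_down) / n,
        "prop_moved_up": moved_up / n,
        "prop_moved_down": moved_down / n,
        "prop_expected_cases_in_high": expected_in_high / sum(risk_new),
    }


# Toy risks for five hypothetical women (not data from the study).
summary = reclassification_summary(
    risk_base=[0.010, 0.020, 0.028, 0.035, 0.050],
    risk_new=[0.010, 0.015, 0.032, 0.027, 0.060],
)
```

Reporting upward and downward moves separately matters because a model can reclassify many people while changing the size of the high-risk group very little.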

Articles published on Cancer Risk Prediction

Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review

Ensemble machine learning models for lung cancer incidence risk prediction in the elderly: a retrospective longitudinal study

AI image analysis as the basis for risk-stratified screening.

A benchmark of deep learning approaches to predict lung cancer risk using national lung screening trial cohort

Selective adsorption of unmethylated DNA on ZnO nanowires for separation of methylated DNA.

Prediction of pancreatic cancer risk in patients with new-onset diabetes

A web-based tool for cancer risk prediction for middle-aged and elderly adults using machine learning algorithms and self-reported questions.

Comparison of PREMM5 and PREMMplus Risk Assessment Models to Identify Lynch Syndrome.

NOTIFICATION: Development of a Novel Approach for Breast Cancer Prediction and Early Detection Using Minimally Invasive Procedures and Molecular Analysis: How Cytomorphology Became a Breast Cancer Risk Predictor

Polygenic score distribution differences across European ancestry populations: implications for breast cancer risk prediction

Investigating the added value of incorporating mammographic density to an integrated breast cancer risk model with questionnaire-based risk factors and polygenic risk score.

Longitudinal interpretability of deep learning based breast cancer risk prediction.

Cancer Risk in Thyroid Nodules: An Analysis of Over 1000 Consecutive FNA Biopsies Performed in a Single Canadian Institution

A multimodal machine learning model for the stratification of breast cancer risk.

Using New Technologies to Analyze Gut Microbiota and Predict Cancer Risk.

Liver Function Biomarkers and Lung Cancer Risk: A Prospective Cohort Study in the UK Biobank.

Genetic Variants and Haplotype Structures in the CASC Gene Family to Predict Cancer Risk: A Bioinformatics Study.

Plasma prolactin and postmenopausal breast cancer risk: a pooled analysis of four prospective cohort studies.

A bibliometric review of predictive modelling for cervical cancer risk.

Prostate Cancer Risk Prediction Model Using Clinical and Magnetic Resonance Imaging–Related Findings: Impact of Combining Lesions' Locations and Apparent Diffusion Coefficient Values
