Abstract

Background: Plasma proteins, whether secreted directly by tumor cells or produced as part of the body's response to a tumor, may be useful for early cancer detection. We aimed to identify plasma protein combinations that predict cancers before clinical diagnosis in a prospective study.

Methods: We sampled from 8,186 ARIC study participants who had no cancer diagnosis at blood draw and had 4,877 log2-transformed proteins measured by SomaScan. Cases were participants diagnosed with cancer within 5 years after blood draw whose diagnoses were confirmed by registry or medical record. Controls were participants who had no history of cancer through 2015 and did not die of cancer. Participants with possible liver, kidney, or inflammatory conditions were excluded (eGFR-cr < 30, or top 1% of plasma AST, ALT, or CRP). We excluded highly correlated proteins (|r| > 0.75), abundant or known markers (albumin, CRP, PSA), and proteins with wide log2-transformed distributions among controls (SD > 1 and > 10% outliers, defined as values more than 1.5 IQR beyond the 25th or 75th percentile). Recursive feature elimination (RFE) with random forest (RF) was used to select the top 10 informative proteins based on accuracy. Non-protein features included demographic, lifestyle (e.g., smoking), and medical factors. We then divided participants into training and test sets in a 7:3 ratio, stratified by case status. An RF model including the top 10 proteins from RFE and the non-protein features was trained in the training set, then used to predict near-term cancer status in the test set and to calculate prediction performance. Random Over-Sampling Examples (ROSE) was used to balance the rare cases (<3%) against controls in the dataset.

Results: We included 210 cases (98 diagnosed within 2 years) and 7,042 controls with 3,476 proteins. Overall, 58% were female, 24% were Black, and the median age was 57 years (IQR: 52-62). The most common cancer was lung (27%). In the test set, sensitivity and specificity of the model for total cases diagnosed within 2 years were 0.19 and 0.91. Results for cases diagnosed within 3 years (0.24, 0.88) and 5 years (0.29, 0.87) were similar. Using the top 10 informative proteins for total cancer, sensitivity was 0.68 and specificity was 0.56 for lung cancers diagnosed within 2 years; for lung cancers diagnosed within 3 years (0.55, 0.77) or 5 years (0.53, 0.79), sensitivity decreased but specificity increased. Using the top 10 informative proteins specific for lung cancer, none of which overlapped with the top 10 for total cancer, sensitivity and specificity were 0.69 and 0.67 for lung cancers diagnosed within 2 years.

Conclusions: In this study, in which blood draw preceded diagnosis, the top informative proteins did not provide sufficient sensitivity for near-term prediction of all cancers combined. For lung cancer, further optimization may improve sensitivity and specificity enough to reach the range required for population-level screening. Validation with more cases and on other proteomic platforms is needed.

Funding: NHLBI, NCI, NPCR

Citation Format: Meng Ru, Christopher Douville, Kenneth R. Butler, Corinne E. Joshu, Jiayun Lu, Anna Prizment, Josef Coresh, Elizabeth A. Platz. Identification of plasma proteins for early detection of cancer, including lung, in the Atherosclerosis Risk in Communities (ARIC) study [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 6084.
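
Two quantities in the abstract have simple arithmetic definitions that a short sketch can make concrete: the wide-distribution exclusion rule for proteins (SD > 1 and > 10% outliers among controls) and the sensitivity and specificity reported for the test set. The Python below is illustrative only and is not the authors' code. The function names are hypothetical, the abstract does not state which software or percentile method was used, and the "inclusive" quantile method here is an assumption.

```python
from statistics import quantiles, stdev


def is_wide_distribution(values, sd_cutoff=1.0, outlier_frac_cutoff=0.10):
    """Flag a protein whose log2 values (among controls) are widely spread.

    A value counts as an outlier when it lies more than 1.5 * IQR below the
    25th percentile or above the 75th percentile. The protein is flagged only
    when BOTH conditions hold: SD > sd_cutoff and the outlier fraction
    exceeds outlier_frac_cutoff.
    """
    # Assumption: "inclusive" quartiles; the abstract does not specify a method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outlier_frac = sum(v < low or v > high for v in values) / len(values)
    return stdev(values) > sd_cutoff and outlier_frac > outlier_frac_cutoff


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # cases predicted as cases
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # cases missed
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # controls predicted as controls
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # controls flagged as cases
    return tp / (tp + fn), tn / (tn + fp)
```

For example, `sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0])` returns `(0.5, 1.0)`: one of two cases is detected and both controls are correctly classified.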