Abstract

Background: Machine learning (ML) methods are becoming more feasible for use in clinical and epidemiologic research on breast cancer, particularly for characterizing histopathology. Compared to supervised ML methods, unsupervised approaches offer an opportunity to identify previously unknown features. The purpose of this study was to use unsupervised deep learning methods to identify histopathological features in diagnostic breast cancer hematoxylin and eosin (H&E) slides that are associated with clinical characteristics and patient outcomes. Methods: One H&E slide was scanned (Leica Biosystems Aperio Versa scanner) at 20x magnification for each of 1,716 women diagnosed with breast cancer in the Cancer Prevention Study-II Nutrition Cohort. In the pre-processing phase, the scanned images underwent color normalization, artifact detection, and tiling. We then used a non-pretrained VGG16 autoencoder with data augmentation for feature learning and extraction from tiles. These features were clustered in two tiers using the K-means algorithm. Each tile was assigned to the cluster with the highest probability. The tiles were then reassembled into whole slide images. For each slide, the proportion of tiles in each cluster was calculated. We will associate clusters with clinical features and with 5- and 10-year breast cancer-specific survival using multivariable logistic and Cox proportional hazards regression models, respectively. Results: Mean age at baseline enrollment (1992-1993) and at breast cancer diagnosis was 60.6 years (SD=6.0) and 71.5 years (SD=7.0), respectively. The majority of cancer diagnoses occurred after 1999 (79%), and 81% of the women included were diagnosed with invasive breast cancer. The final pipeline for the full set of images is currently being built. Preliminary runs at the 1x magnification level with 100 cases (N=21,472 tiles) have shown clustering based on macro-level features such as adipose, stromal, and epithelial content.
Second-tier clustering (clustering within clusters) shows further delineation of groups within clusters of interest (i.e., epithelial cell-rich regions). The final output with all 1,716 slides will be based on analysis at the 5x magnification level. Discussion: We expect that some histopathological features identified by ML models will be associated with conventional pathology features, clinical features, and breast cancer-specific survival. Using ML methods to analyze histology slides provides additional data that can be integrated into epidemiological studies. Future directions include analyzing images at higher magnifications (10x or 20x), assessing the association between ML-derived histopathological characteristics and breast cancer risk factors, and incorporating these characteristics into prognostic models.

Citation Format: Samantha Puvanesarajah, James M. Hodge, Jacob L. Evans, William Seo, Michelle Yi, Michelle M. Fritz, Mary Macheski-Preston, Ted Gansler, Susan M. Gapstur, Mia M. Gaudet. Unsupervised deep-learning to identify histopathological features among breast cancers in the Cancer Prevention Study-II Nutrition Cohort [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 2417.
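The two-tier clustering and the per-slide cluster proportions described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not give the number of clusters, the feature dimensionality, or the software used, so the use of scikit-learn and the names `two_tier_kmeans`, `cluster_proportions`, `k_first`, and `k_second` are assumptions made for illustration. K-means produces hard assignments, so each tile is assigned here to its nearest centroid rather than by a probability.

```python
import numpy as np
from sklearn.cluster import KMeans


def two_tier_kmeans(features, k_first, k_second, seed=0):
    """Cluster tile features, then re-cluster within each first-tier cluster.

    Returns an (n_tiles, 2) integer array holding the first-tier label and
    the second-tier label (local to its first-tier cluster) of every tile.
    """
    features = np.asarray(features, dtype=float)
    first = KMeans(n_clusters=k_first, n_init=10, random_state=seed).fit_predict(features)
    labels = np.zeros((features.shape[0], 2), dtype=int)
    labels[:, 0] = first
    for c in range(k_first):
        members = np.flatnonzero(first == c)
        # Leave the second-tier label at 0 when a cluster is too small to split.
        if members.size < k_second:
            continue
        sub = KMeans(n_clusters=k_second, n_init=10, random_state=seed)
        labels[members, 1] = sub.fit_predict(features[members])
    return labels


def cluster_proportions(slide_ids, tile_labels, n_clusters):
    """Fraction of each slide's tiles that fall in each cluster."""
    slide_ids = np.asarray(slide_ids)
    tile_labels = np.asarray(tile_labels)
    proportions = {}
    for slide in np.unique(slide_ids).tolist():
        counts = np.bincount(tile_labels[slide_ids == slide], minlength=n_clusters)
        proportions[slide] = counts / counts.sum()
    return proportions


# Synthetic stand-in for autoencoder features: three well-separated groups
# of 40 tiles each, with tiles spread over two hypothetical slides.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
demo_features = np.vstack([c + 0.5 * rng.standard_normal((40, 2)) for c in centers])
demo_slides = ["slide_A"] * 60 + ["slide_B"] * 60

demo_labels = two_tier_kmeans(demo_features, k_first=3, k_second=2)
demo_props = cluster_proportions(demo_slides, demo_labels[:, 0], n_clusters=3)
```

Each slide's proportion vector sums to 1, so in a regression model one cluster would typically be left out as the reference category, or the proportions transformed, to avoid perfect collinearity.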
