Abstract

PURPOSE Machine Learning Package for Cancer Diagnosis (MLCD) is the result of a National Institutes of Health/National Cancer Institute (NIH/NCI)-sponsored project for developing a unified software package from state-of-the-art breast cancer biopsy diagnosis and machine learning algorithms that can improve the quality of both clinical practice and ongoing research.

METHODS Whole-slide images of 240 well-characterized breast biopsy cases, initially assembled under R01 CA140560, were used for developing the algorithms and training the machine learning models. This software package is based on the methodology developed and published under our recent NIH/NCI-sponsored research grant (R01 CA172343) for finding regions of interest (ROIs) in whole-slide breast biopsy images, for segmenting ROIs into histopathologic tissue types, and for using this segmentation in classifiers that can suggest final diagnoses.

RESULTS The package provides an ROI detector for whole-slide images and modules for semantic segmentation of the ROIs into tissue classes and for diagnostic classification of the ROIs into 4 classes (benign, atypia, ductal carcinoma in situ, invasive cancer). It is available through the GitHub repository under the Massachusetts Institute of Technology license and will later be distributed with the Pathology Image Informatics Platform system. A Web page provides instructions for use.

CONCLUSION Our tools have the potential to provide help to other cancer researchers and, ultimately, to practicing physicians and will motivate future research in this field. This article describes the methodology behind the software development and gives sample outputs to guide those interested in using this package.

Highlights

  • The long-term goal of the National Institutes of Health/National Cancer Institute (NIH/NCI)-sponsored project, “A Unified Machine Learning Package for Cancer Diagnosis” (U01CA231782), which is part of the Information Technology for Cancer Research (ITCR) program, was the development of a unified software package for the diagnosis of cancer from whole-slide biopsy images.

  • Whole-slide images of 240 well-characterized breast biopsy cases, initially assembled under R01 CA140560, were used for developing the algorithms and training the machine learning models. This software package is based on the methodology developed and published under our recent National Institutes of Health/National Cancer Institute (NIH/NCI)-sponsored research grant (R01 CA172343) for finding regions of interest (ROIs) in whole-slide breast biopsy images, for segmenting ROIs into histopathologic tissue types, and for using this segmentation in classifiers that can suggest final diagnoses.

  • NIH/NCI-sponsored research grants, including our own (R01 CA140560; R01 CA172343), have produced uniquely well-characterized biopsy images and methodology for finding regions of interest (ROIs),[2] segmenting them into histopathologic tissue types,[3] and using this segmentation as input to classifiers that suggest diagnoses.[4] These methods are being converted into a unified Python software package, Machine Learning Package for Cancer Diagnosis (MLCD), with the corresponding modules available through the GitHub repository under the Massachusetts Institute of Technology license, to be distributed later with the Pathology Image Informatics Platform (PIIP) system developed by Martel et al.[5]


Summary

Introduction

Our CNN model was trained on 384 × 384 patches from 58 ROIs fully annotated by an experienced pathologist. Applying this trained CNN classifier to any given ROI yields a labeled image in which the tissue types are represented by different integer labels and can be visualized in different colors: 0, Background (white); 2, Benign Epithelium (magenta); 3, Malignant Epithelium (blue); 4, Normal Stroma (pink); 5, Desmoplastic Stroma (violet); 6, Secretion (green); 7, Blood (yellow); 8, Necrosis (red).

[Figure legend: Benign epithelium, Malignant epithelium, Normal stroma, Desmoplastic stroma, Secretion, Necrosis, Blood; ROI detection: Correct, Incorrect]
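To make the label-to-color convention concrete, the following is a minimal sketch of how an integer label image of this kind could be turned into an RGB visualization. It is not taken from the MLCD package: the function name `colorize_labels`, the table `LABEL_COLORS`, and the exact RGB triples chosen for each color name are assumptions made for illustration. Labels that the text does not list (such as 1) are rendered black in this sketch.

```python
import numpy as np

# Label -> (tissue type, RGB) table following the list in the text.
# The RGB triples are assumed values for the named colours, not the
# package's actual palette.
LABEL_COLORS = {
    0: ("Background", (255, 255, 255)),           # white
    2: ("Benign Epithelium", (255, 0, 255)),      # magenta
    3: ("Malignant Epithelium", (0, 0, 255)),     # blue
    4: ("Normal Stroma", (255, 192, 203)),        # pink
    5: ("Desmoplastic Stroma", (238, 130, 238)),  # violet
    6: ("Secretion", (0, 128, 0)),                # green
    7: ("Blood", (255, 255, 0)),                  # yellow
    8: ("Necrosis", (255, 0, 0)),                 # red
}


def colorize_labels(label_map):
    """Convert an (H, W) integer label image into an (H, W, 3) uint8 RGB image."""
    label_map = np.asarray(label_map, dtype=np.int64)

    # Lookup table indexed by label; rows for unlisted labels stay black.
    lut = np.zeros((max(LABEL_COLORS) + 1, 3), dtype=np.uint8)
    for label, (_name, rgb) in LABEL_COLORS.items():
        lut[label] = rgb

    if label_map.size and (label_map.min() < 0 or label_map.max() >= len(lut)):
        raise ValueError("label image contains values outside 0..8")

    # Integer-array indexing maps every pixel's label to its RGB row.
    return lut[label_map]


if __name__ == "__main__":
    demo = np.array([[0, 2, 3], [4, 8, 7]])
    print(colorize_labels(demo).shape)
```

A lookup table is used here rather than a per-label loop over the image because a single indexing operation handles the whole label map at once, which matters for large ROIs.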
