The National Cancer Institute (NCI) Cancer Research Data Commons (CRDC) aims to establish a cloud-based data science infrastructure. Imaging Data Commons (IDC) is a component of CRDC supported by the Cancer Moonshot™. It aims to enable access to and exploration of de-identified imaging data, and to support integrated analyses with non-imaging data. IDC will interoperate with other components of CRDC, which include repositories of other data types, such as genomics and proteomics, and computational resources for analyzing the data. IDC builds on the strengths of established efforts such as The Cancer Imaging Archive (TCIA) to collect and share FAIR (Findable, Accessible, Interoperable, Reusable) imaging data.

IDC uses a combination of commercially available tools and capabilities provided by Google Cloud Platform (GCP), together with a range of open-source components. While the initial focus is on clinical radiology and radiotherapy data, IDC aims to provide similar capabilities for brightfield microscopy, multi-channel immunofluorescence, and other imaging modalities. Equally important is the ability to support the results of imaging data analysis, such as annotations of regions of interest in the images or descriptors of image findings. The IDC search portal provides an interface for exploring the data, defining cohorts, and summarizing cohort attributes. Images can be viewed in the integrated browser-based viewer, which uses DICOMweb to access the IDC data. IDC data is public and contains no Protected Health Information (PHI). As CRDC grows, imaging datasets will be increasingly cross-linked to genomic, proteomic, and clinical data about the subjects.

The IDC pilot was released in October 2020 and included 28 TCIA collections: radiology images related to The Cancer Genome Atlas (TCGA) project, and several collections prioritized to establish IDC's capabilities in handling image-derived data.
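The viewer accesses IDC data through DICOMweb, the RESTful DICOM standard (PS3.18). As an illustration only, the sketch below builds the standard QIDO-RS search URLs and the WADO-RS metadata URL a DICOMweb client would request. The base URL is a hypothetical placeholder, since this abstract does not give IDC's actual DICOMweb endpoint, and the helper function names are likewise hypothetical.

```python
from urllib.parse import quote, urlencode

# Hypothetical placeholder; IDC's real DICOMweb endpoint is not stated here.
DICOMWEB_BASE = "https://dicomweb.example.org/dicomWeb"


def search_studies_url(base: str, **query: str) -> str:
    """QIDO-RS 'search for studies' URL, e.g. filtered by PatientID."""
    url = f"{base.rstrip('/')}/studies"
    return f"{url}?{urlencode(query)}" if query else url


def search_series_url(base: str, study_uid: str, **query: str) -> str:
    """QIDO-RS URL listing the series that belong to one study."""
    url = f"{base.rstrip('/')}/studies/{quote(study_uid, safe='')}/series"
    return f"{url}?{urlencode(query)}" if query else url


def series_metadata_url(base: str, study_uid: str, series_uid: str) -> str:
    """WADO-RS URL returning DICOM JSON metadata for every instance in a series."""
    return (
        f"{base.rstrip('/')}/studies/{quote(study_uid, safe='')}"
        f"/series/{quote(series_uid, safe='')}/metadata"
    )


if __name__ == "__main__":
    # Example identifiers are made up for illustration.
    print(search_studies_url(DICOMWEB_BASE, PatientID="PATIENT-001"))
    print(search_series_url(DICOMWEB_BASE, "1.2.3.4"))
    print(series_metadata_url(DICOMWEB_BASE, "1.2.3.4", "1.2.3.4.5"))
```

A client would send these as HTTP GET requests with `Accept: application/dicom+json`. Any authentication the real service requires is outside the scope of this sketch.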
DICOM and collection-level metadata are available in BigQuery tables, and accessing them does not require a project configured with billing. The IDC portal is available at https://portal.imagingdatacommons.cancer.gov and integrates a customized web viewer that supports visualization of both images and image annotations (specifically, DICOM Segmentation and Radiotherapy Structure Set objects, including multiplanar reformatting). IDC also provides documentation and a user forum.

The IDC pilot, available to the cancer research community, explores the promise of cloud-hosted public imaging collections co-located with compute resources and a growing number of tools to support data analysis. The production release of IDC is planned for Fall 2021 and will include all public TCIA collections, including those containing imaging and annotation data from radiotherapy studies and clinical trials.
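Because DICOM metadata is exposed as BigQuery tables, cohort exploration can be done with ordinary SQL. The following is a minimal sketch that counts distinct series per collection, optionally filtered by modality. It rests on several assumptions. The table name `bigquery-public-data.idc_current.dicom_all` and its `collection_id`, `SeriesInstanceUID` and `Modality` columns are taken to match a later public IDC release, and the pilot may have used a different project and dataset. The helper names are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed table name; the pilot release may have used a different location.
IDC_TABLE = "bigquery-public-data.idc_current.dicom_all"


def build_series_count_sql(table: str = IDC_TABLE, filter_modality: bool = False) -> str:
    """SQL counting distinct series per collection; @modality is a named query parameter."""
    lines = [
        "SELECT collection_id, COUNT(DISTINCT SeriesInstanceUID) AS n_series",
        f"FROM `{table}`",
    ]
    if filter_modality:
        lines.append("WHERE Modality = @modality")
    lines += ["GROUP BY collection_id", "ORDER BY n_series DESC"]
    return "\n".join(lines)


def run_series_count(project_id: str, modality: Optional[str] = None) -> List[Tuple[str, int]]:
    """Run the query with google-cloud-bigquery (requires credentials and a GCP project)."""
    from google.cloud import bigquery  # imported lazily so SQL building works without it

    client = bigquery.Client(project=project_id)
    params = (
        [bigquery.ScalarQueryParameter("modality", "STRING", modality)]
        if modality is not None
        else []
    )
    job = client.query(
        build_series_count_sql(filter_modality=modality is not None),
        job_config=bigquery.QueryJobConfig(query_parameters=params),
    )
    return [(row.collection_id, row.n_series) for row in job.result()]


if __name__ == "__main__":
    print(build_series_count_sql(filter_modality=True))
```

For example, `run_series_count("my-gcp-project", modality="CT")` would return `(collection_id, n_series)` pairs, where `"my-gcp-project"` is a hypothetical project ID. Passing the modality as a named query parameter rather than interpolating it into the SQL string avoids quoting problems and SQL injection.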