Abstract

There is an unmet need for integrating quantitative imaging biomarkers into current risk stratification tools. To explore the correlation between radiomics features, alone or in combination with clinical prognosticators, and tumor outcome, we retrieved clinical meta-data and matched baseline contrast-enhanced computed tomography (CECT) scans from a single-institution, institutional review board-approved cohort of 495 oropharyngeal cancer (OPC) patients. We opted to publicly share this large curated data set and the subsequent radiomics analysis output via The Cancer Imaging Archive (TCIA) to serve as a resource for standardization in the radiomics field. Diagnostic CECT images were acquired at our institution between 2005 and 2012 for 495 OPC patients, prior to any active intervention, in Digital Imaging and Communications in Medicine (DICOM) format. Expert radiation oncologists manually segmented the primary and nodal gross tumor volumes (GTVp and GTVn). Structure sets were named per the American Association of Physicists in Medicine (AAPM) TG-263 recommendations, then retrieved in DICOM RTSTRUCT format. Matched patient, disease, treatment and outcomes data were obtained. Radiomics analysis was performed using open-source, institutionally developed software. Protected health information (PHI) was removed from all DICOM elements in compliance with the DICOM standards committee P.S. 3.15 Annex E, Attribute Confidentiality Profile, using the Radiological Society of North America clinical trial processor (RSNA CTP). TCIA's curation process ensures that the DICOM data are free of PHI and that all linkages between CT images and structure sets are correct. Following curation, the data were archived in TCIA as a permanent open-access collection named "HNSCC". Anonymized data for 495 OPC patients will be made publicly available from TCIA as downloadable DICOM files (N=**), clinical data, and processed outputs, i.e. tumor segmentations and extracted radiomics features. All data attributes can be cross-referenced via the same anonymized subject IDs. We prepared a data dictionary that specifies clinical data attributes, definitions and possible variables per North American Association of Central Cancer Registries (NAACCR) guidelines. Large-scale data curation-anonymization-transfer workflows, as well as advanced image registration algorithms and common ontology data dictionaries, are unmet needs for joint machine-learning/radiomics research projects. If these resources are paired with large, curated data sets, such as those provided via open-access repositories like TCIA, the potential benefits include identification of imaging-derived radiomics signatures associated with treatment outcomes and normal-tissue complications across various head and neck epidemiologic cohorts.
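As an illustration of cross-referencing by anonymized subject ID, the following minimal Python sketch joins a clinical table to a radiomics feature table and reports any IDs present in only one of them. It is not the collection's actual tooling: the column names (`subject_id`, `t_category`, `outcome`, `gtvp_volume_cc`), the ID format, and all values are hypothetical placeholders, and the real file layout should be taken from the data dictionary.

```python
import csv
import io


def index_by_subject(rows, key="subject_id"):
    """Map each anonymized subject ID to its row, rejecting duplicate IDs."""
    indexed = {}
    for row in rows:
        sid = row[key]
        if sid in indexed:
            raise ValueError(f"duplicate subject ID: {sid}")
        indexed[sid] = row
    return indexed


def cross_reference(clinical_rows, radiomics_rows, key="subject_id"):
    """Join two tables on the shared subject ID.

    Returns (merged, unmatched): merged maps each ID found in both tables
    to the combined row; unmatched lists IDs found in only one table.
    """
    clinical = index_by_subject(clinical_rows, key)
    radiomics = index_by_subject(radiomics_rows, key)
    merged = {
        sid: {**clinical[sid], **radiomics[sid]}
        for sid in sorted(clinical.keys() & radiomics.keys())
    }
    unmatched = sorted(clinical.keys() ^ radiomics.keys())
    return merged, unmatched


# Toy in-memory tables; every column name and value here is made up.
CLINICAL_CSV = """subject_id,t_category,outcome
SUBJ-001,T2,0
SUBJ-002,T3,1
SUBJ-003,T1,0
"""

RADIOMICS_CSV = """subject_id,gtvp_volume_cc
SUBJ-001,12.5
SUBJ-002,30.1
SUBJ-004,8.0
"""

clinical_rows = list(csv.DictReader(io.StringIO(CLINICAL_CSV)))
radiomics_rows = list(csv.DictReader(io.StringIO(RADIOMICS_CSV)))
merged, unmatched = cross_reference(clinical_rows, radiomics_rows)

for sid, row in merged.items():
    print(sid, row)
print("unmatched IDs:", unmatched)
```

Listing the unmatched IDs rather than silently dropping them makes missing scans or missing clinical rows visible before any modeling step.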
