Abstract 6579: Accelerating de-identification of images with cloud services to support data sharing in cancer research

Benjamin P Kopchick, Laura K Opsahl-Ong, Keyvan Farahani, Scott Gustafson, Bhavani S Singh, Michael W Rutherford, Qinyan Pan, David A Clunie, Fred W Prior, Juergen A Klenk, Ulrike Wagner

doi:10.1158/1538-7445.am2023-6579

Abstract

Purpose: De-identification of cancer imaging data is vitally important for data sharing and the advancement of research. However, it is a time-consuming and complex process that limits access to new cancer data sets, such as those shared through NCI's Imaging Data Commons (IDC), which is built on the Google Cloud Platform (GCP). Our research demonstrates how this process can be automated using GCP-native services.

Methods: We configured the Medical Image De-Identification (MIDI) pipeline to automate de-identification of cancer imaging data. De-identification is performed using an alpha release of GCP’s Healthcare API, which was configured to scrub all Protected Health Information (PHI) from both Digital Imaging and Communications in Medicine (DICOM) headers and burnt-in text in pixel data. To test the de-identification algorithm, we prepared a dataset of 216 patients and 23,921 images with synthetic PHI placed in both DICOM headers and pixel data. The synthetic PHI matched real data seen during curation at The Cancer Imaging Archive (TCIA) and included data that is difficult for an algorithm to detect. Accuracy of the MIDI pipeline was measured against TCIA’s standard tools and procedures for de-identification. Measures included correct detection of all PHI and correct action taken (e.g., remove, encrypt, or otherwise obscure). Throughput was also measured.

Results: Throughput was measured at 22.0 images per second over 10 runs. The MIDI pipeline’s accuracy for DICOM headers was 98.7%; it accurately detected dates, addresses, phone numbers, unique identifiers, names, and other common PHI. The PHI that most often failed to be removed were special cases: uncommon names or names with symbols, dates in string data types that were mistaken for other IDs, patient IDs, and abbreviated institution names. Private Creator data elements consistently failed to be retained. These errors were due to options not currently available and to algorithms not trained on specific PHI, such as abbreviated institution names. UIDs were correctly replaced. PHI burnt into the pixel data was successfully detected and removed, with one false positive.

Conclusion: We demonstrate the current capability and performance of automated cancer image de-identification. Our results show that while full automation is within grasp, a semi-automated pipeline is now feasible, with a human expert in the loop for final verification. This will lead to a much-needed acceleration of image de-identification, helping to handle the rapidly growing volume of data and provide timely access in support of cancer research. Future work will focus on pre- and post-processing tools that aid the human expert in the loop, such as identifying and flagging questionable images for manual review. These tools will also be used to catch the errors described in the Results.

Citation Format: Benjamin P. Kopchick, Laura K. Opsahl-Ong, Qinyan Pan, Michael W. Rutherford, Ulrike Wagner, Bhavani S. Singh, Scott Gustafson, Fred W. Prior, David A. Clunie, Juergen A. Klenk, Keyvan Farahani. Accelerating de-identification of images with cloud services to support data sharing in cancer research [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 6579.
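The two measures reported in the abstract, header accuracy and throughput, can be illustrated with a minimal Python sketch. It is not the MIDI pipeline's or TCIA's actual scoring code. The `Element` record, the answer-key layout, the action labels, and the toy values are all assumptions made for illustration. The sketch also treats throughput as a mean of per-run rates, which the abstract does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One PHI-bearing DICOM data element in a hypothetical answer key."""
    sop_instance_uid: str
    tag: str
    expected_action: str  # illustrative labels: "remove", "obscure", "retain"


def header_accuracy(answer_key, observed):
    """Fraction of answer-key elements whose observed action matches the expected one.

    `observed` maps (sop_instance_uid, tag) to the action the pipeline took.
    Elements missing from `observed` count as errors.
    """
    if not answer_key:
        raise ValueError("answer key is empty")
    correct = sum(
        1
        for e in answer_key
        if observed.get((e.sop_instance_uid, e.tag)) == e.expected_action
    )
    return correct / len(answer_key)


def mean_throughput(image_count, run_seconds):
    """Mean images per second across runs (assumes each run processes all images)."""
    if not run_seconds:
        raise ValueError("no runs given")
    rates = [image_count / seconds for seconds in run_seconds]
    return sum(rates) / len(rates)


def estimated_seconds(image_count, images_per_second):
    """Wall-clock estimate for a dataset at a fixed throughput."""
    return image_count / images_per_second


# Toy answer key: four elements in one image (UID and actions are made up).
answer_key = [
    Element("1.2.3", "(0010,0010)", "remove"),   # PatientName
    Element("1.2.3", "(0010,0020)", "obscure"),  # PatientID
    Element("1.2.3", "(0008,0020)", "obscure"),  # StudyDate
    Element("1.2.3", "(0008,0080)", "remove"),   # InstitutionName
]

# Toy pipeline output: the institution name was left in place.
observed = {
    ("1.2.3", "(0010,0010)"): "remove",
    ("1.2.3", "(0010,0020)"): "obscure",
    ("1.2.3", "(0008,0020)"): "obscure",
    ("1.2.3", "(0008,0080)"): "retain",
}

toy_accuracy = header_accuracy(answer_key, observed)
toy_rate = mean_throughput(2000, [100.0, 125.0])
full_set_seconds = estimated_seconds(23921, 22.0)
```

If the reported 22.0 images per second held across the full 23,921-image test set, `full_set_seconds` would come to roughly 1,087 seconds, or about 18 minutes per pass.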

Journal: Cancer Research	Publication Date: Apr 4, 2023
Citations: 1

