Abstract

Several efforts aim to de-identify patients' radiation oncology data so that it can be used to advance medical research. Although de-identification needs to be defined in the context of research goals and objectives, existing systems lack the flexibility in data modeling and in the normalization of attribute names needed to accomplish them. In this work, we describe a de-identification process for radiation and clinical oncology data that is guided by a data model and a schema for dynamically capturing the domain ontology and normalizing terminologies, defined in line with the research goals in this area. The radiological images are obtained in DICOM format and consist of diagnostic, radiation therapy (RT) treatment planning, RT verification, and RT response images. During DICOM de-identification, a few crucial pieces of information about the dataset are captured. The proposed model is generic in that it organizes information modeling in sync with the de-identification of a patient's clinical information. The treatment and clinical data are provided in comma-separated values (CSV) format, which follows a predefined data structure. The de-identified data is harmonized throughout the entire process. We present four case studies on four different types of cancer, namely glioblastoma multiforme, head-neck, breast, and lung, together with experimental validation on a few patients' data in these four areas. Several aspects are taken care of during de-identification: preservation of longitudinal date changes (LDC), incremental de-identification, referential data integrity between the clinical and image data, harmonization of the de-identified data, and transformation of the data to an underlying database schema.
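To make three of the listed aspects concrete (preservation of longitudinal date changes, incremental de-identification, and referential integrity between clinical and image data), the following is a minimal sketch and not necessarily the method used in this work. It assumes a keyed hash is used to derive both a pseudonym and a per-patient date offset. The names `SECRET_KEY`, `pseudonym`, `date_offset_days`, `shift_date`, and `deidentify_record`, as well as the record field names, are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical project secret; in practice it would be stored securely.
SECRET_KEY = b"replace-with-a-project-secret"


def pseudonym(patient_id: str) -> str:
    """Derive a stable pseudonym, so clinical and image records of the
    same patient still share one identifier after de-identification."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return "ANON-" + digest.hexdigest()[:12]


def date_offset_days(patient_id: str, max_shift: int = 365) -> int:
    """Derive a per-patient offset between 1 and max_shift days."""
    digest = hmac.new(
        SECRET_KEY, b"date:" + patient_id.encode("utf-8"), hashlib.sha256
    ).digest()
    return int.from_bytes(digest[:4], "big") % max_shift + 1


def shift_date(patient_id: str, d: date) -> date:
    """Shift a date by the patient's offset; intervals between any two
    dates of the same patient are unchanged."""
    return d - timedelta(days=date_offset_days(patient_id))


def deidentify_record(record: dict) -> dict:
    """Replace the patient identifier and shift every date-valued field."""
    pid = record["patient_id"]
    out = dict(record)
    out["patient_id"] = pseudonym(pid)
    for key, value in record.items():
        if isinstance(value, date):
            out[key] = shift_date(pid, value)
    return out


# Illustrative records: one clinical (CSV row) and one image (DICOM study).
clinical = {"patient_id": "P001", "diagnosis_date": date(2020, 1, 10)}
image = {"patient_id": "P001", "study_date": date(2020, 2, 9)}

clinical_anon = deidentify_record(clinical)
image_anon = deidentify_record(image)
```

Because both values are derived deterministically from the original identifier, a record processed in a later batch receives the same pseudonym and the same offset as earlier records of that patient, which is one simple way to support incremental runs without storing a lookup table.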
