Abstract
This article describes the creation of a repository of 33,000 medical CT images and 33,000 diagnostic reports that follows international standards (HL7 HAPI FHIR, DICOM, SNOMED). Reaching this goal requires three things: a data ingestion procedure that other provider institutions can replicate, a pseudo-anonymization algorithm applied at the source to guarantee data privacy, and labels generated from annotations via NLP. Our approach is a hybrid on-premise/cloud deployment of PACS and FHIR services, including transfer services that move anonymized data into the repository through a structured ingestion procedure. We applied NLP to the diagnostic reports to generate annotations, which were then used to train ML algorithms for content-based retrieval of similar exams. We successfully implemented ALPACS and PROXIMITY 2.0 and have ingested almost 19,000 thorax CT exams to date, along with their corresponding reports.
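As an illustration of what pseudo-anonymization at the source can look like, the following is a minimal sketch and not the algorithm used in this work. It assumes a keyed hash (HMAC-SHA256) whose secret stays at the provider institution, so the same patient always maps to the same pseudonym without the identifier leaving the site. The function names, the 16-character truncation, and the `DIRECT_IDENTIFIERS` field list are hypothetical choices made for the example.

```python
import hashlib
import hmac

# Hypothetical set of identifying fields to drop; not taken from the article.
DIRECT_IDENTIFIERS = {"PatientName", "PatientBirthDate", "PatientAddress"}


def pseudonymize_id(patient_id: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    # Truncation to 16 hex characters is an arbitrary choice for readability.
    return digest.hexdigest()[:16]


def pseudonymize_header(header: dict, secret_key: bytes) -> dict:
    """Return a copy of a header dict with identifiers removed or replaced."""
    # Drop fields that identify the patient directly.
    cleaned = {k: v for k, v in header.items() if k not in DIRECT_IDENTIFIERS}
    # Replace the patient ID with its keyed pseudonym so exams stay linkable.
    if "PatientID" in cleaned:
        cleaned["PatientID"] = pseudonymize_id(str(cleaned["PatientID"]), secret_key)
    return cleaned
```

Because the pseudonym depends on the secret key, a report and a CT exam processed with the same key keep a matching `PatientID`, which is what allows images and reports to be paired after ingestion under this assumption.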