When mining image data from picture archiving and communication systems (PACS) or clinical trials, or when processing large volumes of uncurated data, the relevant scans must be identified among irrelevant or redundant data. Only images acquired with appropriate technical factors, patient positioning, and physiological conditions may be applicable to a particular image processing or machine learning task. Automatic labeling is important for making big data mining practical, because it replaces the conventional manual review of every image series. Digital Imaging and Communications in Medicine (DICOM) headers usually do not provide all the necessary labels and are sometimes incorrect. We propose an image-based, high-throughput labeling pipeline using deep learning, aimed at identifying scan direction, scan posture, lung coverage, contrast usage, and breath-hold type. These tasks were posed as separate classification problems, some of which also involved segmentation and identification of anatomic landmarks. Images of different view planes were used depending on the specific classification problem. All of our models achieved accuracy on the test set across the different tasks using a research database from multicenter clinical trials.