This study proposed an integrated dataset-preparation system for ML-based medical image diagnosis, offering high clinical applicability across various modalities and diagnostic purposes. With the proliferation of ML-based computer-aided diagnosis using medical images, massive datasets must be prepared. Lacking a standard procedure, dataset preparation may become ineffective. Moreover, on-demand procedures are locked to a single imaging modality and purpose. For these reasons, we introduced a dataset-preparation system applicable to a variety of modalities and purposes. The system consisted of a common part, comprising incremental anonymization and cross annotation for preparing anonymized unprocessed data, followed by modality- and subject-dependent parts for the subsequent processes. Incremental anonymization was carried out in batches after image acquisition. Cross annotation enabled collaborating medical specialists to co-generate annotation objects. Thumbnail images were created for quick inspection of the dataset. With the anonymized images, preprocessing was accomplished by complementing manual operations with automatic ones. Finally, feature extraction was performed automatically to obtain the data representation. Experimental results were provided for two demonstrative systems: one dedicated to esthetic outcome evaluation of breast reconstruction surgery from 3D breast images, and one to tumor detection from breast MRI images. The proposed system successfully prepared the 3D breast-mesh closures and their geometric features from 3D breast images, as well as radiomics and likelihood features from breast MRI images. The system also enabled effective voxel-by-voxel prediction of the tumor region from breast MRI images using random-forest and k-nearest-neighbors algorithms. The results confirmed the efficiency of the system in preparing datasets with high clinical applicability regardless of the image modality and diagnostic purpose.
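As an illustration of the incremental, batch-wise anonymization step, the following is a minimal sketch and not the authors' implementation. It assumes that each acquired image is described by a record with the hypothetical fields `image_id`, `patient_id`, `patient_name`, and `modality`, and it uses a salted SHA-256 hash as a stand-in pseudonymization scheme; the abstract does not specify how identifiers are actually replaced.

```python
import hashlib


def pseudonym(patient_id: str, salt: str) -> str:
    """Derive a stable pseudonym from a patient ID (assumed scheme: salted SHA-256)."""
    digest = hashlib.sha256((salt + patient_id).encode("utf-8")).hexdigest()
    return "anon-" + digest[:12]


def anonymize_batch(records, processed_ids, salt):
    """Anonymize only the records that have not been processed in an earlier batch.

    records       -- iterable of dicts with the hypothetical keys
                     image_id, patient_id, patient_name, modality
    processed_ids -- set of image IDs already anonymized; updated in place
    salt          -- secret string kept separate from the dataset
    Returns the list of newly anonymized records.
    """
    new_records = []
    for rec in records:
        if rec["image_id"] in processed_ids:
            continue  # incremental: skip images handled in a previous run
        new_records.append({
            "image_id": rec["image_id"],
            "subject": pseudonym(rec["patient_id"], salt),  # replaces direct identifiers
            "modality": rec["modality"],
        })
        processed_ids.add(rec["image_id"])
    return new_records


if __name__ == "__main__":
    processed = set()
    first = [{"image_id": "img1", "patient_id": "P001",
              "patient_name": "Example A", "modality": "MRI"}]
    second = first + [{"image_id": "img2", "patient_id": "P001",
                       "patient_name": "Example A", "modality": "3D"}]
    print(anonymize_batch(first, processed, "secret"))   # anonymizes img1
    print(anonymize_batch(second, processed, "secret"))  # anonymizes img2 only
```

Because the pseudonym depends only on the patient ID and the salt, images of the same patient acquired in different batches or modalities map to the same anonymous subject, which is one way such a common part could serve several modality-dependent pipelines.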