Med-ImageTools: An open-source Python package for robust data processing pipelines and curating medical imaging data

Sejin Kim, Michal Kazmierski, Kevin Qu, Jacob Peoples, Minoru Nakano, Vishwesh Ramanathan, Joseph Marsilla, Mattea Welch, Amber Simpson, Benjamin Haibe-Kains

doi:10.12688/f1000research.127142.2

Abstract

Background: Machine learning and AI promise to revolutionize the way we leverage medical imaging data to improve care, but they require large datasets to train computational models that can be implemented in clinical practice. However, processing large and complex medical imaging datasets remains an open challenge.

Methods: To address this issue, we developed Med-ImageTools, a new open-source Python package that automates data curation and processing while allowing researchers to share their data processing configurations more easily, lowering the barrier for other researchers to reproduce published works.

Use cases: We demonstrated the efficiency of Med-ImageTools on three different datasets, with significantly reduced processing times on each.

Conclusions: The AutoPipeline feature will improve the accessibility of raw clinical datasets on public archives such as the Cancer Imaging Archive (TCIA), the largest public repository of cancer imaging, allowing machine learning researchers to work with data in analysis-ready formats without requiring deep domain knowledge.

Full Text