Abstract

With the increasing size of datasets used in medical imaging research, the need for automated data curation is growing. One important data curation task is the structured organization of a dataset to preserve its integrity and ensure its reusability. We therefore investigated whether this data organization step can be automated. To this end, we designed a convolutional neural network (CNN) that automatically recognizes eight different brain magnetic resonance imaging (MRI) scan types based on visual appearance. Because it relies on image content alone, our method is unaffected by inconsistent or missing scan metadata. The recognized scan types include pre-contrast T1-weighted (T1w), post-contrast T1-weighted (T1wC), T2-weighted (T2w), proton density-weighted (PDw) and derived maps (e.g. apparent diffusion coefficient and cerebral blood flow). In a first experiment, we used scans of subjects with brain tumors: 11065 scans of 719 subjects for training, and 2369 scans of 192 subjects for testing. The CNN achieved an overall accuracy of 98.7%. In a second experiment, we trained the CNN on all 13434 scans from the first experiment and tested it on 7227 scans of 1318 subjects with Alzheimer’s disease. Here, the CNN achieved an overall accuracy of 98.5%. In conclusion, our method can accurately predict scan type, and can quickly and automatically sort a brain MRI dataset virtually without the need for manual verification. In this way, our method can assist with properly organizing a dataset, which maximizes the shareability and integrity of the data.

Highlights

  • With the rising popularity of machine learning, deep learning, and automatic pipelines in the medical imaging field, the demand for large datasets is increasing

  • The highest per-class accuracy was achieved for the proton density-weighted (PDw) and perfusion-weighted dynamic susceptibility contrast (PWI-DSC) scans (100.0% for both), whereas the T2w-fluid-attenuated inversion recovery (FLAIR) scans had the lowest accuracy (93.0%)

  • It makes sense that the convolutional neural network (CNN) focuses on the cerebrospinal fluid (CSF), both in the ventricles and at the edges of the brain, because its visual appearance is very characteristic of the scan type
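
The highlights report both an overall accuracy and per-class accuracies. As a minimal sketch of how the two differ, the following Python snippet computes both from paired lists of true and predicted scan-type labels. The function name `per_class_accuracy` and the toy labels are assumptions made for illustration; they are not the authors' evaluation code or data.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return (overall_accuracy, {class: accuracy}) for paired label lists.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")

    # Number of scans per true class.
    totals = Counter(true_labels)
    # Number of correctly classified scans per true class.
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )

    per_class = {label: correct[label] / totals[label] for label in totals}
    overall = sum(correct.values()) / len(true_labels)
    return overall, per_class


# Toy labels invented for illustration only (not data from the paper).
y_true = ["T1w", "T1w", "T2w", "T2w-FLAIR", "T2w-FLAIR", "PDw"]
y_pred = ["T1w", "T1w", "T2w", "T2w", "T2w-FLAIR", "PDw"]

overall, per_class = per_class_accuracy(y_true, y_pred)
print(f"overall accuracy: {overall:.3f}")
for label, acc in sorted(per_class.items()):
    print(f"{label}: {acc:.3f}")
```

Per-class accuracy here is the fraction of scans of a given true type that were predicted correctly, so a class with few scans can have a low per-class accuracy while barely affecting the overall figure.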


Summary

Introduction

With the rising popularity of machine learning, deep learning, and automatic pipelines in the medical imaging field, the demand for large datasets is increasing. To satisfy this hunger for data, the amount of imaging data collected at healthcare institutes keeps growing, as does the amount of data that is shared in public repositories (Greenspan et al 2016; Lundervold and Lundervold 2019). This increase in available data means that proper data curation, the management of data throughout its life cycle, is needed to keep the data manageable and workable (Prevedello et al 2019; van Ooijen 2019). Organizing the dataset maximizes its shareability and preserves its full integrity, ensuring that an experiment can be repeated and that the dataset can be reused in other experiments.

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu).

