Abstract

For AI researchers, access to a large and well-curated dataset is crucial. Working in the field of breast radiology, our aim was to develop a high-quality platform that can be used for evaluation of networks aiming to predict breast cancer risk, estimate mammographic sensitivity, and detect tumors. Our dataset, Cohort of Screen-Aged Women (CSAW), is a population-based cohort of all women 40 to 74 years of age invited to screening in the Stockholm region, Sweden, between 2008 and 2015. All women were invited to mammography screening every 18 to 24 months free of charge. Images were collected from the PACS of the three breast centers that completely cover the region. DICOM metadata were collected together with the images. Screening decisions and clinical outcome data were collected by linkage to the regional cancer center registers. Incident cancer cases, from one center, were pixel-level annotated by a radiologist. A separate subset for efficient evaluation of external networks was defined for the uptake area of one center. The collection and use of the dataset for the purpose of AI research has been approved by the Ethical Review Board. CSAW included 499,807 women invited to screening between 2008 and 2015 with a total of 1,182,733 completed screening examinations. Around 2 million mammography images have currently been collected, including all images for women who developed breast cancer. There were 10,582 women diagnosed with breast cancer; for 8463, it was their first breast cancer. Clinical data include biopsy-verified breast cancer diagnoses, histological origin, tumor size, lymph node status, Elston grade, and receptor status. One thousand eight hundred ninety-one images of 898 women had tumors pixel level annotated including any tumor signs in the prior negative screening mammogram. Our dataset has already been used for evaluation by several research groups. We have defined a high-volume platform for training and evaluation of deep neural networks in the domain of mammographic imaging.

Highlights

  • Developing deep neural networks in the field of radiology is the focus of many research groups [1, 2]

  • 499,807 women were included in the Cohort of Screen-Aged Women (CSAW) cohort on the basis of a total of 1,688,216 invitations to screening between 2008 and 2015

  • There were 8463 women diagnosed with their first incident breast cancer (Table 1)

Read more

Summary

Introduction

Developing deep neural networks in the field of radiology is the focus of many research groups [1, 2]. Having a refined network architecture is important, while access to a large and well-curated dataset is crucial. There are a few large datasets available for deep learning research. NIH has released a data set of 100,000 chest X-rays from 30,000 patients [3]. The Stanford dataset CheXpert features 224,316 chest X-rays and radiology reports from 65,240 patients [4]. A group from the Geisinger Health system in the USA has curated a dataset of 40,367 3D head CT studies and trained a deep learning system for detecting brain hemorrhage [5]. There are a few publicly available mammographic datasets, such as Mammographic Image Analysis Society Minimammographic Database (mini-MIAS) and Digital Database for Screening Mammography (DDSM)

Objectives
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.