Computer-aided diagnosis (CAD) systems for breast ultrasound (BUS) aim to increase the efficiency and effectiveness of breast screening, helping specialists to detect and classify breast lesions. CAD system development requires a set of annotated images, including lesion segmentations, biopsy results to specify benign and malignant cases, and BI-RADS categories to indicate the likelihood of malignancy. In addition, standardized partitions of training, validation, and test sets promote reproducibility and fair comparisons between different approaches. Thus, we present a publicly available BUS dataset whose novelty is a substantial increase in the number of cases with the above-mentioned annotations and the inclusion of standardized partitions to objectively assess and compare CAD systems. The BUS dataset comprises 1875 anonymized images from 1064 female patients acquired via four ultrasound scanners during systematic studies at the National Institute of Cancer (Rio de Janeiro, Brazil). The dataset includes biopsy-proven tumors divided into 722 benign and 342 malignant cases. A senior ultrasonographer performed a BI-RADS assessment in categories 2 to 5 and manually outlined the breast lesions to obtain ground truth segmentations. Furthermore, 5- and 10-fold cross-validation partitions are provided to standardize the training and test sets used to evaluate and reproduce CAD systems. Finally, to validate the utility of the BUS dataset, an evaluation framework is implemented to assess the performance of deep neural networks for segmenting and classifying breast lesions. The BUS dataset is publicly available for academic and research purposes through an open-access repository under the name BUS-BRA: A Breast Ultrasound Dataset for Assessing CAD Systems. BUS images and reference segmentations are saved as Portable Network Graphics (PNG) files, and the dataset information is stored in separate Comma-Separated Values (CSV) files.
The BUS-BRA dataset can be used to develop and assess artificial intelligence-based lesion detection and segmentation methods, as well as methods for classifying BUS images into pathological classes and BI-RADS categories. Other potential applications include developing image processing methods, such as despeckle filtering and contrast enhancement, to improve image quality, and feature engineering for image description.
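Because the 1875 images come from 1064 patients, some patients contribute more than one image, so a cross-validation partition can leak information if images from one patient fall in both the training and test sets. The following is a minimal sketch of a patient-grouped k-fold split in plain Python. It is an illustration only: the function names `grouped_kfold` and `train_test_splits` and the greedy balancing rule are assumptions made here, not the procedure used to build the partitions distributed with BUS-BRA.

```python
from collections import defaultdict


def grouped_kfold(patient_ids, k):
    """Assign image indices to k folds, keeping each patient in one fold.

    patient_ids: one patient identifier per image (hypothetical input;
    the real dataset's CSV column names are not assumed here).
    Returns a list of k lists of image indices.
    """
    # Collect the image indices that belong to each patient.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    if k < 2 or k > len(by_patient):
        raise ValueError("k must be between 2 and the number of patients")

    folds = [[] for _ in range(k)]
    # Greedy balancing (an illustrative choice): place patients with the
    # most images first, each into the fold that is currently smallest.
    ordered = sorted(by_patient, key=lambda p: (-len(by_patient[p]), str(p)))
    for pid in ordered:
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_patient[pid])
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as test."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)
```

Calling `grouped_kfold(patient_ids, 5)` or `grouped_kfold(patient_ids, 10)` would give splits analogous in shape to the 5- and 10-fold partitions mentioned above. When comparing against published results, the partitions shipped with the dataset should be used rather than regenerated ones.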