Adenosis is a benign breast condition whose lesions can mimic breast carcinoma and are evaluated for malignancy with the Breast Imaging–Reporting and Data System (BI-RADS). We constructed and validated modality-specific enhancement (MSE)-Breast Net, a model based on multimodal ultrasound images, and compared its performance with that of the BI-RADS in differentiating adenosis from breast cancer. A total of 179 patients with breast carcinoma and 229 patients with adenosis were included in this retrospective, two-institution study and divided into a training cohort (institution I, n = 292) and a validation cohort (institution II, n = 116). In the training cohort, the final model had a significantly greater area under the receiver operating characteristic curve (AUC) (0.82; P < 0.05) than the B-mode–based model (0.69, 95% CI [0.49–0.90]). In the validation cohort, the AUC of the final model was 0.81, greater than that of the BI-RADS (0.75, P < 0.05). The multimodal model outperformed the individual and bimodal models, reaching a significantly greater AUC of 0.87 (95% CI [0.69–1.0]; P < 0.05). MSE-Breast Net, based on multimodal ultrasound images, exhibited better diagnostic performance than the BI-RADS in differentiating adenosis from breast cancer and may contribute to clinical diagnosis and treatment.
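
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an AUC and a 95% confidence interval can be computed from per-lesion model scores. It is not the authors' analysis: the abstract does not state how the intervals were derived, so the percentile bootstrap used here is an assumption, and the function names (`auc_from_scores`, `bootstrap_auc_ci`) and the toy labels and scores are hypothetical and are not study data.

```python
import random


def auc_from_scores(labels, scores):
    """AUC as the probability that a malignant case (label 1) receives a
    higher score than a benign case (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        aucs.append(auc_from_scores(ys, [scores[i] for i in idx]))
    if not aucs:
        raise ValueError("no valid bootstrap resamples")
    aucs.sort()
    lo = aucs[int((alpha / 2) * len(aucs))]
    hi = aucs[min(len(aucs) - 1, int((1 - alpha / 2) * len(aucs)))]
    return lo, hi


# Hypothetical toy data: 1 = carcinoma, 0 = adenosis.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_from_scores(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly: 0.75
```

With so few cases a bootstrap interval is very wide, which is consistent with the broad intervals that small cohorts tend to produce; comparing two AUCs for significance requires a separate paired test that this sketch does not cover.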