Since the introduction of the Ovarian-Adnexal Reporting and Data System (O-RADS) for magnetic resonance imaging (MRI), several studies with diverse characteristics have been published on its diagnostic performance. This systematic review and meta-analysis aimed to evaluate the diagnostic performance of O-RADS MRI scoring for adnexal masses, accounting for the risk of selection bias. The PubMed, Scopus, Web of Science, and Cochrane databases were searched for eligible studies. Borderline or malignant lesions were considered malignant. All O-RADS MRI scores ≥4 were considered positive. The quality of the studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. The pooled sensitivity, specificity, and likelihood ratio (LR) values were calculated, considering the risk of selection bias. Fifteen eligible studies were found, five of which had a high risk of selection bias. Between-study heterogeneity was low-to-moderate for sensitivity but substantial for specificity (I² = 35.5% and 64.7%, respectively). The pooled sensitivity was significantly lower in the studies with a low risk of bias than in those with a high risk of bias (93.0% and 97.5%, respectively; P = 0.043), whereas the pooled specificity did not differ between the two groups (90.4% for the overall population). The negative and positive LRs were 0.08 [95% confidence interval (CI) 0.05–0.11] and 10.0 (95% CI 7.7–12.9), respectively, for the studies with a low risk of bias and 0.03 (95% CI 0.01–0.10) and 10.3 (95% CI 3.8–28.3), respectively, for those with a high risk of bias. The overall diagnostic performance of the O-RADS MRI system is very high, particularly for ruling out borderline/malignant lesions, although its ruling-in potential is moderate. Including studies with a high risk of selection bias leads to an overestimation of sensitivity.
The O-RADS MRI system demonstrates high diagnostic performance, particularly in ruling out borderline or malignant lesions, and should routinely be used in practice. The substantial between-study heterogeneity observed for specificity suggests that benign lesions need to be characterized more consistently to reduce false-positive rates.
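
To make the link between the pooled sensitivity, specificity, and likelihood ratios concrete, the sketch below applies the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. It is illustrative only and is not the pooling method used in the meta-analysis: it pairs the low-risk-of-bias sensitivity (93.0%) with the overall specificity (90.4%), so the results approximate rather than reproduce the reported pooled LRs. The function names and the 20% pre-test probability are assumptions chosen for the example.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from a single sensitivity/specificity pair."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, lr):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# Point estimates taken from the abstract (low-risk sensitivity, overall specificity).
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.930, specificity=0.904)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")

# Hypothetical pre-test probability of a borderline/malignant lesion (assumed, not from the study).
pre_test = 0.20
print(f"Post-test probability, score >= 4: {post_test_probability(pre_test, 10.0):.1%}")
print(f"Post-test probability, score < 4:  {post_test_probability(pre_test, 0.08):.1%}")
```

Under the assumed 20% pre-test probability, a negative result (score <4) with the reported LR− of 0.08 lowers the probability of a borderline or malignant lesion to about 2%, whereas a positive result with the reported LR+ of 10.0 raises it to about 71%. This is consistent with the strong ruling-out and moderate ruling-in performance described above.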