Abstract
Objective: To evaluate the value of machine learning (ML) models based on biparametric magnetic resonance imaging (bpMRI) for the diagnosis of prostate cancer (PCa) and clinically significant prostate cancer (csPCa). Methods: A total of 1 368 patients aged 30 to 92 (69.4±8.2) years were retrospectively enrolled from 3 tertiary medical centers in Jiangsu Province between May 2015 and December 2020, including 412 cases of csPCa, 242 cases of clinically insignificant prostate cancer (ciPCa) and 714 cases of benign prostate lesions. The data from center 1 and center 2 were randomly divided into a training cohort and an internal testing cohort at a ratio of 7∶3 by random sampling without replacement using the Python random module, and the data from center 3 served as the independent external testing cohort. The training cohort included 243 cases of csPCa, 135 cases of ciPCa and 384 cases of benign lesions; the internal testing cohort included 104 cases of csPCa, 58 cases of ciPCa and 165 cases of benign lesions; and the external testing cohort included 65 cases of csPCa, 49 cases of ciPCa and 165 cases of benign lesions. Radiomics features were extracted from T2-weighted imaging, diffusion-weighted imaging and apparent diffusion coefficient maps, and the optimal features were selected using the Pearson correlation coefficient and analysis of variance. ML models were built with two algorithms, support vector machine and random forest (RF), and were then tested in the internal and external testing cohorts. Finally, the Prostate Imaging Reporting and Data System (PI-RADS) scores assigned by the radiologists were adjusted using the ML models with superior diagnostic performance, yielding the adjusted PI-RADS. Receiver operating characteristic (ROC) curves were used to evaluate the diagnostic performance of the ML models and PI-RADS. The DeLong test was used to compare the areas under the curve (AUC) of the models with those of PI-RADS.
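The 7∶3 division described in the Methods can be illustrated with a minimal sketch. It assumes a simple (non-stratified) draw, a fixed seed, and integer patient identifiers; the function name `split_cohort` and these choices are illustrative assumptions, not details reported by the study.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Split IDs into training and internal testing lists without replacement.

    Hypothetical helper: the seed and the non-stratified draw are assumptions.
    """
    rng = random.Random(seed)
    n_train = round(train_fraction * len(patient_ids))
    # random.sample draws unique elements, i.e. sampling without replacement
    train = rng.sample(patient_ids, n_train)
    train_set = set(train)
    test = [pid for pid in patient_ids if pid not in train_set]
    return train, test


# Centers 1 and 2 together contribute 762 + 327 = 1089 patients (placeholder IDs)
ids = list(range(1089))
train_ids, internal_test_ids = split_cohort(ids)
print(len(train_ids), len(internal_test_ids))  # 762 327
```

With 1 089 patients, a 70% draw rounds to 762 training cases and leaves 327 for internal testing, which matches the cohort sizes reported above.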
Results: For PCa diagnosis, in the internal testing cohort, the AUCs of the RF-based ML model and PI-RADS were 0.869 (95%CI: 0.830-0.908) and 0.874 (95%CI: 0.836-0.913), respectively, and the difference was not statistically significant (P=0.793). In the external testing cohort, the AUCs of the model and PI-RADS were 0.845 (95%CI: 0.794-0.897) and 0.915 (95%CI: 0.880-0.951), respectively, and the difference was statistically significant (P=0.01). For csPCa diagnosis, in the internal testing cohort, the AUCs of the RF-based ML model and PI-RADS were 0.874 (95%CI: 0.834-0.914) and 0.892 (95%CI: 0.857-0.927), respectively, and the difference was not statistically significant (P=0.341). In the external testing cohort, the AUCs of the model and PI-RADS were 0.876 (95%CI: 0.831-0.920) and 0.884 (95%CI: 0.841-0.926), respectively, and the difference was not statistically significant (P=0.704). When the PI-RADS assessment was adjusted with the assistance of the ML models, the specificity for diagnosing PCa increased from 63.0% to 80.0% in the internal testing cohort and from 92.7% to 93.3% in the external testing cohort. For diagnosing csPCa, the specificity increased from 52.5% to 72.6% in the internal testing cohort and from 75.2% to 79.9% in the external testing cohort. Conclusions: The bpMRI-based ML models showed diagnostic performance comparable to that of PI-RADS assessed by senior radiologists for csPCa in both testing cohorts and for PCa in the internal testing cohort, whereas for PCa in the external testing cohort the model AUC was significantly lower than that of PI-RADS. The models nevertheless generalized well to the external cohort, and the specificity of PI-RADS was improved with ML model assistance.
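The two metrics reported above, AUC and specificity, can be made concrete with a small sketch. The scores below are invented toy values, not study data, and the function names are hypothetical. The AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half); this does not reproduce the DeLong test or the confidence intervals used in the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))


def specificity(scores_neg, threshold):
    """Share of negative cases scored below the positive-call threshold."""
    true_negatives = sum(1 for s in scores_neg if s < threshold)
    return true_negatives / len(scores_neg)


# Toy PI-RADS-like scores (1 to 5); purely illustrative
cancer_scores = [5, 4, 4, 3]
benign_scores = [2, 3, 1, 4]

print(auc(cancer_scores, benign_scores))          # 0.84375
print(specificity(benign_scores, threshold=3))    # 0.5
print(specificity(benign_scores, threshold=4))    # 0.75
```

Raising the threshold for a positive call in this toy example raises specificity from 0.5 to 0.75, the same direction of change the abstract reports when PI-RADS scores are adjusted with ML assistance.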