Abstract

Study question
We investigated whether machine learning models using MRI data can predict the subtypes and tissue composition of uterine leiomyomas.

Summary answer
Our machine learning models using MRI data predicted the subtypes and tissue composition of uterine leiomyomas with high accuracy.

What is known already
Somatic mutations in the Mediator complex subunit 12 (MED12) gene have recently been identified as a biomarker of uterine leiomyomas and are detected in about 70% of them. Uterine leiomyomas are classified into two subtypes according to the presence or absence of the MED12 mutation. These subtypes differ in the ratio of smooth muscle cells to fibroblasts and in the amount of collagen fibers. In addition, smooth muscle cells and fibroblasts differ in their sensitivity to female hormones. Thus, the effect of therapeutic drugs (GnRH analogs and selective progesterone receptor modulators) may differ depending on the subtype and tissue composition of the leiomyoma.

Study design, size, duration
We analyzed 90 uterine leiomyoma nodules (62 MED12 mutation-positive and 28 mutation-negative) obtained from 51 women who underwent surgery at our hospital between 2020 and 2022. Seventy-one leiomyomas (49 mutation-positive and 22 mutation-negative) were assigned to the primary dataset used to establish the prediction models. The remaining 19 leiomyomas (13 mutation-positive and 6 mutation-negative) were assigned to the test dataset used to validate the models.

Participants/materials, setting, methods
For each leiomyoma, the tumor signal intensity was quantified on five MRI sequences (T2WI, ADC, T1map, T2*BOLD, MTC) to evaluate the collagen amount. After surgery, MED12 was genotyped and trichrome staining was performed to quantify the collagen amount.
Using these results, we established machine learning prediction models: support vector classification and logistic regression for subtype prediction, and support vector regression and Ridge regression for tissue composition prediction.

Main results and the role of chance
The signal intensity of all five MRI sequences differed significantly between the subtypes. Cross-validation within the primary dataset showed that the support vector classification and logistic regression models, built from the five MRI sequences and the MED12 genotyping data, were highly predictive of the subtypes (AUC: 0.984 and 0.995, respectively). In validation with the test dataset, both models predicted the subtype of every leiomyoma (AUC: 1.000 for both). This accuracy was higher than that obtained with a cut-off value for any single MRI sequence.

For tissue composition, the values of four MRI sequences (all except ADC) correlated significantly with the collagen amount. To improve accuracy, we nevertheless added the ADC data and the subtype predicted by the preceding models to the predictors of the collagen amount. The support vector regression and Ridge regression models, built from the five MRI sequences and the trichrome staining data, were highly predictive of the collagen amount both in cross-validation within the primary dataset (R²: 0.570 and 0.525, respectively) and in validation with the test dataset (R²: 0.648 and 0.675, respectively). Moreover, prediction models using only T2WI and ADC, which are acquired in general clinical practice, were as accurate as those using all five MRI sequences.

Limitations, reasons for caution
The prediction models were established with a limited dataset acquired on one specific MRI machine at a single center. We have not verified that similar results would be obtained under different conditions. Therefore, further training and evaluation in larger cohorts are required before clinical use.
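The two-step modelling workflow described under methods and main results can be illustrated with a minimal Python sketch. It assumes scikit-learn, which the abstract does not name, and it runs on synthetic numbers generated only to mirror the size of the primary dataset (71 nodules, 49 mutation-positive and 22 mutation-negative). The feature scaling, default kernels, regularization strengths, and 5-fold splits are assumptions for illustration, not the settings used in the study, and the printed scores say nothing about the reported results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import r2_score, roc_auc_score
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)

# SYNTHETIC stand-in for the primary dataset (not the study data):
# 71 nodules, 49 MED12 mutation-positive (1) and 22 negative (0), with five
# columns standing in for T2WI, ADC, T1map, T2*BOLD and MTC signal values.
y_subtype = np.r_[np.ones(49, dtype=int), np.zeros(22, dtype=int)]
X_mri = rng.normal(size=(71, 5)) + 1.5 * y_subtype[:, None]
# Hypothetical collagen amount, loosely tied to subtype and one sequence.
collagen = 30 + 10 * y_subtype + 3 * X_mri[:, 0] + rng.normal(scale=4, size=71)

clf_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
reg_cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}

# Step 1: subtype prediction, scored by cross-validated AUC.
classifiers = {
    "svc": make_pipeline(StandardScaler(), SVC()),
    "logreg": make_pipeline(StandardScaler(), LogisticRegression()),
}
for name, model in classifiers.items():
    # Out-of-fold decision scores, so each nodule is scored by a model
    # that did not see it during fitting.
    scores = cross_val_predict(
        model, X_mri, y_subtype, cv=clf_cv, method="decision_function"
    )
    results[f"{name}_auc"] = roc_auc_score(y_subtype, scores)

# Step 2: collagen prediction, with the predicted subtype added as a predictor.
# Out-of-fold labels are used so the true genotype is never fed in directly.
pred_subtype = cross_val_predict(classifiers["svc"], X_mri, y_subtype, cv=clf_cv)
X_reg = np.column_stack([X_mri, pred_subtype])

regressors = {
    "svr": make_pipeline(StandardScaler(), SVR()),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
}
for name, model in regressors.items():
    pred = cross_val_predict(model, X_reg, collagen, cv=reg_cv)
    results[f"{name}_r2"] = r2_score(collagen, pred)

for key, value in results.items():
    print(f"{key}: {value:.3f}")
```

Restricting `X_mri` to the first two columns would correspond to the T2WI-and-ADC-only variant mentioned in the results, under the same assumptions.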
Wider implications of the findings
The prediction models established in this study may be useful for clinical decision support, for example in predicting the effect of drug therapy and in selecting among therapeutic drugs such as GnRH analogs and selective progesterone receptor modulators. Their utility will be confirmed by validating the models in studies involving drug therapy.

Trial registration number
Not applicable