Abstract

Study question
Can we develop predictive, practical biomarkers for premature ovarian insufficiency (POI) from normal ovaries and human blood, tested and validated by machine-learning (ML) algorithms?

Summary answer
Applying the random forest and XGBoost algorithms, we pinpointed 60 significant genes from ovarian tissues; transcriptome analysis identified RPM1 as the most significant biomarker.

What is known already
Accelerated ovarian aging has been suggested as one of the possible underlying mechanisms of POI. Previous literature has focused on identifying its cause in the genetic realm, commonly adopting first-generation (Sanger) sequencing and next-generation sequencing (NGS). However, even such powerful tools are limited in accurately depicting the complete transcript information of ovarian tissue. Moreover, the theoretical value of such disease-predictive gene candidates needs to be validated using human tissues and blood, and the underlying mechanisms should ultimately be confirmed in an ideal animal model and cell line.

Study design, size, duration
To overcome the existing limitations and maximize the clinical relevance of disease-predictive gene analysis and theoretical biomarker exploration, the current study incorporated transcriptome expression data from both young and old ovaries obtained from the Genotype-Tissue Expression (GTEx) database. Using supervised machine-learning techniques, potential marker genes were identified, and a meaningful predictive index was subsequently developed using human plasma samples from control women and women with POI, with and without a pregnancy history.

Participants/materials, setting, methods
The performance of the two ML models was evaluated across various age groups using the area under the receiver operating characteristic curve (AUROC), accuracy, kappa, and F1 score. The expression profiles of the final candidate marker genes were analyzed within the GTEx datasets for their correlation with age. Finally, the potential transcriptome biomarkers were quantitatively evaluated using human blood samples from 92 controls and 108 women with POI visiting a tertiary center in 2023.

Main results and the role of chance
We found 41 significant genes using the random forest algorithm and 19 significant genes using the XGBoost algorithm. We then tested the 7 final ML-characterized candidate biomarkers in human blood samples (all p < 0.05). The AUROC and interventional predictive index results demonstrated the potential of RPM1 as a predictive marker, with an AUROC of 0.784, compared with 0.659 for anti-Müllerian hormone (AMH) and 0.646 for inhibin B (INHB).

Limitations, reasons for caution
To avoid potential biases, RPMs should be tested in a larger, multinational cohort with different genetic and geographic backgrounds, possibly incorporating Mendelian randomization to estimate the causal effect of RPMs on POI. To ultimately confirm their therapeutic potential, preclinical studies using RPM recombinant proteins should be considered.

Wider implications of the findings
Our results suggest that predictive biomarkers for POI could be associated with premature, accelerated ovarian aging as a disease mechanism, and that these biomarkers could be developed as evaluation metrics for the efficacy, progress, and prognosis of interventions and treatment modalities for women with POI seeking fertility.

Trial registration number
202302240001
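
For readers less familiar with the four evaluation metrics named in the methods (AUROC, accuracy, kappa, and F1 score), the following is a minimal sketch of how each can be computed for a binary classifier. It is not the study's pipeline: the labels and scores are synthetic toy values, the 0.5 decision threshold is an assumption, and the function names (`auroc`, `confusion`, `accuracy`, `f1_score`, `cohen_kappa`) are hypothetical helpers written only for illustration.

```python
def auroc(labels, scores):
    """Rank-based AUROC: fraction of (positive, negative) pairs in which
    the positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion(labels, preds):
    """Return (tp, fp, fn, tn) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(labels, preds):
    tp, fp, fn, tn = confusion(labels, preds)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(labels, preds):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion(labels, preds)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(labels, preds):
    """Agreement between labels and predictions, corrected for chance."""
    tp, fp, fn, tn = confusion(labels, preds)
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


# Synthetic toy data (NOT from the study): 1 = case, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.6, 0.3, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("AUROC:", auroc(labels, scores))
print("accuracy:", accuracy(labels, preds))
print("F1:", f1_score(labels, preds))
print("kappa:", cohen_kappa(labels, preds))
```

AUROC is computed from the continuous scores and does not depend on a threshold, whereas accuracy, F1, and kappa all depend on the chosen cut-off, which is one reason the metrics are usually reported together.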