Comparative Analysis of PCOS Classification Using Random Forest: Integration of Mutual Information, SMOTE-Tomek, and Outlier Handling

Abstract

Polycystic Ovary Syndrome (PCOS) is a hormonal disorder affecting women of reproductive age, with a global prevalence of 8–13%. However, approximately 70% of cases remain undiagnosed. This study aimed to develop and compare eight Random Forest classification models for PCOS detection using a publicly available Kaggle dataset. The methodology incorporated three key preprocessing techniques: outlier handling using the Interquartile Range (IQR) method, feature selection through Mutual Information, and class imbalance handling via SMOTE-Tomek. The results revealed that the best-performing model, which applied outlier removal and SMOTE without feature selection, achieved an accuracy of 94.11%. This result significantly outperformed the baseline Random Forest model, which achieved an accuracy of 87.27% with no preprocessing (no outlier removal, SMOTE, or feature selection). Moreover, the model that used only SMOTE for class balancing achieved an accuracy of 93.84%, underscoring the importance of addressing class imbalance for classification performance. Notably, feature selection did not consistently improve accuracy, as Random Forest inherently handles feature redundancy and captures complex feature interactions. These findings highlight the importance of tailored preprocessing strategies, particularly outlier handling and class balancing, for optimizing medical data classification. Future research should explore clinically informed feature selection techniques and assess the generalizability of these findings across diverse datasets to enhance the clinical relevance of PCOS detection models.
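The IQR outlier step and the eight-model comparison described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: the conventional 1.5 × IQR fences, dropping rows rather than capping values, and reading "eight models" as every on/off combination of the three preprocessing steps. The names `iqr_bounds`, `remove_outliers`, `STEPS`, and `CONFIGS` are hypothetical.

```python
from itertools import product
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional multiplier; the abstract does not state
    which value the study used, so it is an assumption here.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, column, k=1.5):
    """Keep only the rows whose value in `column` lies inside the fences."""
    lower, upper = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if lower <= row[column] <= upper]


# Assumed experimental grid: three optional steps give 2**3 = 8 settings.
STEPS = ("outlier_handling", "mutual_information", "smote_tomek")
CONFIGS = [dict(zip(STEPS, flags)) for flags in product((False, True), repeat=3)]

if __name__ == "__main__":
    rows = [{"x": v} for v in [1, 2, 3, 4, 5, 6, 7, 8, 100]]
    print(remove_outliers(rows, "x"))  # the row with x=100 is dropped
    print(len(CONFIGS))
```

Under this reading, the all-`False` configuration corresponds to the baseline model and the configuration with only `smote_tomek` enabled corresponds to the class-balancing-only model.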

Similar Papers
  • Research Article
  • 10.3233/jifs-219402
Hybrid Machine Learning Approach for Early Diagnosis of Polycystic Ovary Syndrome with Stable Features
  • Apr 25, 2024
  • Journal of Intelligent & Fuzzy Systems
  • S Reka + 3 more

Polycystic Ovary Syndrome (PCOS) is a hormonal condition that typically affects women during their reproductive years. It is identified by disruptions in hormonal balance, particularly an increase in the level of androgen (male hormone) in the female body. PCOS can lead to various symptoms and health complications including irregular menstrual cycles, ovarian cysts, fertility issues, insulin resistance, weight gain, acne, and excess hair growth. Real-world PCOS detection is a challenging task because the specific cause of PCOS is unknown and its symptoms are unclear. Thus, accurate and timely diagnosis of PCOS is crucial for effective management and prevention of long-term complications. In such cases, machine learning based PCOS prediction models support the diagnostic process and address potential errors and time constraints. Machine learning algorithms can analyze large sets of patient data, including medical history, hormonal profiles, and imaging results, to assist in the diagnosis of PCOS. In particular, the performance of the data analysis task and the prediction model is improved by ensemble feature selection strategies. These methods concentrate on selecting a subset of pertinent features from a broader range of features. The instability of the outcome of a feature selection algorithm is a frequent issue in practical applications when it is applied multiple times to similar datasets or to data with slight modifications. Thus, evaluating the robustness of the feature selection algorithm is of prime importance. To address these issues and quantify the robustness, this study uses Jensen-Shannon divergence, an information-theoretic approach, with an ensemble feature selection method to handle the various kinds of output, such as complete rankings, half rankings, and top-k lists (without ranking).
Furthermore, this article proposes a hybrid machine learning classifier with SMOTE-SVM for the prompt detection of PCOS, and the performance of the model is compared with a number of other individual classifiers including K-Nearest Neighbour (KNN), Support Vector Machine (SVM), AdaBoost, Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), and Decision Tree. The proposed SWISS-AdaBoost classifier surpassed the other models with an accuracy of 97.81% and an AUC of 99.08%.
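As a rough illustration of how Jensen-Shannon divergence can quantify disagreement between two feature selection runs, the sketch below compares two normalized feature-score vectors. This is not the procedure from the paper, which is not given in the abstract; the idea of turning scores into probability vectors and the helper names `normalize`, `kl_divergence`, and `js_divergence` are assumptions made for the example.

```python
from math import log2


def normalize(scores):
    """Scale non-negative feature scores so that they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 are skipped."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Hypothetical importance scores for the same four features from two runs.
    run_a = normalize([4.0, 3.0, 2.0, 1.0])
    run_b = normalize([3.5, 3.5, 2.0, 1.0])
    print(js_divergence(run_a, run_b))  # a small value means the runs agree closely
```

A value near 0 indicates that the two runs distribute importance almost identically, while a value of 1 means that they place all weight on disjoint features.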

  • Research Article
  • 10.55041/ijsrem51012
AI – Powered PCOD Detection Platform
  • Jun 25, 2025
  • INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
  • Prasad Vadkar

Polycystic Ovary Syndrome (PCOS), also referred to as Polycystic Ovarian Disease (PCOD), is one of the most prevalent endocrine disorders affecting women of reproductive age worldwide. It is a leading cause of anovulatory infertility and is characterized by hormonal imbalances that result in symptoms such as irregular menstrual cycles, excessive weight gain, acne, hair loss, and skin darkening. Despite its high prevalence, early-stage detection and accurate prediction of PCOS remain challenging due to limitations in existing diagnostic methods and treatment strategies. This research aims to address these challenges by developing an advanced, computer-aided detection system utilizing machine learning (ML) and deep learning (DL) techniques. The system leverages ovary ultrasound (USG) images— one of the most reliable diagnostic modalities for PCOS—and incorporates a Convolutional Neural Network (CNN) for robust feature extraction. To enhance classification performance, a stacking ensemble model is implemented using a combination of traditional machine learning classifiers as base learners and bagging or boosting techniques as meta-learners. The CNN architecture is further strengthened through transfer learning and modern feature selection techniques such as I-SQUARE and CHI-square. The study involves training and evaluating the proposed model on a dataset comprising 4000 ovary USG images, sourced from a publicly available PCOS dataset on Kaggle by Parson Kottarathil. Additionally, five ML classifiers— Random Forest, Support Vector Machine (SVM), Logistic Regression, Gaussian Naïve Bayes, and K-Nearest Neighbors—were evaluated on a subset of the dataset containing 41 clinical and physiological features, with the top 30 features selected for classification. Experimental results indicate that the Random Forest Classifier outperforms other models in terms of accuracy and reliability. 
The proposed hybrid system significantly improves detection accuracy while reducing execution time, making it a promising solution for aiding healthcare professionals in the early diagnosis and management of PCOS. This research lays the foundation for intelligent and scalable PCOS detection systems that integrate clinical data and medical imaging, thereby advancing personalized and timely healthcare delivery for women suffering from this condition. Keywords: Polycystic Ovary Syndrome (PCOS), Machine Learning, Deep Learning, Convolutional Neural Network (CNN), Medical Imaging, Ultrasound, Classification, Data Mining, Healthcare, Prediction System, Early Diagnosis

  • Research Article
  • Citations: 89
  • 10.1016/j.fertnstert.2010.02.015
Variation in metabolic and cardiovascular risk in women with different polycystic ovary syndrome phenotypes
  • Mar 24, 2010
  • Fertility and Sterility
  • Denusa Wiltgen + 1 more


  • Research Article
  • Citations: 55
  • 10.3390/diagnostics13081506
Polycystic Ovary Syndrome Detection Machine Learning Model Based on Optimized Feature Selection and Explainable Artificial Intelligence
  • Apr 21, 2023
  • Diagnostics
  • Hela Elmannai + 6 more

Polycystic ovary syndrome (PCOS) has been classified as a severe health problem common among women globally. Early detection and treatment of PCOS reduce the possibility of long-term complications, such as an increased chance of developing type 2 diabetes and gestational diabetes. Therefore, effective and early PCOS diagnosis will help healthcare systems reduce the disease’s problems and complications. Machine learning (ML) and ensemble learning have recently shown promising results in medical diagnostics. The main goal of our research is to provide model explanations to ensure efficiency, effectiveness, and trust in the developed model through local and global explanations. Feature selection methods are combined with different types of ML models (logistic regression (LR), random forest (RF), decision tree (DT), naive Bayes (NB), support vector machine (SVM), k-nearest neighbor (KNN), XGBoost, and AdaBoost) to obtain the optimal feature selection and the best model. Stacking ML models that combine the best base ML models with a meta-learner are proposed to improve performance. Bayesian optimization is used to optimize the ML models. Combining SMOTE (Synthetic Minority Oversampling Technique) and ENN (Edited Nearest Neighbour) addresses the class imbalance. The experiments were carried out on a benchmark PCOS dataset with two train/test split ratios, 70:30 and 80:20. The results showed that the Stacking ML with REF feature selection recorded the highest accuracy, at 100%, compared to other models.

  • Research Article
  • Citations: 6
  • 10.1016/j.engappai.2023.107400
Computational intelligence for early detection of infertility in women
  • Nov 9, 2023
  • Engineering Applications of Artificial Intelligence
  • Subha R + 3 more


  • Research Article
  • Citations: 6
  • 10.1007/s11517-023-02892-1
Enhancing the prediction of IDC breast cancer staging from gene expression profiles using hybrid feature selection methods and deep learning architecture.
  • Aug 2, 2023
  • Medical & Biological Engineering & Computing
  • Akash Kishore + 4 more

Prediction of the stage of cancer plays an important role in planning the course of treatment and has been largely reliant on imaging tools which do not capture molecular events that cause cancer progression. Gene-expression data-based analyses are able to identify these events, allowing RNA-sequence and microarray cancer data to be used for cancer analyses. Breast cancer is the most common cancer worldwide, and is classified into four stages - stages 1, 2, 3, and 4 [2]. While machine learning models have previously been explored to perform stage classification with limited success, multi-class stage classification has not had significant progress. There is a need for improved multi-class classification models, such as by investigating deep learning models. Gene-expression-based cancer data is characterised by the small size of available datasets, class imbalance, and high dimensionality. Class balancing methods must be applied to the dataset. Since all the genes are not necessary for stage prediction, retaining only the necessary genes can improve classification accuracy. The breast cancer samples are to be classified into 4 classes of stages 1 to 4. Invasive ductal carcinoma breast cancer samples are obtained from The Cancer Genome Atlas (TCGA) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) datasets and combined. Two class balancing techniques are explored, synthetic minority oversampling technique (SMOTE) and SMOTE followed by random undersampling. A hybrid feature selection pipeline is proposed, with three pipelines explored involving combinations of filter and embedded feature selection methods: Pipeline 1 - minimum-redundancy maximum-relevancy (mRMR) and correlation feature selection (CFS), Pipeline 2 - mRMR, mutual information (MI) and CFS, and Pipeline 3 - mRMR and support vector machine-recursive feature elimination (SVM-RFE). 
The classification is done using deep learning models, namely deep neural network, convolutional neural network, recurrent neural network, a modified deep neural network, and an AutoKeras generated model. Classification performance post class-balancing and various feature selection techniques show marked improvement over classification prior to feature selection. The best multiclass classification was found to be by a deep neural network post SMOTE and random undersampling, and feature selection using mRMR and recursive feature elimination, with a Cohen-Kappa score of 0.303 and a classification accuracy of 53.1%. For binary classification into early and late-stage cancer, the best performance is obtained by a modified deep neural network (DNN) post SMOTE and random undersampling, and feature selection using mRMR and recursive feature elimination, with an accuracy of 81.0% and a Cohen-Kappa score (CKS) of 0.280. This pipeline also showed improved multiclass classification performance on neuroblastoma cancer data, with a best area under the receiver operating characteristic (auROC) curve score of 0.872, as compared to 0.71 obtained in previous work, an improvement of 22.81%. The results and analysis reveal that feature selection techniques play a vital role in gene-expression data-based classification, and the proposed hybrid feature selection pipeline improves classification performance. Multi-class classification is possible using deep learning models, though further improvement particularly in late-stage classification is necessary and should be explored further.
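The abstract above reports Cohen's kappa alongside accuracy. The sketch below shows the standard definition of the statistic, kappa = (p_o - p_e) / (1 - p_e), on a toy label set. It is only an illustration: the labels are invented, and the function name `cohen_kappa` and its handling of the degenerate case are assumptions.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels, corrected for chance agreement."""
    n = len(y_true)
    # Observed agreement p_o: the fraction of matching labels (plain accuracy).
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement p_e: from the marginal label frequencies of both lists.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    if expected == 1:
        # Degenerate case (a single label everywhere); returning 1.0 is a choice
        # made for this sketch, not part of the definition.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 0, 0]
    print(cohen_kappa(y_true, y_pred))  # accuracy is 0.75, kappa is 0.5
```

Because kappa subtracts the agreement expected by chance, a classifier on imbalanced classes can show high accuracy together with a modest kappa, which is one possible reading of the gap between the two metrics reported above.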

  • Research Article
  • Citations: 4
  • 10.2147/jir.s438838
Establishment and Analysis of an Artificial Neural Network Model for Early Detection of Polycystic Ovary Syndrome Using Machine Learning Techniques.
  • Nov 1, 2023
  • Journal of Inflammation Research
  • Yumi Wu + 4 more

To identify novel gene combinations and to develop an early diagnostic model for Polycystic Ovary Syndrome (PCOS) through the integration of artificial neural networks (ANN) and random forest (RF) methods. We retrieved and processed gene expression datasets for PCOS from the Gene Expression Omnibus (GEO) database. Differential expression analysis of genes (DEGs) within the training set was performed using the "limma" R package. Enrichment analyses of the DEGs were performed using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), together with an immune cell infiltration analysis. The identification of critical genes from the DEGs was then performed using random forests, followed by the development of new diagnostic models for PCOS using artificial neural networks. We identified 130 up-regulated genes and 132 down-regulated genes in PCOS compared to normal samples. Gene Ontology analysis revealed significant enrichment in myofibrils and highlighted crucial biological functions related to myofilament sliding, myofibril, and actin-binding. Compared with normal tissues, the types of immune cells expressed in PCOS samples are different. A random forest algorithm identified 10 significant genes proposed as potential PCOS-specific biomarkers. Using these genes, an artificial neural network diagnostic model accurately distinguished PCOS from normal samples. The diagnostic model underwent validation using the independent validation set, and the resulting area under the receiver operating characteristic curve (AUC) values were consistent with the anticipated outcomes. Utilizing unique gene combinations, this research created a diagnostic model by merging random forest techniques with artificial neural networks. The AUC indicated a notably superior performance of the diagnostic model.

  • Conference Article
  • Citations: 2
  • 10.1109/ismar-adjunct57072.2022.00065
Early Diagnosis of Poly Cystic Ovary Syndrome (PCOS) in young women: A Machine Learning Approach
  • Oct 1, 2022
  • Suriya Praba T + 2 more

Polycystic Ovary Syndrome (PCOS) is the most common hormonal disorder; it affects a large percentage of women of reproductive age and can lead to serious health issues. The main symptoms include infertility, irregular menstrual cycles, mood swings, increased levels of male hormones, stomach bloating, and thinning of hair. According to statistics, one in five Indian women has been diagnosed with PCOS. If it is not monitored in time, it may cause serious health impacts. The actual cause of PCOS is uncertain. Considering the complexity of diagnosing PCOS and the time and cost incurred for diagnosis, this research article proposes an automated model that can assist physicians. Nowadays, machine learning models play a vital role in medical diagnosis and in assisting physicians. In this paper we propose an automated model to classify PCOS and non-PCOS women with the help of machine learning algorithms. Follicular fluid samples were taken from 100 women, and the resulting data set was preprocessed with the help of Raman spectra and efficient feature selection methods. Furthermore, the performance of advanced ML classifiers such as Random Forest, AdaBoost, Multilayer Perceptron, and decision tree is analyzed. The implementation results reveal that Raman spectroscopy combined with advanced ML algorithms can predict PCOS with 100% accuracy from follicular fluid samples.

  • Research Article
  • 10.52783/jisem.v10i24s.3910
Enhancing COVID-19 Prediction Using Machine Learning: A Comparative Analysis of Feature Selection and Classification Techniques
  • Mar 24, 2025
  • Journal of Information Systems Engineering and Management
  • L William Mary

Introduction: The early and accurate detection of COVID-19 remains a critical challenge in medical analysis. Machine learning is used for predicting disease outcomes based on clinical parameters. This work presents a comparative analysis of feature selection methods and classification techniques to enhance COVID-19 detection accuracy using blood biomarkers. We used an open-source dataset of 1,724 cases with 35 features. To improve model performance, data preprocessing included outlier handling, normalization, and transformation techniques. To identify the relevant features, we employed three feature selection methods: the Chi-Square test, the Pearson correlation coefficient, and Random Forest. The model prediction accuracy was enhanced using a stacking ensemble classification technique. The machine learning based classification models effectively predicted COVID-19 infection using blood biomarkers with optimized feature selection techniques. Objectives: To enhance the accuracy of COVID-19 prediction using machine learning techniques by applying feature selection and classification techniques to blood biomarkers. Methods: The comparative analysis utilized a publicly available dataset containing 1,724 cases with 35 attributes. Data preprocessing involved outlier handling, normalization, and transformation techniques. The Chi-Square test, Pearson correlation coefficient, and Random Forest feature selection techniques were employed. A stacking ensemble classification algorithm was utilized to improve model performance. Results: The classification models demonstrated efficiency in predicting COVID-19 using blood biomarkers. Optimized feature selection significantly improved predictive accuracy, highlighting the importance of selecting relevant features for model performance enhancement.
Conclusions: This study showcases the potential of ML-driven approaches for COVID-19 detection, emphasizing the role of feature selection in improving classification accuracy. The findings contribute to the advancement of diagnostic tools, offering a data-driven solution for rapid and reliable COVID-19 screening.

  • Research Article
  • Citations: 5
  • 10.1093/humrep/deae124
Prospective risk of Type 2 diabetes in 99892 Nordic women with polycystic ovary syndrome and 446055 controls: national cohort study from Denmark, Finland, and Sweden.
  • Jun 11, 2024
  • Human reproduction (Oxford, England)
  • Dorte Glintborg + 10 more

What is the prospective risk of Type 2 diabetes (T2D) in Nordic women with polycystic ovary syndrome (PCOS) compared to controls? A diagnosis of PCOS and BMI ≥30 kg/m2 is a high-risk phenotype for a prospective risk of T2D diagnosis across Nordic countries. The risk of T2D in women with PCOS is increased. The risk of T2D is related to BMI and the magnitude of risk in normal weight women with PCOS has been discussed. However, prospective data regarding risk of T2D in population-based cohorts of women with PCOS are limited. This national register-based study included women with PCOS and age-matched controls. The main study outcome was T2D diagnosis occurring after PCOS diagnosis. T2D was defined according to ICD-10 diagnosis codes and/or filled medicine prescriptions of anti-diabetic medication excluding metformin. The study cohort included women originating from Denmark (PCOS Denmark, N = 27016; controls, N = 133994), Finland (PCOS Finland, N = 20467; controls, N = 58051), and Sweden (PCOS Sweden, N = 52409; controls, N = 254010). The median age at cohort entry was 28 years in PCOS Denmark, Finland, and Sweden with a median follow-up time (interquartile range) in women with PCOS of 8.5 (4.0-14.8), 9.8 (5.1-15.1), and 6.0 (2.0-10.0) years, respectively. Cox regression analyses were adjusted for BMI and length of education. The crude hazard ratio (HR, 95% CI) for T2D diagnosis in women with PCOS was 4.28 (3.98-4.60) in Denmark, 3.40 (3.11-3.74) in Finland, and 5.68 (5.20-6.21) in Sweden. In adjusted regression analyses, BMI ≥30 vs <25 kg/m2 was associated with a 7.6- to 11.3-fold risk of T2D. In a combined meta-analysis (PCOS, N = 99892; controls, N = 446055), the crude HR for T2D in PCOS was 4.64 (3.40-5.87) and, after adjustment for BMI and education level, the HR was 2.92 (2.32-3.51). Inclusion of more severe cases of PCOS in the present study design could have led to an overestimation of risk estimates in our exposed population.
However, some women in the control group would have undiagnosed PCOS, which would lead to an underestimation of T2D risk in women with PCOS. BMI data were not available for all participants. The present study should be repeated in study cohorts with higher background risks of T2D, particularly in populations of other ethnicities. The prospective risk for diagnosis of T2D is increased in women with PCOS, and the risk is aggravated in women with BMI ≥30 kg/m2. Funding in Denmark was from the Region of Southern Denmark, Overlægerådet, Odense University Hospital. Funding in Finland was from Novo Nordisk Foundation, Finnish Research Council and Sigrid Juselius Foundation, the National Regional Fund, Sakari Alhopuro Foundation and Finnish Diabetes Research Foundation. E.E. has received a research grant from Ferring Pharmaceuticals (payment to institution) and serves as medical advisor for Tilly AB, not related to this manuscript. The remaining authors declare no conflict of interest. N/A.

  • Research Article
  • 10.52436/1.jutif.2025.6.4.5166
Optimizing Type 2 Diabetes Classification with Feature Selection and Class Balancing in Machine Learning
  • Aug 24, 2025
  • Jurnal Teknik Informatika (Jutif)
  • Agus Wantoro + 4 more

Type 2 Diabetes (T2DM) is a crucial factor in patient survival and treatment effectiveness. Errors in diabetes detection lead to disease severity, high costs, prolonged healing time, and a decline in service quality. Additionally, a major challenge in developing Machine Learning (ML)-based detection decision support systems is the class imbalance in medical data as well as the high feature dimensionality that can affect the accuracy and efficiency of the model. This research proposes an approach based on feature selection (FS) and handling class imbalance to improve performance in type 2 diabetes. Several feature selection techniques such as Information Gain (IG), Gain Ratio (GR), Gini Decrease (GD), Chi-Square (CS), Relief-F, and FCBF can perform feature selection based on weighting ranking. Furthermore, to address the imbalanced class distribution, we utilize the Synthetic Minority Over-Sampling Technique (SMOTE). ML classification models such as Support Vector Machine (SVM), Gradient Boosting (GB), Tree, Neural Network (NN), Random Forest (RF), and AdaBoost were tested and evaluated based on the confusion matrix including accuracy, precision, recall, and time. The experimental results show that the combination of strategies for handling imbalanced classes significantly improves the predictive performance of ML algorithms. In addition, we found that the combination of feature selection techniques IG+AdaBoost consistently demonstrates optimal performance. This study emphasizes the importance of data preprocessing and the selection of the right algorithms in the development of machine learning-based T2DM detection systems. Accurate detection can reduce the severity of disease, lower treatment costs, speed up the healing process, and improve healthcare services.
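The core idea of SMOTE, which the abstract above uses for class balancing, is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows only that interpolation step on toy 2-D points. It is not the study's implementation or the API of any particular library, and the function names, the `k=2` default, and the seeds are assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority point and a near neighbour."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # The k nearest minority neighbours of the chosen base point.
    others = [p for p in minority if p is not base]
    neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the segment from base to neighbour
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))


def oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority points (interpolation only)."""
    rng = random.Random(seed)
    return [smote_sample(minority, k, rng) for _ in range(n_new)]


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0)]
    for point in oversample(minority, n_new=3):
        print(point)
```

Each synthetic point lies on a line segment between two existing minority points, so the new samples stay inside the region already occupied by the minority class rather than duplicating existing rows.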

  • Research Article
  • 10.7860/jcdr/2024/75199.20403
Development of a Preliminary Screening Tool for Predicting Polycystic Ovarian Syndrome using Machine Learning and Deep Learning Models with Non Invasive Qualitative Features: A Case-control Study
  • Dec 1, 2024
  • JOURNAL OF CLINICAL AND DIAGNOSTIC RESEARCH
  • Hanumanth Narni + 2 more

Introduction: Polycystic Ovarian Syndrome (PCOS) is a prevalent endocrine disorder affecting women of reproductive age, characterised by irregular menstrual cycles, hyperandrogenism and polycystic ovaries. Despite its high prevalence, the diagnosis of PCOS remains challenging due to the variability in symptom presentation. Traditional diagnostic methods involve clinical evaluation, biochemical assays and ultrasound imaging. Machine Learning (ML) and Deep Learning (DL) models offer promising avenues for predicting probable cases of PCOS using non invasive qualitative features. Aim: To develop and compare the performance of Random Forest (RF) and Feedforward Neural Network (FFNN) models in predicting PCOS using abundant non invasive qualitative features. Materials and Methods: A retrospective case-control study was conducted with 100 cases and 100 controls, selected based on ultrasound-confirmed PCOS diagnosis in the Obstetrics and Gynaecology, Gayatri Vidya Parishad Institute of Healthcare and Medical Technology (GVP IHC MT), Medical College departments from February 2024 to October 2024. Data were collected using a structured questionnaire capturing demographic and clinical variables. Feature selection was performed using the Chi-square filter method, with 10 features identified as significant. The data were split into training (80%) and testing (20%) sets and stratified 5-fold cross-validation was applied. Model performance was evaluated using accuracy, precision, recall, F1 score and Area Under Curve (AUC). Results: The RF model demonstrated high performance on the training set, with an average accuracy of 0.95, but exhibited variability on the testing set (accuracy of 0.80). The FFNN model showed consistent performance across both training (accuracy of 0.80) and testing datasets (accuracy of 0.82). 
The RF model identified irregular cycles and hirsutism as key predictors, while the FFNN model highlighted weight gain and abnormal Body Mass Index (BMI) as important features. The RF model required significantly less computational time compared to the FFNN model. Conclusion: The RF model is preferable for tasks requiring computational efficiency, while the FFNN model offers better generalisation. The complementary feature importance rankings suggest that integrating insights from both models could enhance the understanding of PCOS predictors. In epidemiological investigations, these models can be used as preliminary screening tools for identifying probable cases of PCOS using non invasive qualitative features, especially in areas where diagnostic facilities are not available.

  • Research Article
  • Citations: 7
  • 10.31083/j.ceog4903057
Health-related quality of life and binge eating among adolescent girls with PCOS
  • Mar 3, 2022
  • Clinical and Experimental Obstetrics & Gynecology
  • Lasma Lidaka + 5 more

Background: Polycystic ovary syndrome (PCOS) affects 3–8% of adolescents. It is characterized by hyperandrogenism and oligoovulation/anovulation. PCOS has a negative impact on health-related quality of life (HRQoL). However, the extent to which individual factors influence the total HRQoL of adolescents is not known. Adult PCOS patients have a higher incidence of binge eating than the general reproductive-age female population. Limited data on binge eating in adolescents with PCOS are available. The aim of this study was to investigate how PCOS and its associated factors, including binge eating, affect the HRQoL of adolescent girls. Methods: This case-control study recruited 63 adolescent girls 13–18 years of age with PCOS and 66 age-matched healthy controls. The PCOS health-related quality of life questionnaire (PCOSQ) and Binge Eating Scale (BES) were used. Multiple linear regression was performed to establish exact predictors and their effect on total HRQoL. Results: HRQoL was significantly lower in adolescents with PCOS than controls (4.9 (interquartile range (IQR) 1.5) vs. 5.8 (IQR 0.9) points). The lowest scores were found in the body hair and weight domains. BES results were not significantly higher in the PCOS group than in the control group (p = 0.727). The main predictors for total HRQoL were PCOS diagnosis per se (β = –1.002; p < 0.001), BES score (β = –0.27; p = 0.004) and body mass index (BMI) percentile (β = –0.007; p = 0.013). Conclusions: The lower HRQoL in adolescents with PCOS is attributable to the diagnosis of PCOS, BES score and BMI percentile, confirming the importance of tailoring clinical interventions and counselling to address the domains (i.e., symptoms of hirsutism and weight concerns) causing distress and lowering HRQoL. Further implementation research is required to evaluate the impact of targeted interventions on the HRQoL of adolescent girls with PCOS.

  • Research Article
  • 10.1007/s43032-025-01953-0
The Role of Endoplasmic Reticulum Stress in Polycystic Ovary Syndrome and Exploration of Potential Therapeutic Targets.
  • Sep 1, 2025
  • Reproductive sciences (Thousand Oaks, Calif.)
  • Yuanyuan Zhang + 3 more

Polycystic ovary syndrome (PCOS) is a common endocrine disorder in women. In recent years, endoplasmic reticulum (ER) stress has gained increasing attention in the pathogenesis of PCOS. This study aims to explore the potential role of ER stress in PCOS by constructing a predictive model based on ER stress-related genes, and further evaluate the characteristics of immune infiltration and screen potential drugs. Five algorithms, including Lasso, Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Algorithm (XGB), and Generalized Linear Model (GLM), were used to screen key genes associated with PCOS and endoplasmic reticulum (ER) stress. A predictive model was constructed to analyze its diagnostic value in PCOS. External validation of the model was conducted using different datasets to assess its predictive accuracy. Furthermore, immune infiltration analysis was performed to explore the relationship between ER stress-related genes and the immune microenvironment in PCOS, revealing their potential role in disease development through immune response regulation. Finally, molecular docking and drug screening platforms were utilized to identify potential drugs that can modulate the ER stress pathway, providing new drug targets for the clinical treatment of PCOS. Two downregulated genes, NQO1 and NPY, and three upregulated genes, TFEB, JUP, and ATF4, were identified in PCOS cases. The constructed nomogram model demonstrated that the area under the ROC curve for NQO1, TFEB, JUP, NPY, and ATF4 in the validation set were 0.629, 0.600, 0.629, 0.543, and 0.743, respectively, indicating that the PCOS diagnostic model built from these five hub genes has good reliability. Immune infiltration analysis revealed that the expression of the JUP gene was positively correlated with T lymphocyte infiltration, while the expression of TFEB and NPY was negatively correlated with T lymphocyte infiltration, suggesting their potential involvement in immune regulation in PCOS. 
Through molecular docking and drug screening, 66 potential drugs were identified, 18 of which are already approved for use, providing options for pharmacological treatment of PCOS. The results of this study suggest that endoplasmic reticulum (ER) stress-related genes play an important role in the pathogenesis and development of PCOS, and that accurate predictive models may provide new insights for early diagnosis of the disease. Immune infiltration analysis revealed the potential mechanisms of immune cell involvement in PCOS, while drug screening provides a theoretical basis for future targeted therapies for PCOS.

  • Research Article
  • Citations: 2
  • 10.1007/s43032-024-01538-3
The Inflammatory State of Follicular Fluid Combined with Negative Emotion Indicators can Predict Pregnancy Outcomes in Patients with PCOS.
  • Apr 23, 2024
  • Reproductive sciences (Thousand Oaks, Calif.)
  • Xin Huang + 4 more

Polycystic ovary syndrome (PCOS) is a complex endocrine disorder with an incidence of 6% to 10% in women of reproductive age. Women with PCOS not only exhibit abnormal follicular development and fertility disorders, but also have a greater tendency to develop anxiety and depression. Our aim was to evaluate the ability of inflammatory factors in follicular fluid to predict embryonic developmental potential and pregnancy outcome and to construct a machine learning model that can predict IVF pregnancy outcomes based on indicators such as basic sex hormones, embryonic morphology, the follicular microenvironment, and negative emotion. In this study, inflammatory factors (CRP, IL-6, and TNF-α) in follicular fluid samples obtained from 225 PCOS and 225 non-PCOS women were detected via ELISA. For patients with PCOS, the levels of CRP and IL-6 in the follicular fluid in the pregnant group were significantly lower than those in the nonpregnant group. For patients without PCOS, only the level of IL-6 in the follicular fluid was significantly lower in the pregnant group than in the nonpregnant group. In addition, for patients both with and without PCOS, those in the nonpregnant group showed more pronounced signs of anxiety and depression than those in the pregnant group. Finally, the factors that differed significantly between the two subgroups (pregnancy and nonpregnancy) of patients with or without PCOS were first identified by an independent-samples t test and then further analysed by multilayer perceptron (MLP) and random forest (RF) models to classify the two clinical pregnancy outcomes. The accuracy of the RF model in predicting pregnancy outcomes in patients with or without PCOS was 95.6% and 91.1%, respectively. The RF model is more suitable than the MLP model for predicting pregnancy outcomes in IVF patients.
This study not only identified inflammatory factors that can affect embryonic development and assessed the anxiety and depression tendencies of PCOS patients, but also constructed an AI model that predicts pregnancy outcomes through machine learning methods, which is a beneficial clinical tool.
