Polycystic ovary syndrome (PCOS) is a clinically heterogeneous metabolic disorder. PCOS women with non-hyperandrogenemia (NA) may be misdiagnosed because diagnostic markers are lacking. This study aims to systematically analyze the differences in steroid hormones between PCOS women with hyperandrogenemia (HA) and those with NA, and to screen classification models for the diagnosis of PCOS. Serum samples from 54 HA-PCOS, 79 NA-PCOS and 60 control (Non-PCOS) women aged 18 to 35 years were measured with an integrated, targeted steroid hormone quantification assay using LC-MS/MS. Serum levels of androgens, corticosteroids, progestins and estrogens in the steroid hormone biosynthesis pathway were compared between PCOS and Non-PCOS women. Eight machine learning methods, namely Linear Discriminant Analysis (LDA), K-nearest Neighbors (KNN), Boosted Logistic Regression (LogitBoost), Naive Bayes (NB), the C5.0 algorithm (C5), Random Forest (RF), Support Vector Machines (SVM) and Neural Network (NNET), were trained, evaluated and compared for the classification diagnosis of PCOS, using 10-fold cross-validation on the training set. The overall metabolic flux from cholesterol to downstream steroid hormones was significantly increased in PCOS, especially in HA-PCOS women. The RF model was chosen for classifying HA-PCOS, NA-PCOS and Non-PCOS women because it had the highest average accuracy (0.938, p<0.001), AUC (0.989, p<0.001) and kappa (0.906, p<0.001), and the lowest logLoss (0.200, p<0.001). Five steroid hormones, testosterone, androstenedione, total 2-methoxyestradiol, total 4-methoxyestradiol and free estrone, were selected as the input features of a simplified RF model. The validation set comprised 37 women, in whom the diagnostic sensitivity for HA-PCOS, NA-PCOS and Non-PCOS was 100 %, 93.3 % and 91.7 %, respectively. HA-PCOS, NA-PCOS and Non-PCOS women showed clearly different steroid hormone profiles.
The simplified RF model based on two androgens and three estrogens could be effectively applied to the classification diagnosis of PCOS, further reducing the missed diagnosis rate of NA-PCOS.
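The abstract reports model performance as accuracy, kappa, logLoss and per-class sensitivity. The Python below is a minimal sketch of how these metrics are commonly defined for a three-class problem, computed from a confusion matrix and from predicted class probabilities. The labels and probabilities are toy values invented for illustration, not data from the study, and every function name is hypothetical. The study averaged its metrics over 10-fold cross-validation, whereas this sketch scores a single set of predictions. The probability-clipping constant `eps` is an assumption. Multiclass AUC is omitted because it has several competing definitions.

```python
import math

# Class labels as used in the abstract; all data below are toy values (assumption).
CLASSES = ["HA-PCOS", "NA-PCOS", "Non-PCOS"]


def confusion_matrix(y_true, y_pred, classes):
    """Rows = true class, columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def accuracy(cm):
    """Fraction of all samples on the diagonal (correctly classified)."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond chance, (p_o - p_e) / (1 - p_e)."""
    total = sum(map(sum, cm))
    p_o = accuracy(cm)
    row_sums = [sum(r) for r in cm]
    col_sums = [sum(cm[i][j] for i in range(len(cm))) for j in range(len(cm))]
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


def sensitivity(cm, k):
    """Per-class recall: true members of class k predicted as k / all true members of k."""
    return cm[k][k] / sum(cm[k])


def multiclass_log_loss(y_true, probs, classes, eps=1e-15):
    """Mean negative log of the probability assigned to each sample's true class."""
    index = {c: i for i, c in enumerate(classes)}
    total = 0.0
    for t, p in zip(y_true, probs):
        q = min(max(p[index[t]], eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(q)
    return total / len(y_true)


# Toy predictions (hypothetical), one probability vector per sample in CLASSES order.
y_true = ["HA-PCOS", "HA-PCOS", "NA-PCOS", "NA-PCOS", "Non-PCOS", "Non-PCOS"]
y_pred = ["HA-PCOS", "HA-PCOS", "NA-PCOS", "Non-PCOS", "Non-PCOS", "Non-PCOS"]
probs = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.70, 0.20],
    [0.10, 0.40, 0.50],
    [0.05, 0.15, 0.80],
    [0.10, 0.20, 0.70],
]

cm = confusion_matrix(y_true, y_pred, CLASSES)
print(round(accuracy(cm), 3))     # 0.833
print(round(cohen_kappa(cm), 3))  # 0.75
for k, name in enumerate(CLASSES):
    print(name, sensitivity(cm, k))
print(round(multiclass_log_loss(y_true, probs, CLASSES), 3))
```

Kappa is reported alongside accuracy because it discounts the agreement expected by chance from the class proportions, and log loss rewards well-calibrated probabilities rather than only correct labels, which is why a lower value is better.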