Polycystic ovary syndrome (PCOS) can affect the female reproductive system and is associated with complications of the endocrine system. How reliably the condition is diagnosed varies with the stage of the disease, so there is a need for additional technologies that can improve diagnostic success and prompt timely care. A substantial body of work has investigated artificial intelligence models that use patient medical health records to predict whether a patient has PCOS. This literature has two shortcomings: the candidate models are trained on an unbalanced dataset, which can induce model bias, and the problem is treated as a binary prediction exercise. This study addresses that gap by designing a prediction model on the Kaggle PCOS dataset, which is first balanced using a synthetic sample generation algorithm. A probability-based inference system is then designed to estimate and stage the degree of the PCOS condition in the patient.
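The two steps named above, balancing by synthetic sample generation and staging from a predicted probability, can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not name the balancing algorithm, so a SMOTE-style interpolation is assumed here, and the function names, the toy feature values, and the stage cutoffs (0.25, 0.50, 0.75) are all hypothetical.

```python
import random


def oversample_minority(minority, n_new, k=3, seed=0):
    """SMOTE-style interpolation (an assumed technique, not confirmed by the study).

    Each synthetic row lies on the segment between a minority row and one of
    its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def stage_from_probability(p, cutoffs=(0.25, 0.50, 0.75)):
    """Map a predicted probability to an ordinal stage 0..len(cutoffs).

    The cutoffs are hypothetical placeholders, not values from the study.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return sum(p >= c for c in cutoffs)


# Toy data with made-up values (two features per row).
majority = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2], [0.1, 0.4]]
minority = [[0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]

# Generate enough synthetic minority rows to match the majority class size.
new_rows = oversample_minority(minority, len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
```

In practice a classifier that outputs class probabilities would be trained on the balanced data, and its predicted probability for a patient would be passed to `stage_from_probability` in place of a hard yes/no label.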