Global Gender Inequality Through Explainable AI: Machine Learning, Clustering, and SHAP Insights

Abstract

Objective: This paper analyzes gender equality across countries in 2024 using the Global Gender Gap Index (GGGI), with the aim of uncovering hidden structural and nonlinear patterns. Rather than re-deriving the index, it explicitly recognizes the compositional nature of the GGGI and the latent similarities among its components. Methods: The research is a global cross-sectional study of 146 countries over the four primary GGGI dimensions: economic participation, education, health and survival, and political empowerment. OLS regression is employed only as a diagnostic, since its near-perfect fit (R² ≈ 1) is purely mechanical and uninformative for inference. Ensemble models are used for prediction, complemented by K-means clustering, SHAP analysis, and GridSearchCV hyperparameter optimization. Findings: Out-of-sample predictions are highly accurate, with Gradient Boosting models reaching R² ≈ 0.90 and RMSE ≈ 0.045, indicating substantial nonlinear information beyond simple index aggregation. Unsupervised clustering reveals seven distinct country clusters that cut across traditional geographic and income divisions and can be recovered with more than 93% accuracy. The SHAP results show that empowerment and economic participation are the main drivers, while health shows negligible cross-country variation. Contribution: This study delineates the limits of regression analysis in index research and demonstrates the value of machine learning for detecting structural patterns related to gender equity.
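The clustering step described in the abstract can be sketched in a few lines. This is an illustrative, pure-Python version of Lloyd's k-means on hypothetical four-dimensional subindex scores, not the paper's actual pipeline (which presumably uses scikit-learn); the toy rows and k = 2 are invented for demonstration.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal Lloyd's k-means over lists of equal-length feature vectors."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Recompute centroids as cluster means; keep the old centroid if a cluster empties.
        new = []
        for i, cl in enumerate(clusters):
            if cl:
                new.append([sum(x) / len(cl) for x in zip(*cl)])
            else:
                new.append(centroids[i])
        if new == centroids:
            break
        centroids = new
    labels = []
    for p in points:
        d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(d.index(min(d)))
    return centroids, labels

# Toy rows: (economic participation, education, health, empowerment) scores.
rows = [
    (0.60, 0.95, 0.97, 0.10), (0.62, 0.96, 0.97, 0.12),
    (0.80, 0.99, 0.97, 0.55), (0.82, 1.00, 0.96, 0.60),
]
centroids, labels = kmeans(rows, k=2)
```

On these four invented rows the algorithm separates the low-empowerment pair from the high-empowerment pair, mirroring the idea that clusters form along empowerment and economic participation rather than geography.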

Similar Papers
  • Research Article
  • Cited by: 2
  • 10.17747/2078-8886-2018-1-64-71
Industry characteristic of bankruptcy prediction models appliance
  • May 25, 2018
  • Strategic decisions and risk management
  • E A Fedorova + 2 more

The aim of the research is to develop a methodology for bankruptcy prediction that applies industry-specific statutory values to existing models, and to develop the authors' own prediction model. The authors first estimated the forecast accuracy of the existing models for enterprises in eight industries. Using CART (Classification And Regression Trees), the original statutory values of the models were re-specified for every industry under research. The recalculated statutory values demonstrated a high level of prediction accuracy and balanced the accuracy indicators for bankrupt and non-bankrupt companies. The indicators with the greatest significance for bankruptcy prediction were selected from all the models and formed the basis of a newly developed model, which demonstrated a high level of prediction accuracy on the research sample; statutory values for the new model were also developed. Implementing the research results should increase the efficiency of bankruptcy prediction and lower the number of bankrupt companies.
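The industry-specific calibration described above rests on CART-style threshold search. A minimal sketch of the core operation, finding the single cut point that minimises weighted Gini impurity for one indicator, might look like this; the solvency-ratio data are invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """CART-style search for the split threshold minimising weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best

# Toy data: a solvency ratio for bankrupt (1) vs. healthy (0) firms in one industry.
ratios = [0.1, 0.2, 0.25, 0.3, 0.8, 0.9, 1.1, 1.2]
failed = [1, 1, 1, 1, 0, 0, 0, 0]
threshold, impurity = best_threshold(ratios, failed)
```

Running the same search on each industry's sample yields the industry-specific statutory values the paper describes.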

  • Conference Article
  • Cited by: 6
  • 10.1109/eucap.2006.4584877
Novel ray-tracing acceleration technique employing genetic algorithm for radio propagation prediction
  • Nov 1, 2006
  • Tetsuro Imai

The ray-tracing method is very attractive because several radio propagation characteristics can be predicted simply from the rays geometrically traced from the transmitter to the receiver. However, when many structures are taken into consideration, many rays must be traced to achieve a high level of prediction accuracy, and this is very time consuming. Accelerating the ray-tracing process while maintaining a high level of prediction accuracy is therefore an important problem. This paper proposes a ray-tracing acceleration technique employing a genetic algorithm, called the GA_RT method. Its performance is evaluated through computer simulation, and the results show that the number of calculations can be reduced to approximately 20% when the allowable calculation error is 4 dB. Moreover, when there are many calculation points in a wide area, application of the Chain model in conjunction with the GA_RT method is proposed. Using this combination, the computation time is reduced to approximately 6% with a calculation error of approximately 1 dB.
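The kind of genetic algorithm GA_RT builds on can be sketched generically. Here the fitness function is a toy stand-in (it simply counts bits set, a placeholder for "useful rays kept"), not an actual ray-tracing error, and all parameters are illustrative assumptions rather than the paper's settings.

```python
import random

def evolve(fitness, n_bits=16, pop_size=30, gens=60, seed=1):
    """Tiny genetic algorithm: tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(gens):
        def pick():
            # Tournament of two: keep the fitter candidate.
            a, b = rng.choice(pop), rng.choice(pop)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]                          # one-point crossover
            child = [b ^ (rng.random() < 0.02) for b in child]   # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Toy stand-in for ray selection: fitness counts the "useful" rays kept (ones).
best = evolve(fitness=sum)
```

In the GA_RT setting, each bit would flag a candidate ray as traced or skipped and the fitness would penalise prediction error, so evolution discards rays that contribute little accuracy.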

  • Research Article
  • Cited by: 2
  • 10.1504/ijipm.2019.10022000
Text mining as a facilitating tool for deploying blockchain technology in the intellectual property rights system
  • Jan 1, 2019
  • International Journal of Intellectual Property Management
  • Tatyana Maximova + 2 more

The aim of the study is to introduce a new application of machine-learning techniques (text mining, clustering and classification) and the blockchain technology within the intellectual property rights (IPRs) management system. Using such machine-learning techniques facilitates the management process of intellectual properties (IPs) and makes it more efficient. Additionally, using the blockchain technology for IPRs management purposes enables all stakeholders to utilise the extracted data of the IP objects from the blockchain network. In this study, a text-mining technique was used to identify the two types of IP documents based on specific categories, namely, patent and trademark. In order to achieve this objective, a range of machine-learning techniques was used for 5,500 patent documents and 400 trademark documents. The results of the logistic regression model showed a high level of prediction accuracy of document type at the pre-registration stage on the blockchain network. This high level of prediction accuracy demonstrates that using machine-learning and text-mining techniques will facilitate the IPRs management system. This new application of specific machine-learning techniques in the IPRs management process contributes essentially to solving the problem in a conventional IPRs system associated with rights protection and data availability.
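The document-type classifier reported here is a logistic regression over text-mining features. A self-contained sketch on invented term-count features could look as follows; the feature names and data are hypothetical, since the paper's actual feature set is not specified.

```python
import math

def train_logreg(X, y, lr=0.5, epochs=200):
    """Logistic regression fitted by per-sample gradient descent (no regularisation)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid probability of class 1
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 if z > 0 else 0

# Toy term counts per document: [claims, invention, brand, logo]; 1 = patent, 0 = trademark.
X = [[3, 2, 0, 0], [4, 1, 0, 1], [0, 0, 3, 2], [1, 0, 4, 3]]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
```

With linearly separable counts like these, the fitted model classifies every training document correctly, mirroring the high pre-registration accuracy the study reports.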

  • Research Article
  • Cited by: 18
  • 10.1207/s15327744joce1201_04
Using Inductive Learning to Predict Bankruptcy
  • Mar 1, 2002
  • Journal of Organizational Computing and Electronic Commerce
  • James A Gentry + 3 more

An emerging trend in organizational computing is using information technology to learn decision knowledge from enterprise data. The primary contribution of this study is the presentation of a sound theory and a comprehensive technique for learning the decision model for predicting bankruptcy. The theory is based on the information contained in cash flow components, which is the foundation of valuation theory, and an analytical system that measures the amount of uncertainty in the cash flow information. The approach links a tree-based inductive learning system that relies on the concept of entropy, with an information system based on the cash flow of a firm. A test of the cash flow approach involves the cash flow components for a sample of 99 failed and 99 non-failed companies. The structural instability of cash flow components generated by an inductive learning system is a serious issue for financial analysts. However, this shortcoming is overcome by using a jackknife procedure to develop a global tree that identifies the most important cash flow components. The final global tree found only 3 cash flow components were needed to classify correctly 89% of the companies as either failed or non-failed. Only a few early studies achieved a higher level of predictive accuracy. The 3 significant cash flow components were dividends, net investment, and net operating cash flow. Using the same data, a probit statistical technique generated a 67.5% predictive accuracy. In summary, the inductive learning results indicate that cash flow components are not only a natural tool for explaining the bankruptcy process, but they provide a high level of predictive accuracy.
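The jackknife procedure used above to stabilise the induced tree is a general leave-one-out resampling scheme. A minimal sketch for estimating the bias and standard error of any statistic follows; the cash-flow figures are invented for illustration.

```python
def jackknife(data, stat):
    """Leave-one-out jackknife: returns (estimate, bias estimate, standard error)."""
    n = len(data)
    theta = stat(data)
    # Recompute the statistic with each observation left out in turn.
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    bias = (n - 1) * (mean_loo - theta)
    var = (n - 1) / n * sum((t - mean_loo) ** 2 for t in loo)
    return theta, bias, var ** 0.5

# Toy statistic: mean of one cash flow component across sampled firms.
flows = [2.0, 3.0, 5.0, 4.0, 6.0]
est, bias, se = jackknife(flows, lambda d: sum(d) / len(d))
```

In the study's setting, `stat` would instead grow a tree on the reduced sample, and the leave-one-out trees are aggregated into the global tree that isolates the three decisive cash flow components.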

  • Research Article
  • Cited by: 8
  • 10.3109/14767058.2014.947573
Can we improve the targeting of respiratory syncytial virus (RSV) prophylaxis in infants born 32–35 weeks’ gestational age with more informed use of risk factors?
  • Aug 14, 2014
  • The Journal of Maternal-Fetal & Neonatal Medicine
  • Xavier Carbonell-Estrany + 5 more

Objective: To evaluate the key risk factors for respiratory syncytial virus (RSV) hospitalisation in 32–35 weeks' gestational age (wGA) infants. Methods: Published risk factors were assessed for predictive accuracy (area under the receiver operating characteristic curve [ROC AUC]) and for number needed to treat (NNT). Results: Key risk factors included: proximity of birth to the RSV season; having siblings; crowding at home; day care; smoking; breast feeding; small for GA; male gender; and familial wheezing/eczema. Proximity of birth to the RSV season appeared the most predictive. Risk factor models from Europe and Canada were found to have a high level of predictive accuracy (ROC AUC both >0.75; NNT for the European model 9.5). A model optimised for three risk factors (birth ±10 weeks from the start of the RSV season, number of siblings aged ≥2 years and breast feeding for ≤2 months) had a similar level of prediction (ROC AUC: 0.776; NNT: 10.2). An example two-risk-factor model (day care attendance and living with ≥2 siblings <5 years old) had a lower level of predictive accuracy (ROC AUC: 0.55; NNT: 26). Conclusions: An optimised combination of risk factors has the potential to improve the identification of 32–35 wGA infants at heightened risk of RSV hospitalisation.
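The ROC AUC used to score these risk-factor models can be computed directly as the Mann-Whitney probability that a randomly chosen case outranks a randomly chosen control. A sketch with invented risk scores (not the study's data):

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney U statistic: P(score_pos > score_neg), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy risk scores for hospitalised (1) vs. non-hospitalised (0) infants.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
```

An AUC of 0.5 means the score ranks cases no better than chance (like the two-factor model's 0.55), while values above 0.75 reflect the useful discrimination the European and Canadian models achieved.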

  • Research Article
  • Cited by: 1
  • 10.4102/sajbm.v27i3.809
Market timing and unit trusts: Can you beat the market?
  • Sep 30, 1996
  • South African Journal of Business Management
  • Colin Firer + 3 more

The article reports the results of an investigation into the level of predictive accuracy required to benefit from a market-timing strategy using unit trusts as the investment medium. Three unit trusts within the same management company were used as the assets between which a market timer could switch his or her investment. Switching would depend on the timer's forecast of which of the three investments would produce the best returns in the forthcoming period. Remaining within the family of trusts managed by a single company kept the transaction costs to a minimum. Investments could be made in a general equity, a resources or an income unit trust. As attractive as the potential returns from market timing within a family of unit trusts might appear to be, the levels of predictive accuracy required to beat a buy-and-hold strategy with certainty were found to be extremely high (of the order of 80%). In addition, much of the benefit from timing depends on being in the highest yielding asset for a small, but specific number of periods. Therefore not only does one require a high level of predictive accuracy, but it is important to be correct in the key periods when most of the return above the buy-and-hold is earned.
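The dependence of timing profits on predictive accuracy can be illustrated with a small simulation. The monthly returns below are invented, and the sketch ignores transaction costs and does not reproduce the paper's 80% threshold; it only shows the mechanics of comparing a timing strategy of a given accuracy against buy-and-hold.

```python
import random

def timing_return(returns, accuracy, seed=42, trials=2000):
    """Average terminal wealth of a timer who picks the best asset with probability `accuracy`."""
    rng = random.Random(seed)
    n_assets = len(returns[0])
    total = 0.0
    for _ in range(trials):
        wealth = 1.0
        for period in returns:
            best = max(range(n_assets), key=lambda i: period[i])
            if rng.random() < accuracy:
                choice = best                      # correct forecast
            else:
                choice = rng.choice([i for i in range(n_assets) if i != best])
            wealth *= 1.0 + period[choice]
        total += wealth
    return total / trials

# Toy monthly returns for (general equity, resources, income) unit trusts.
rets = [(0.04, -0.02, 0.01), (-0.03, 0.05, 0.01), (0.02, 0.01, 0.01),
        (0.06, -0.01, 0.01), (-0.02, 0.03, 0.01), (0.01, 0.04, 0.01)]
buy_hold = 1.0
for r in rets:
    buy_hold *= 1.0 + r[0]  # hold the equity trust throughout
low_skill = timing_return(rets, accuracy=0.5)
high_skill = timing_return(rets, accuracy=0.9)
```

Sweeping `accuracy` and recording where the timing strategy first beats buy-and-hold with high probability is the kind of break-even analysis the article performs.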

  • Research Article
  • 10.1002/alz.064192
Blood plasma biomarkers improve prediction accuracy over and above genetic predictors of Alzheimer’s disease
  • Dec 1, 2022
  • Alzheimer's & Dementia
  • Joshua O Stevenson‐Hoare + 13 more

Background: Prediction models of Alzheimer's disease using genetic information, such as polygenic risk scores, have been able to reach high levels of prediction accuracy. However, since confirmation of the disease is only possible post-mortem, these levels of prediction accuracy are typically only achievable in pathologically confirmed cohorts. In living individuals who have been clinically assessed for AD, prediction accuracy by genetics is still good but much lower. Biomarkers can indirectly assess AD pathologies, and so may be able to bridge the prediction accuracy gap. Blood plasma biomarkers may have additional clinical utility as they are cheaper and more accessible compared to traditional CSF or PET methods. Method: We measured five blood plasma biomarkers known to be linked to AD pathologies (Aβ40, Aβ42, GFAP, NfL, P-tau181) in a cohort of AD cases (N=1439, mean age 68) and elderly screened controls (N=508, mean age 82). We also gathered information on APOE genotype, age at sample collection, sex, and age at onset and disease duration in cases. Result: Linear regression models showed that all biomarker measurements were associated with age at onset in cases, and most were associated with disease duration. Biomarkers were also associated with age at sample collection in both cases and controls, demonstrating their effectiveness for tracking neurological change over time. Using logistic regression, we found prediction accuracies for AD status of AUC = 0.56-0.66 for each biomarker individually and AUC = 0.73 for APOE and PRS. A model combining all biomarkers had an AUC = 0.75. The best prediction accuracy was achieved by combining all biomarkers with genetics and age at sample collection, which reached an AUC = 0.81 and an explained variance of R² = 0.29. Conclusion: We found that blood plasma biomarkers predicted AD status and were associated with disease duration.
Furthermore, biomarkers explain some variance not captured by genetic factors and therefore improve accuracy when combined in predictive models. Biomarkers also have the advantage of specificity over clinical assessments, which may confuse dementia subtypes due to phenotype similarity. Therefore, blood plasma biomarkers can be a useful tool for the assessment and prediction of AD on their own or in combination with genetic predictors.

  • Research Article
  • 10.1093/eurheartj/ehae666.1807
Cardio-pulmonary exercise testing in adult congenital heart patients: a new data-focused approach to predict mortality
  • Oct 28, 2024
  • European Heart Journal
  • A Barradas Pires + 3 more

Background/Introduction: The cardio-pulmonary exercise test (CPET) has been a pivotal tool for functional and prognostic evaluation in adults with congenital heart disease (ACHD). During a routine CPET, a myriad of variables are collected reflecting the cardiovascular, pulmonary, and skeletal muscle systems. However, clinicians typically use only a few easily accessible variables to establish prognosis, which can come at an accuracy cost. Machine learning algorithms have undergone significant development in recent years and have found applications in the medical field. One of their main advantages is the ability to handle a large number of variables and extract information from their various combinations, while excluding redundant data, thus offering a higher level of prediction accuracy. Purpose: Our objective was to develop a machine learning model to predict death during follow-up in a large population of ACHD patients who have undergone routine CPET in a large-volume single specialist centre. Methods: All available CPET studies for ACHD patients (age > 15 years) performed from December 1999 to December 2021 at our centre were included, and all standard available variables were extracted. The primary outcome was all-cause mortality since the CPET, collected up to November 2023. Data exploration, data cleaning, and feature engineering steps were conducted using Python (v3.8). Continuous variables were standardised. The supervised machine learning algorithm XGBoost was used for classification (all-cause mortality), with the best hyperparameters selected through cross-validation. For the final model, a feature importance analysis based on permutation techniques was used to explore the variables with the most impact on model accuracy. Results: A total of 6361 studies were included, with a median age of 31 years (IQR 23-43), 56% male. During follow-up, a total of 491 deaths were recorded.
Of the available demographic, clinical and exercise parameters (n=129 variables), 21 were deemed sufficient in terms of completeness, absence of significant correlation, and clinical relevance for the subsequent analysis. These included demographic variables (age and sex) and CPET variables (peak VO2, FEV1, peak heart rate, exercise time, etc.) and were used to train the model. In the test set, the model predicted death after CPET with an accuracy of 93.5% and an area under the ROC curve (AUC) of 85.0% (Figure 1). The parameters with the highest relative feature importance were FEV1, body mass index and percent predicted peak VO2 (Figure 2). Conclusions: Machine learning models can be effectively employed in specialised cardiac populations, such as ACHD, potentially providing a high level of prediction accuracy by efficiently harnessing the potential of available data. However, applicability to mortality prediction for ACHD patients requires further external validation. Figure 1: ROC curve AUC for the model predictions. Figure 2: Feature importance using permutation.
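The permutation-based feature importance analysis used here is model-agnostic: shuffle one feature column and measure how much the score drops. A minimal sketch with a hypothetical two-feature "model" follows; the paper's actual XGBoost model and CPET variables are not reproduced.

```python
import random

def permutation_importance(model, X, y, metric, n_repeats=20, seed=0):
    """Average drop in metric when one feature column is shuffled; bigger drop = more important."""
    rng = random.Random(seed)
    base = metric(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the outcome
            Xp = [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]
            drops.append(base - metric(model, Xp, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy "model": predicted risk depends on feature 0 only; feature 1 is noise.
def model(x):
    return 1 if x[0] > 0.5 else 0

def accuracy(m, X, y):
    return sum(m(xi) == yi for xi, yi in zip(X, y)) / len(y)

X = [[0.9, 0.1], [0.8, 0.9], [0.7, 0.4], [0.2, 0.8], [0.1, 0.2], [0.3, 0.6]]
y = [1, 1, 1, 0, 0, 0]
imp = permutation_importance(model, X, y, accuracy)
```

Shuffling the informative feature degrades accuracy while shuffling the noise feature does nothing, which is exactly how FEV1 and peak VO2 would surface as the dominant predictors.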

  • Research Article
  • Cited by: 2
  • 10.1108/ramj-01-2023-0012
Role clarity, perceived cohesion and felt responsibility as antecedents of altruism and conscientiousness among college teachers in Kerala
  • May 10, 2023
  • Rajagiri Management Journal
  • Makesh Gopalakrishnan + 1 more

Purpose: The literature evidences that altruism and conscientiousness are very important discretionary behaviours within the broader framework of Organizational Citizenship Behaviour (OCB) in the teaching community. The present study examines the effect of role clarity, perceived cohesion and felt responsibility on altruism and conscientiousness among college teachers in Kerala. Design/methodology/approach: A questionnaire-based survey was conducted among 354 college teachers, and the causal effects were examined using Partial Least Squares-based structural equation modelling. Findings: Validity and reliability of the model were established through measurement model evaluation, and its explanatory power was established. Cohesion and felt responsibility significantly predicted altruism, but the effect of role clarity on altruism was not significant. The effects of cohesion, felt responsibility and role clarity on conscientiousness were significant. Originality/value: The study contributes to existing theory on the antecedents of OCB. The model shows a high level of predictive accuracy, with role clarity, cohesion and felt responsibility capable of explaining discretionary behaviour among college teachers.

  • Research Article
  • Cited by: 187
  • 10.1016/s1355-0306(02)71820-0
Linking commercial burglaries by modus operandi: tests using regression and ROC analysis.
  • Jul 1, 2002
  • Science & Justice
  • C Bennell + 1 more


  • Dissertation
  • 10.31390/gradschool_dissertations.3381
Molecular, statistical and genetic analyses of complex agronomic traits in rice
  • Jun 10, 2022
  • Samuel Ordonez Jr

Novel molecular and statistical approaches are needed for identification of DNA markers associated with complex traits in rice. The first research objective was to evaluate mixed-model and multiple regression approaches for their ability to identify molecular markers associated with complex traits in rice. A combined mixed model and multiple regression approach was optimal for selecting the smallest number of DNA markers associated with relatively high R2 values and for consistency with previous mapping studies. Support Vector Regression (SVR) was evaluated in the second research objective for the ability to generate high levels of accuracy and power for markers associated with complex traits. High levels of prediction accuracy and power were observed for the selected markers. SVR produced greater model accuracy and ability to explain trait variation than multiple linear regression. Single nucleotide polymorphic (SNP) markers for aroma, amylose content and gelatinization temperature were evaluated in the third research objective for marker-assisted improvement of breeding lines. This strategy increased frequency of desired alleles by an average of 26 percent in only two generations. Genetic analysis of pollen sterility was conducted in the fourth research objective for an F2 population derived from an outcross between a weedy biotype and a commercial variety. Segregation analyses revealed that seed fertility was governed by two dominant genes, a result similar to the cytoplasmic male sterile (CMS)-WA system used to develop commercial hybrids. Pollen sterility was controlled by two recessive genes. The pollen sterility trait could be exploited as a new source of CMS for hybrid rice breeding. Additional research is needed to confirm if lines developed from this natural outcross represent a new source of CMS. 
Overall results show that both standard and new data mining approaches can be used to successfully identify candidate genes and DNA markers associated with complex agronomic traits. In addition, the SNP markers were shown to rapidly enrich frequency of desired alleles associated with rice grain and cooking quality traits. All results demonstrated that a combination of molecular, statistical, and genetic approaches created an effective strategy to advance our understanding of factors that govern complex traits in rice.

  • Research Article
  • Cited by: 6
  • 10.1207/s15327906mbr1703_4
Assessing The Discriminant Validity Of Regression Models And Subjectively Weighted Models Of Judgments
  • Jul 1, 1982
  • Multivariate Behavioral Research
  • Kevin R Murphy

When either regression models or subjectively weighted models are used as aids in making placement decisions, the discriminant validity of these models is of interest. When all predictor information is used in all decisions, models which assign equal weights cannot simultaneously show high levels of predictive accuracy and discriminant validity; in some settings, both regression models and subjectively weighted models may. The discriminant validity of regression models and of subjectively weighted models was investigated in two judgment experiments. Both types of models showed high levels of accuracy and cross-validity in both experiments. Regression models showed discriminant validity in both experiments, while subjectively weighted models failed to show discriminant validity in the second. The homogeneity of cue validities appeared to moderate both the level of discriminant validity and the relationship between similarity of subjective models, across tasks, and discriminant validity.

  • Research Article
  • Cited by: 30
  • 10.1016/j.heliyon.2024.e33681
A review on machine learning implementation for predicting and optimizing the mechanical behaviour of laminated fiber-reinforced polymer composites
  • Jun 26, 2024
  • Heliyon
  • Sherif Samy Sorour + 2 more

The utilization of Machine Learning (ML) techniques in the analysis of the mechanical behavior of fiber-reinforced polymers (FRP) has been increasingly applied in composite materials. The ability to achieve high levels of accuracy, coupled with a reduction in computational cost once the ML models are trained, presents a powerful tool for optimization and in-depth analysis of laminated FRP. This review paper aims to provide insight into the emergence of this trend, offer an overview of various ML algorithms and related subtopics, and demonstrate different implementations of ML from recent studies with a specific focus on the design and optimization of FRP composites. The reviewed studies have exhibited high levels of prediction accuracy and have effectively employed ML to optimize the mechanical properties of composite materials. It was also highlighted that selecting the appropriate ML algorithm and neural network structure is crucial for various problems and data. While the studies reviewed have shown promising results, further research is needed to fully realize the potential of ML in this field.

  • Research Article
  • Cited by: 1
  • 10.1108/rjta-07-2019-0033
Development of prediction model through linear multiple regression for the prediction and analysis of the GSM of embroidered fabric
  • Jan 11, 2020
  • Research Journal of Textile and Apparel
  • Anirban Dutta + 1 more

Purpose: The purpose of this paper is to establish a regression equation, based upon a set of samples prepared through a structured design of experiments, to form a model for predicting the areal density in grams per square metre (GSM) of embroidered fabrics and to study the influence of the basic input parameters. Design/methodology/approach: Embroidery samples are prepared taking as input parameters the GSM of the base fabric, the linear density of the embroidery thread and the stitch density of the embroidery design. Three levels are identified for each input parameter. Taguchi and Box-Behnken experiment design principles are used to prepare two sets of samples. Linear multiple regression is used to determine prediction equations based upon each of the two sets and the combined set as well. The prediction equations are statistically verified for prediction accuracy, and surface curves are prepared to study the influence of the embroidery parameters on the GSM. Findings: All three prediction models developed in this study predict with a very satisfactory level of accuracy; however, the regression equation based upon the data set prepared according to the Taguchi experiment design emerged as the model with the highest level of prediction accuracy. The equation coefficients and several three-dimensional surface curves are used to study the influence of the embroidery parameters, and stitch density is found to be the most influential input parameter, followed by stitch length and the GSM of the base fabric. Research limitations/implications: The model can be used to assess the GSM of embroidered fabrics before starting the actual embroidery process, so it can help embroidery designers significantly in pre-estimating the GSM of embroidered fabrics and selecting design parameters accordingly. It can also be a useful tool for estimating thread consumption and thread cost in embroidery.
Practical implications: The input parameters used here are very basic parameters related to design and materials, which are easily available, and a simple linear multiple regression keeps the prediction equation simple and easy to use. This model can therefore help embroidery or garment designers select and adjust the embroidery and thread parameters in the planning and designing stage itself, ensuring that the GSM of the embroidered fabric remains within the desirable range. The prediction model may also be a very useful tool for estimating the consumption and cost of embroidery threads. Originality/value: This paper presents a fundamental study revealing the effect of embroidery parameters on GSM through the development of regression equations. It can help future researchers optimise input parameters and forms a technical guideline for embroidery designers selecting design parameters for a desired GSM of embroidered fabric.
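The prediction equations described above are ordinary least-squares multiple regressions. A self-contained sketch via the normal equations follows; the embroidery observations are invented and generated from an exact linear relation (GSM = base GSM + 2 × thread tex + 20 × stitch density), so the fit recovers the coefficients exactly. The paper's real coefficients are not reproduced here.

```python
def fit_linear(X, y):
    """Least-squares multiple regression via the normal equations (Gaussian elimination)."""
    Xa = [[1.0] + list(row) for row in X]  # prepend intercept column
    k = len(Xa[0])
    # Build X^T X and X^T y.
    A = [[sum(r[i] * r[j] for r in Xa) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(Xa, y)) for i in range(k)]
    # Solve A w = b by Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, k))) / A[r][r]
    return w  # [intercept, base-GSM coef, tex coef, stitch-density coef]

def predict(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))

# Toy observations: (base-fabric GSM, thread tex, stitch density) -> embroidered GSM.
X = [(120, 25, 4), (120, 25, 6), (150, 30, 4), (150, 30, 6), (180, 25, 5), (180, 30, 5)]
y = [250, 290, 290, 330, 330, 340]
w = fit_linear(X, y)
```

The fitted coefficients directly quantify each parameter's influence, which is how the paper reads off stitch density as the dominant factor.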

  • Research Article
  • Cited by: 1
  • 10.3382/ps/pey273
Predicting ascites incidence in a simulated altitude-challenge using single nucleotide polymorphisms identified in multi-generational genome wide association studies.
  • Nov 1, 2018
  • Poultry science
  • Katy J Tarrant + 4 more

