Gradient Boosting Machine Model Research Articles

Synthetic data generation (SDG) based on generative adversarial networks (GANs) is used in health care, but generating synthetic tabular data (STD) that preserves the logical relationships in the original data remains challenging. Filtering methods for SDG can lead to the loss of important information. This study proposed a divide-and-conquer (DC) method to generate STD based on the GAN algorithm while preserving data with logical relationships. The proposed method was evaluated on data from the Korea Association for Lung Cancer Registry (KALC-R) and 2 benchmark data sets (breast cancer and diabetes). The DC-based SDG strategy comprises 3 steps: (1) we used 2 different partitioning methods (the class-specific criterion distinguished between survival and death groups, while the Cramér V criterion identified the highest correlation between columns in the original data); (2) the entire data set was divided into a number of subsets, which were then used as input for the conditional tabular generative adversarial network and the copula generative adversarial network to generate synthetic data; and (3) the generated synthetic data were consolidated into a single entity. For validation, we compared DC-based SDG and conditional sampling (CS)-based SDG through the performance of machine learning models. In addition, we generated imbalanced and balanced synthetic data for each of the 3 data sets and compared their performance using 4 classifiers: decision tree (DT), random forest (RF), Extreme Gradient Boosting (XGBoost), and light gradient-boosting machine (LGBM) models. For all 3 diseases (non-small cell lung cancer [NSCLC], breast cancer, and diabetes), the synthetic data generated by our proposed model yielded better performance across the 4 classifiers (DT, RF, XGBoost, and LGBM).
The CS- versus DC-based model performances were compared using the mean area under the curve (SD) values: 74.87 (SD 0.77) versus 63.87 (SD 2.02) for NSCLC, 73.31 (SD 1.11) versus 67.96 (SD 2.15) for breast cancer, and 61.57 (SD 0.09) versus 60.08 (SD 0.17) for diabetes (DT); 85.61 (SD 0.29) versus 79.01 (SD 1.20) for NSCLC, 78.05 (SD 1.59) versus 73.48 (SD 4.73) for breast cancer, and 59.98 (SD 0.24) versus 58.55 (SD 0.17) for diabetes (RF); 85.20 (SD 0.82) versus 76.42 (SD 0.93) for NSCLC, 77.86 (SD 2.27) versus 68.32 (SD 2.37) for breast cancer, and 60.18 (SD 0.20) versus 58.98 (SD 0.29) for diabetes (XGBoost); and 85.14 (SD 0.77) versus 77.62 (SD 1.85) for NSCLC, 78.16 (SD 1.52) versus 70.02 (SD 2.17) for breast cancer, and 61.75 (SD 0.13) versus 61.12 (SD 0.23) for diabetes (LGBM). In addition, we found that balanced synthetic data performed better. This study is the first attempt to generate and validate STD based on a DC approach and shows improved performance using STD. The necessity for balanced SDG was also demonstrated.
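The three DC steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cramers_v`, `most_associated_pair`, `divide_and_conquer`), the toy columns, and the `generate` callable that stands in for a trained GAN are all hypothetical, and the abstract does not specify how the Cramér V criterion is turned into subsets. The sketch computes Cramér's V, $V = \sqrt{\chi^2 / (n \cdot \min(r-1, c-1))}$, for two categorical columns, finds the most strongly associated pair, then splits rows by one column, generates per subset, and concatenates the results.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two equal-length sequences of categorical values."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    chi2 = 0.0
    for a, row_total in row.items():
        for b, col_total in col.items():
            expected = row_total * col_total / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def most_associated_pair(table):
    """table maps column name -> list of values; returns (V, (col_a, col_b))."""
    return max(
        (cramers_v(table[a], table[b]), (a, b))
        for a, b in combinations(table, 2)
    )


def divide_and_conquer(rows, key, generate):
    """Split rows by rows[i][key], generate per subset, then consolidate.

    `generate` is a hypothetical stand-in for a fitted generator
    (for example a GAN trained on that subset only).
    """
    subsets = {}
    for r in rows:
        subsets.setdefault(r[key], []).append(r)
    synthetic = []
    for subset in subsets.values():
        synthetic.extend(generate(subset))
    return synthetic


# Toy data, invented purely for illustration.
table = {
    "status": ["alive", "alive", "dead", "dead"],
    "stage": ["I", "I", "IV", "IV"],
    "sex": ["F", "M", "F", "M"],
}
best_v, best_pair = most_associated_pair(table)
rows = [dict(zip(table, values)) for values in zip(*table.values())]
combined = divide_and_conquer(rows, "status", generate=lambda subset: list(subset))
```

Because each subset is generated separately, a generator never sees rows from another subset, which is one plausible reading of how a DC strategy avoids producing records that mix incompatible values.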


Background: The number of non-cardiac surgeries performed worldwide has been steadily increasing, making it challenging for clinicians to accurately identify patients at high risk of complications and to allocate the appropriate level of perioperative care. Accurate prediction of postoperative mortality is crucial not only for successful patient care but also for information-based shared decision-making with patients and efficient allocation of medical resources.
Purpose: In this study, we aimed to develop a novel predictive model using machine learning methods applied to electronic health record data. Our objective was to identify the risk factors most likely to lead to 30-day major adverse cardiac and cerebrovascular events after non-cardiac surgery.
Methods: We conducted a retrospective analysis of data from a single tertiary care institution that included patients aged 65 years or over who underwent non-cardiac surgery between May 2003 and December 2020. Data in the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) were used to build predictive models, which allowed demographic data and preoperative characteristics such as diagnoses, lab results, vital signs, medications, and information on operations and procedures from the electronic health records (EHRs) to be used in a standardized way. The machine learning models were developed and validated using the OHDSI Patient-Level-Prediction framework.
Results: We included a total of 47,915 patients to train (75%) and test (25%) our predictive models. To compare prediction performance, we applied gradient boosting machine (GBM), logistic regression (LR), random forest (RF), AdaBoost (AB), and decision tree (DT) models. On the test data (Fig. 1), the GBM model had the best performance in terms of the area under the receiver operating characteristic curve (AUROC, 0.903) and the area under the precision-recall curve (AUPRC, 0.395).
Conclusions: Our study demonstrates that applying machine learning algorithms to electronic health record data can effectively identify patients at high risk of major adverse cardiac and cerebrovascular events following non-cardiac surgery. This algorithm has the potential to support clinicians in identifying patients at high risk and providing appropriate perioperative care. Further work is needed to validate and refine the proposed model to ensure its external validity and broader applicability in clinical practice. We plan to validate the proposed model externally by testing it on a cohort of approximately 280,000 patients from another tertiary care institution and to present the results at the 2023 ESC Congress.
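The two metrics reported in this abstract, AUROC and AUPRC, can be made concrete with a small sketch. The study itself used the OHDSI Patient-Level-Prediction framework; the Python functions below (`auroc`, `average_precision`) are hypothetical illustrations, not that framework's API, and the labels and scores are invented. AUROC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and AUPRC is approximated by average precision.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Assumes no tied scores; ties would need to be grouped per threshold.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each positive
    return ap / total_pos


# Invented example: 1 = event, 0 = no event.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
roc = auroc(labels, scores)
prc = average_precision(labels, scores)
```

A random classifier scores about 0.5 on AUROC, whereas its expected AUPRC is roughly the event prevalence. When the outcome is rare, a high AUROC can therefore coexist with a much lower AUPRC, so the two numbers are best read together.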


Articles published on Gradient Boosting Machine Model

Machine-learning algorithms in screening for type 2 diabetes mellitus: Data from Fasa Adults Cohort Study.

Predictive model of acute kidney injury in critically ill patients with acute pancreatitis: a machine learning approach using the MIMIC-IV database

A machine learning approach to differentiate wide QRS tachycardia: distinguishing ventricular tachycardia from supraventricular tachycardia.

AI-Based Virtual Sensing of Gaseous Pollutant Emissions at the Tailpipe of a High-Performance Vehicle

Factors influencing recurrence and model development for recurrence of minimally invasive percutaneous transhepatic lithotripsy: a single-center retrospective study.

A feasibility study on utilizing machine learning technology to reduce the costs of gastric cancer screening in Taizhou, China.

Development and validation of a machine learning model for clinical wellness visit classification in cats and dogs.

A new prediction model for acute kidney injury following liver transplantation using grafts from donors after cardiac death.

Comparative analysis of BERT and FastText representations on crowdfunding campaign success prediction.

Machine learning-Based model for prediction of Narcolepsy Type 1 in Patients with Obstructive Sleep Apnea with Excessive Daytime Sleepiness.

Prediction Model of Ocular Metastases in Gastric Adenocarcinoma: Machine Learning-Based Development and Interpretation Study.

Discriminating insulin resistance in middle-aged nondiabetic women using machine learning approaches.

Low CDKN1B Expression Associated with Reduced CD8+ T Lymphocytes Predicts Poor Outcome in Breast Cancer in a Machine Learning Analysis.

Leveraging multimodal MRI-based radiomics analysis with diverse machine learning models to evaluate lymphovascular invasion in clinically node-negative breast cancer

Machine learning-based water quality prediction using octennial in-situ Daphnia magna biological early warning system data

Synthetic Tabular Data Based on Generative Adversarial Networks in Health Care: Generation and Validation Using the Divide-and-Conquer Strategy.

Machine learning-based prediction of 30-day major adverse cardiac and cerebrovascular events in non-cardiac surgery patients

A generalizable electrocardiogram-based artificial intelligence model for 10-year heart failure risk prediction

Dynamic geospatial modeling of mycotoxin contamination of corn in Illinois: unveiling critical factors and predictive insights with machine learning.

Machine Learning for COVID-19 and Influenza Classification during Coexisting Outbreaks

