Rare Events Data Research Articles

Statistical post-processing has been widely adopted in probabilistic forecasting, especially when a predictive distribution of weather variables is of interest. Among the available post-processing methods, a common tool is ensemble Bayesian model averaging (BMA). Challenges arise when it is applied to heavy rain prediction, including the precipitation produced by tropical cyclones. These challenges stem from the rarity of extreme events and the limited data available for model training. To address this, we propose a Bayesian mixture model, Bmix, based on information from forecast members and historical tropical cyclones, rather than solely on immediately observed precipitation data. This approach provides a grid-specific Bayesian predictive distribution and an easy-to-derive probabilistic forecast, with posterior samples generated by a Markov chain Monte Carlo algorithm. The post-processing procedure can be executed on each grid simultaneously, reducing the computational time significantly. In addition, a categorized Bmix (C.Bmix) is implemented to accommodate different scenarios, such as heavy or torrential rain, in applications. The proposed approaches are demonstrated on the 24-hour accumulated precipitation forecast of Dujuan, a category 4-equivalent super typhoon that made landfall in Taiwan in 2015, while two other typhoons, Matmo and Soudelor, are used to provide historical information. Over the Taiwan main island, a total of 1282 grids are considered, each with a resolution of 5 km and 20 ensemble member forecasts from the Weather Research and Forecasting Ensemble Prediction System (WEPS) at a 6-hour interval during the typhoon period. The analyses show that the proposed mixture approaches yield larger percentages of positive continuous ranked probability skill score (CRPSS) values than traditional BMA. For instance, at the time when Dujuan made landfall, the percentages are 92.2 % for Bmix, 95.2 % for C.Bmix, and 15.4 % for BMA when the true precipitation amount exceeds 200 mm/24 h. The proposed methods outperform BMA in both probabilistic prediction and the overall pattern of precipitation.
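The abstract above compares methods by the percentage of positive CRPSS values. As a rough illustration of how such a score can be computed from posterior samples, the sketch below uses the standard sample-based CRPS estimate, E|X - y| - 0.5 E|X - X'|, and the usual skill-score form 1 - CRPS_forecast / CRPS_reference. The function names are hypothetical, and the choice of reference forecast is an assumption, since the abstract does not state which reference the authors used.

```python
from typing import Sequence


def crps_from_samples(samples: Sequence[float], observed: float) -> float:
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    # Mean absolute distance between the samples and the observation.
    term_obs = sum(abs(x - observed) for x in samples) / n
    # Mean absolute distance between all pairs of samples (spread of the forecast).
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread


def crpss(crps_forecast: float, crps_reference: float) -> float:
    """Skill score relative to a reference forecast; positive means the forecast is better."""
    if crps_reference <= 0:
        raise ValueError("reference CRPS must be positive")
    return 1.0 - crps_forecast / crps_reference


def percent_positive(scores: Sequence[float]) -> float:
    """Percentage of scores (e.g. one CRPSS per grid) that are strictly positive."""
    if len(scores) == 0:
        raise ValueError("need at least one score")
    return 100.0 * sum(1 for s in scores if s > 0) / len(scores)
```

In a grid-wise setting, one would compute a CRPSS per grid from that grid's posterior samples and observed precipitation, then summarize with percent_positive over the grids of interest.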

Background: Medical decision-making impacts both individual and public health. Clinical scores are commonly used among various decision-making models to determine the degree of disease deterioration at the bedside. AutoScore was proposed as a useful clinical score generator based on machine learning and a generalized linear model. However, its current framework still leaves room for improvement when addressing unbalanced data of rare events.
Methods: Using machine intelligence approaches, we developed AutoScore-Imbalance, which comprises three components: training dataset optimization, sample weight optimization, and adjusted AutoScore. Baseline techniques for performance comparison included the original AutoScore, full logistic regression, stepwise logistic regression, least absolute shrinkage and selection operator (LASSO), full random forest, and random forest with a reduced number of variables. These models were evaluated based on their area under the curve (AUC) in the receiver operating characteristic analysis and balanced accuracy (i.e., mean value of sensitivity and specificity). By utilizing a publicly accessible dataset from Beth Israel Deaconess Medical Center, we assessed the proposed model and baseline approaches to predict inpatient mortality.
Results: AutoScore-Imbalance outperformed baselines in terms of AUC and balanced accuracy. The nine-variable AutoScore-Imbalance sub-model achieved the highest AUC of 0.786 (0.732–0.839), while the eleven-variable original AutoScore obtained an AUC of 0.723 (0.663–0.783), and the logistic regression with 21 variables obtained an AUC of 0.743 (0.685–0.801). The AutoScore-Imbalance sub-model (using a down-sampling algorithm) yielded an AUC of 0.771 (0.718–0.823) with only five variables, demonstrating a good balance between performance and variable sparsity. Furthermore, AutoScore-Imbalance obtained the highest balanced accuracy of 0.757 (0.702–0.805), compared to 0.698 (0.643–0.753) by the original AutoScore and the maximum of 0.720 (0.664–0.769) by other baseline models.
Conclusions: We have developed an interpretable tool to handle clinical data imbalance, presented its structure, and demonstrated its superiority over baselines. The AutoScore-Imbalance tool can be applied to highly unbalanced datasets to gain further insight into rare medical events and facilitate real-world clinical decision-making.
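The abstract above defines balanced accuracy as the mean of sensitivity and specificity. A minimal sketch of that definition for binary labels follows; the function name and the 0/1 label encoding are assumptions made for illustration and are not taken from the AutoScore-Imbalance tool.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = event, 0 = no event)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # events correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # events missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-events correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)
```

With rare events, a classifier that always predicts "no event" can reach high plain accuracy while its balanced accuracy stays at 0.5, which is why the metric suits unbalanced data.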

Articles published on Rare Events Data

A Comprehensive Survey on Rare Event Prediction

Coding with the machines: machine-assisted coding of rare event data.

Comparing various Bayesian random‐effects models for pooling randomized controlled trials with rare events

Bayesian typhoon precipitation prediction with a mixture of ensemble forecast-based and historical event-based prediction functions

A NEW APPROACH TO DETERMINE THE INFLUENCE OF WEATHER CONDITIONS ON FOREST FIRE RISK IN THE MEDITERRANEAN REGION OF TÜRKİYE

Predictive factors of atopic-like dermatitis induced by IL-17A inhibitors in patients with psoriasis: A 2-year follow-up study.

Improving performance of hurdle models using rare-event weighted logistic regression: an application to maternal mortality data.

Long-term mental health consequences of female- versus male-perpetrated child sexual abuse

Identification of high-risk roadway segments for wrong-way driving crash using rare event modeling and data augmentation techniques

AutoScore-Imbalance: An interpretable machine learning tool for development of clinical scores with rare events data

Whose trade follows the flag? Institutional constraints and economic responses to bilateral relations

A comparison of confidence distribution approaches for rare event meta-analysis.

Comparison of Different Estimation Approaches in Rare Events Data

Jackknife empirical likelihood confidence intervals for assessing heterogeneity in meta-analysis of rare binary event data

Estimating group fixed effects in panel data with a binary dependent variable: How the LPM outperforms logistic regression in rare events data

Psychosocial Determinants of Burn-Related Suicide: Evidence From the National Violent Death Reporting System.

A Comparative Study of the Bias Correction Methods for Differential Item Functioning Analysis in Logistic Regression with Rare Events Data.

Precluding rare outcomes by predicting their absence.

Logistic Regression Procedure Using Penalized Maximum Likelihood Estimation for Differential Item Functioning

Assessing correlates of protection in vaccine trials: statistical solutions in the context of high vaccine efficacy
