Oversampling Method Research Articles

Objectives: To develop and propose a machine learning model for predicting glaucoma and identifying its risk factors.

Method: The data analysis pipeline for this study is based on the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. Its main steps are data sampling, preprocessing, classification, and evaluation and validation. The training dataset was built by balanced sampling based on over-sampling and under-sampling methods. Preprocessing consisted of missing value imputation and normalization. For the classification step, several machine learning models were designed for predicting glaucoma, including Decision Trees (DTs), K-Nearest Neighbors (K-NN), Support Vector Machines (SVM), Random Forests (RFs), Extra Trees (ETs), and Bagging Ensemble methods. In addition, a novel stacking ensemble model built from the superior classifiers is designed and proposed.

Results: The data came from the Shahroud Eye Cohort Study and include demographic and ophthalmology data for 5190 participants aged 40-64 living in Shahroud, northeast Iran. The main variables considered in this dataset were 67 demographic, ophthalmologic, optometric, perimetry, and biometry features for 4561 people, including 4474 non-glaucoma participants and 87 glaucoma patients. Experimental results show that DTs and RFs trained on the under-sampled training dataset outperform the other single classifiers and the bagging ensemble methods in predicting glaucoma, with average accuracy of 87.61 and 88.87, sensitivity of 73.80 and 72.35, specificity of 87.88 and 89.10, and area under the curve (AUC) of 91.04 and 94.53, respectively. The proposed stacking ensemble has an average accuracy of 83.56, a sensitivity of 82.21, a specificity of 81.32, and an AUC of 88.54.

Conclusions: In this study, a machine learning model is proposed and developed to predict glaucoma among persons aged 40-64. The top predictors for discriminating non-glaucoma persons from glaucoma patients include the number of visual field defects on perimetry, vertical cup-to-disk ratio, white-to-white diameter, systolic blood pressure, pupil barycenter on the Y coordinate, age, and axial length.
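
The balanced sampling step described in this abstract can be illustrated with a minimal sketch of random under-sampling and random over-sampling. The abstract does not say which sampling algorithms the authors used, so the functions below (`random_undersample`, `random_oversample`) are hypothetical stand-ins, and the class counts 4474 and 87 are borrowed from the abstract purely as an illustration. In practice, resampling would be applied to the training split only.

```python
import random
from collections import defaultdict


def _group_by_class(rows, labels):
    """Collect rows into one list per class label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        out_rows.extend(rng.sample(group, target))  # sample without replacement
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """Replicate rows (with replacement) until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Illustrative data: 0 = non-glaucoma, 1 = glaucoma; rows are just record indices.
labels = [0] * 4474 + [1] * 87
rows = list(range(len(labels)))

under_rows, under_labels = random_undersample(rows, labels)
over_rows, over_labels = random_oversample(rows, labels)
print(len(under_rows), len(over_rows))  # prints "174 8948"
```

Under-sampling keeps 87 records per class and discards most of the majority class, while over-sampling keeps every record and repeats the minority ones.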

Unplanned events present significant challenges for operations and management in metro systems. Short-term ridership prediction can help agencies design better contingency strategies under unplanned events. Though many short-term prediction methods have been proposed in the literature, most studies have focused on typical situations or planned events. This study develops methods for short-term metro ridership prediction under unplanned events. It explores event impact representation mechanisms and addresses the imbalanced data training problem in building the prediction model under unplanned events. Typical machine learning and deep learning methods are developed for exploration. A large-scale automatic fare collection (AFC) dataset and event record data for a heavily used metro system are used for empirical studies. The analysis found that unplanned events of the same type share a similar and consistent demand change pattern (with respect to the demand under typical situations) at the station level. The synthetic minority oversampling technique (SMOTE) can enrich the ridership observations under unplanned events and generate a balanced dataset for model training. Given the occurrence of unplanned events, the results show that a combination of demand change ratio and the SMOTE oversampling technique enables the prediction models to learn the impact of unplanned events and improves the prediction accuracy under unplanned events. However, the oversampling methods (i.e., SMOTE and replication) slightly degrade the prediction accuracy for ridership under normal conditions. The findings provide insights into mechanisms for disruption impact representation and for oversampling imbalanced data in model training, and guide the development of models for short-term prediction under unplanned events.
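
The core idea of SMOTE mentioned in this abstract, creating new minority observations by interpolating between an existing minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the function name `smote`, the choice of a random base sample for each synthetic point, and the toy feature vectors (hour of day, ridership count) are all assumptions made for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # For each sample, the indices of its k nearest minority neighbours.
    neighbours = []
    for i, point in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(point, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random base sample (simplification)
        j = rng.choice(neighbours[i])      # one of its nearest neighbours
        gap = rng.random()                 # interpolation weight in [0, 1)
        base, neighbour = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical observations under unplanned events: [hour of day, ridership].
minority = [[8.0, 120.0], [8.5, 135.0], [9.0, 150.0], [17.5, 90.0]]
synthetic = smote(minority, n_synthetic=6, k=2)
for point in synthetic:
    print([round(value, 2) for value in point])
```

Because each synthetic point lies on the line segment between two real minority samples, the new observations stay inside the region already covered by the minority data, unlike plain replication, which only repeats existing rows.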

Articles published on Oversampling Method

A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering

Optimization of software defects prediction in imbalanced class using a combination of resampling methods with support vector machine and logistic regression

Distance-based Probabilistic Data Augmentation for Synthetic Minority Oversampling

Optimization and comparison of models for core temperature prediction of mother rabbits using infrared thermography

Binary imbalanced data classification based on diversity oversampling by generative models

Development of glaucoma predictive model and risk factors assessment based on supervised models

LDAS: Local density-based adaptive sampling for imbalanced data classification

An oversampling method for multi-class imbalanced data based on composite weights.

An improved learning-based LSTM approach for lane change intention prediction subject to imbalanced data

Exploring ensemble oversampling method for imbalanced keyword extraction learning in policy text based on three-way decisions and SMOTE

A new instance density-based synthetic minority oversampling method for imbalanced classification problems

Performance Analysis of Two-Stage Iterative Ensemble Method over Random Oversampling Methods on Multiclass Imbalanced Datasets

Evolutionary Mahalanobis Distance-Based Oversampling for Multi-Class Imbalanced Data Classification.

Oversampling the minority class using a dedicated fitness function and genetic algorithmic progression

A Safe Zone SMOTE Oversampling Algorithm Used in Earthquake Prediction Based on Extreme Imbalanced Precursor Data

Improvising Balancing Methods for Classifying Imbalanced Data

A novel HEOMGA Approach for Class Imbalance Problem in the Application of Customer Churn Prediction

Improvement of the Classification Performance of an Intrusion Detection Model for Rare and Unknown Attack Traffic

Short-Term Metro Ridership Prediction During Unplanned Events

Comparing Machine Learning Methods to Improve Fall Risk Detection in Elderly with Osteoporosis from Balance Data.
