Oversampling method via adaptive double weights and Gaussian kernel function for the transformation of unbalanced data in risk assessment of cardiovascular disease

Congjun Rao, Xi Wei, Xinping Xiao, Yu Shi, Mark Goh

doi:10.1016/j.ins.2024.120410

Abstract

In risk assessment of cardiovascular disease (CVD), classification error caused by unbalanced data is a significant challenge that has attracted widespread attention and research in the data mining field. To address the imbalance of CVD data sets, this paper proposes an oversampling method via adaptive double weights and Gaussian kernel function (ADWGKFO), which converts unbalanced data sets into balanced ones. First, a clustering algorithm is used to cluster the minority samples; boundary samples are identified using the Borderline-Synthetic Minority Over-sampling Technique (Borderline-SMOTE), K-nearest-neighbor, and support vector machine algorithms; and the number of samples to synthesize in each cluster is calculated from the double weights of boundary points and majority distribution. Second, to define the classification boundary clearly, the mutual class potential of the new samples in each cluster is calculated with a Gaussian kernel function, and new samples are filtered according to this potential until the data set is balanced. Finally, the proposed method is analyzed empirically on data sets from the Kaggle platform. To validate the efficacy and generality of the method, CatBoost, a recent ensemble algorithm, is used to test the effect of ADWGKFO, and the result is compared against other sampling methods and other classifiers using performance metrics such as accuracy, F1-score, and area under the curve (AUC). Compared with the other combinations of methods, accuracy, F1-score, and AUC improve significantly. The paper concludes that ADWGKFO improves data quality and increases the reliability of CVD risk assessment.
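
The filtering step can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption rather than the authors' method: the mutual class potential is taken as the sum of Gaussian kernel values to the majority samples minus the sum to the minority samples, candidates are produced by plain SMOTE-style interpolation, and the clustering, boundary-detection, and double-weight steps are omitted. The function names and the `sigma` and `threshold` parameters are hypothetical.

```python
import math
import random


def gaussian_potential(x, samples, sigma):
    """Sum of Gaussian kernel values between point x and each sample."""
    return sum(
        math.exp(-math.dist(x, s) ** 2 / (2 * sigma ** 2)) for s in samples
    )


def mutual_class_potential(x, majority, minority, sigma=1.0):
    """Assumed definition: majority potential minus minority potential at x.

    Negative values mean x lies in a minority-dominated region.
    """
    return (gaussian_potential(x, majority, sigma)
            - gaussian_potential(x, minority, sigma))


def interpolate(a, b, rng):
    """SMOTE-style synthetic point at a random position on segment a-b."""
    t = rng.random()
    return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))


def oversample(majority, minority, sigma=1.0, threshold=0.0,
               seed=0, max_tries=10_000):
    """Generate synthetic minority points until the classes are balanced.

    A candidate is kept only if its mutual class potential is below
    `threshold`; `max_tries` bounds the loop if few candidates qualify.
    """
    rng = random.Random(seed)
    needed = len(majority) - len(minority)
    synthetic = []
    tries = 0
    while len(synthetic) < needed and tries < max_tries:
        tries += 1
        a, b = rng.sample(minority, 2)  # two distinct minority points
        candidate = interpolate(a, b, rng)
        if mutual_class_potential(candidate, majority, minority, sigma) < threshold:
            synthetic.append(candidate)
    return synthetic


# Toy two-dimensional data (made up for illustration, not from the paper).
majority = [(5.0, 5.0), (5.5, 4.5), (4.5, 5.5),
            (6.0, 5.0), (5.0, 6.0), (4.0, 4.0)]
minority = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0)]

synthetic = oversample(majority, minority)
print(len(minority) + len(synthetic), "minority samples after oversampling")
```

In this sketch a candidate that falls close to majority samples receives a positive potential and is rejected, which is one way filtering can keep synthetic points from blurring the class boundary.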


Journal: Information Sciences
Publication Date: Mar 2, 2024
Citations: 6

Similar Papers

The impact of a point-of-care testing device on CVD risk assessment completion in New Zealand primary-care practice: A cluster randomised controlled trial and qualitative investigation.
Sue Wells ... Kyle Eggleton
PLOS ONE | VOL. 12
19 Apr 2017

Using run charts for cardiovascular disease risk assessments in general practice.
Susan Wells ... Natasha Rafter
Journal of Primary Health Care | VOL. 8
01 Jan 2015

Feasibility of cardiovascular disease risk assessments in rheumatology outpatient clinics: experiences from the nationwide NOCAR project
Eirik Ikdahl ... Grunde Wibetoe
RMD Open | VOL. 4
01 Oct 2018

Risk factors and assessment for cardiovascular disease among HIV-positive patients attending a Nigerian tertiary hospital.
Ifeyinwa Dorothy Osegbe ... Oyetunji Olukayode Soriyan
Pan African Medical Journal | VOL. 23
01 Jan 2015
