Abstract
K-means clustering is widely used in many fields, such as anomaly detection, customer segmentation, cyber-physical systems, medical diagnosis, sentiment analysis, and fraud detection. We apply k-means to imbalanced datasets, preserving the structure of the minority class through stratified resampling. For this experimental study, we used a benchmark dataset from Kaggle: a labeled fake-news dataset collected from online social media. The proposed model, Stratified K-means Sampling (SKMS), is compared empirically with the Synthetic Minority Oversampling Technique (SMOTE) on the same dataset, using the same set of machine learning algorithms for both techniques. The Random Forest (RF) algorithm achieves significant accuracy, and Support Vector Classification (SVC) produces a better F1-score than the other algorithms. While SKMS seeks to preserve the structure of the minority class, SMOTE aims to diversify the minority class by interpolating between existing samples. Depending on the dataset, one approach may be more suitable than the other.
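The contrast between the two resampling ideas can be illustrated with a minimal sketch. The abstract does not specify the SKMS procedure, so the code below is only one possible reading: cluster the minority class with k-means, then draw samples from each cluster in proportion to its size. The SMOTE part is a simplified interpolation between a sample and one of its nearest neighbors, not the reference implementation. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def stratified_kmeans_sample(points, k, n_samples, seed=0):
    """Assumed SKMS-style step: draw from each cluster in proportion to its size."""
    rng = random.Random(seed)
    labels = kmeans(points, k, seed=seed)
    picked = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue
        # At least one draw per non-empty cluster, so no cluster disappears.
        quota = max(1, round(n_samples * len(members) / len(points)))
        # Sampling with replacement only repeats existing points.
        picked.extend(rng.choices(members, k=quota))
    return picked

def smote_like(points, n_new, k_neighbors=3, seed=0):
    """Simplified SMOTE-style step: interpolate toward a random near neighbor."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        neighbors = sorted((p for p in points if p is not base),
                           key=lambda p: math.dist(base, p))[:k_neighbors]
        other = rng.choice(neighbors)
        t = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic

# Toy minority class with two well-separated groups (made-up data).
minority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
            (0.1, 0.0), (0.0, 0.3), (0.2, 0.2), (0.3, 0.1),
            (10.0, 10.0), (10.1, 10.2), (10.2, 10.1), (10.3, 10.0)]

picked = stratified_kmeans_sample(minority, k=2, n_samples=6)
synthetic = smote_like(minority, n_new=6)
```

In this sketch, `picked` contains only points that already exist in the minority class, with both groups represented, whereas `synthetic` contains new points lying on line segments between existing neighbors. That difference mirrors the trade-off described above between preserving structure and diversifying the minority class.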