Abstract
Spam detection remains a persistent challenge in cybersecurity. Traditional methods that rely on human scrutiny or rule-based algorithms are proving inadequate against the constantly evolving tactics of spammers. Machine learning is a promising alternative: trained on large datasets, it can quickly and consistently identify the patterns and traits that characterize spam messages. By uncovering subtle correlations among message elements, machine learning improves the precision and efficacy of spam detection systems and offers a dependable, economical way to combat spam. This study investigates how different strategies for addressing data imbalance affect the performance of neural network-based spam detection. Using the SMS Spam Collection Dataset, four methods for mitigating data imbalance are evaluated against an untreated baseline. Notably, despite its inherent data imbalance, the untreated baseline exhibits the highest overall performance. Stratified sampling is the most effective technique for accurately identifying spam, whereas SMOTE is best at preserving legitimate messages (ham) while still filtering out spam. These results contribute to the understanding of how the handling of data imbalance affects spam detection and offer useful guidance for future studies and real-world applications.
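As an illustration of one of the strategies named above, stratified sampling, the following is a minimal Python sketch of a split that preserves the spam/ham ratio in both partitions. The function name `stratified_split`, the toy 80/20 label list, and the 25% test fraction are assumptions made for this example; the abstract does not describe the study's actual procedure or parameters.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps its share in both parts.

    Hypothetical helper for illustration, not the study's implementation.
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Take the same fraction from every class.
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels with an assumed imbalance (not the dataset's real proportions).
labels = ["ham"] * 80 + ["spam"] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)

spam_share_train = sum(labels[i] == "spam" for i in train_idx) / len(train_idx)
spam_share_test = sum(labels[i] == "spam" for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), spam_share_train, spam_share_test)
```

Because the same fraction is drawn from each class, the minority class is neither over- nor under-represented in the held-out set, which is the property that makes stratification relevant to imbalanced data such as spam corpora.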