Abstract
E-mail service providers and consumers find it challenging to distinguish between spam and non-spam e-mails. Spammers aim to spread false information by sending annoying messages that catch the public's attention. Various spam identification techniques have been proposed and evaluated in the past, but the results show that more research is required to improve accuracy and to reduce training time and error rate. This research therefore proposes a novel machine learning-based hybrid bagging method for e-mail spam identification that combines two machine learning methods: random forest and J48 (decision tree). The proposed framework classifies each e-mail as ham or spam. In this procedure, the dataset is split into multiple sets, which are provided as input to each method. Tokenization, stemming, and stop word removal are performed in the preprocessing stage. Correlation feature selection (CFS) is then employed to select the required features from the preprocessed data. The effectiveness of the presented method is evaluated in terms of accuracy, precision, recall, F-measure, true-negative rate, false-positive rate, and false-negative rate, and the outcomes of three studies are compared. According to the results, the presented hybrid bagged model-based SMD technology achieved 98 percent accuracy.
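The evaluation measures named in the abstract all follow from the four confusion-matrix counts of a binary ham/spam classifier. The sketch below is a minimal illustration using the standard textbook definitions, with spam assumed to be the positive class; the names `ConfusionCounts`, `count_outcomes`, and `spam_metrics` are hypothetical and are not taken from the paper, and the toy labels in the usage note are made up for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # spam correctly flagged as spam
    fp: int  # ham wrongly flagged as spam
    tn: int  # ham correctly passed as ham
    fn: int  # spam wrongly passed as ham


def count_outcomes(y_true, y_pred, positive="spam"):
    """Tally confusion-matrix counts, treating `positive` as the spam label (assumption)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def spam_metrics(c):
    """Compute the seven measures listed in the abstract from the counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f_measure": _ratio(2 * precision * recall, precision + recall),
        "true_negative_rate": _ratio(c.tn, c.tn + c.fp),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),
    }
```

For example, `spam_metrics(count_outcomes(["spam", "ham", "ham"], ["spam", "spam", "ham"]))` evaluates three toy messages. Note that the false-positive rate is the complement of the true-negative rate, and the false-negative rate is the complement of recall, so the seven measures carry four independent quantities.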