Abstract

Spam, also known as unsolicited bulk e-mail (UBE), has recently become a serious threat that degrades the usability of legitimate e-mail. In this article, an evidential spam-filtering framework is proposed. The Dempster–Shafer theory of evidence (D–S theory), a useful tool for handling uncertainty, is integrated into the proposed approach. Five representative features from an e-mail header are analyzed. Using a machine-learning algorithm, the framework is trained on e-mail headers with known classifications. When the framework is applied to a given e-mail header, its representative features are quantified. Whereas classical probability theory forces probabilities to be assigned even when information is inadequate, our approach gives every word in an e-mail subject a basic probability assignment (BPA) in a more flexible way, thus providing a more reasonable result. Finally, the BPAs are combined and transformed into pignistic probabilities for decision-making. Empirical trials on real-world datasets show the efficiency of the proposed framework.
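
The final step described above, combining BPAs and converting the result to pignistic probabilities, can be illustrated with a minimal Python sketch. It assumes a two-element frame {spam, ham} and uses the standard Dempster rule of combination and the standard pignistic transformation. The function names, the frame labels, and the mass values are hypothetical and chosen only for illustration; they are not taken from the article, and this is not the authors' implementation.

```python
from itertools import product

# Hypothetical two-element frame of discernment (an assumption for this sketch).
SPAM = frozenset({"spam"})
HAM = frozenset({"ham"})
THETA = SPAM | HAM  # the whole frame: mass here means "undecided"


def combine(m1, m2):
    """Combine two BPAs with Dempster's rule (standard formulation)."""
    joint = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + wa * wb
        else:
            # Mass falling on the empty set measures conflict between sources.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so the combined masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in joint.items()}


def pignistic(m):
    """Spread each focal set's mass evenly over its elements."""
    betp = {}
    for subset, w in m.items():
        for element in subset:
            betp[element] = betp.get(element, 0.0) + w / len(subset)
    return betp


# Illustrative masses for two pieces of evidence (made-up numbers).
m_word_a = {SPAM: 0.6, THETA: 0.4}            # leans spam, rest undecided
m_word_b = {SPAM: 0.5, HAM: 0.2, THETA: 0.3}  # mixed evidence

combined = combine(m_word_a, m_word_b)
betp = pignistic(combined)
decision = max(betp, key=betp.get)
print(combined, betp, decision)
```

Note that mass left on the whole frame expresses ignorance rather than a forced 50/50 split; it is only divided between the two classes at decision time, by the pignistic transformation.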
