Two-step email spam detection: comparing machine and deep learning accuracy
Artificial intelligence (AI) continues to be a transformative field, offering significant contributions to data science by supporting optimal decision-making processes. One notable application of AI is in digital forensics, particularly in spam email classification. This paper presents a two-step approach to differentiate between regular and spam emails. In the first step, emails are evaluated for vulnerabilities based on three key criteria: varying time intervals between Mail Transfer Agents (MTA), the presence of binary attachments, and inconsistencies in IP addresses associated with the same user. In the second step, a comparative study is conducted between Machine Learning (ML) and Deep Learning (DL) algorithms to identify the most effective method for achieving accurate classification results. The findings demonstrate that the Support Vector Machine (SVM) algorithm from ML outperforms the Recurrent Neural Network (RNN) algorithm from DL, achieving an accuracy rate of 96 % compared to 90 %. A notable conclusion from this research is that manual pre-processing leads to more accurate results and better interpretability compared to automatic pre-processing. This highlights the importance of human intervention in certain stages of AI-driven processes, even when using advanced algorithms. The results suggest that a combination of strategic criteria evaluation and algorithm selection is essential for enhancing the precision of spam classification in digital forensics.
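The first step described in the abstract can be sketched as a header-screening pass. The following is a minimal illustration only, not the authors' implementation: the delay threshold, the function names, the sample message, and the `known_ips` history mapping are all assumptions introduced here, and only Python's standard `email` package is used.

```python
import re
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Assumption: a hop-to-hop delay above this many seconds counts as irregular.
MAX_HOP_DELAY_SECONDS = 300
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def received_timestamps(msg):
    """Return POSIX timestamps of the Received headers, oldest hop first."""
    stamps = []
    for header in msg.get_all("Received", []):
        # The hop date follows the last ';' of a Received header.
        date_part = header.rpartition(";")[2].strip()
        try:
            stamps.append(parsedate_to_datetime(date_part).timestamp())
        except (TypeError, ValueError):
            continue  # skip hops whose date cannot be parsed
    return stamps[::-1]  # each MTA prepends its header, so reverse the order


def has_irregular_hop_delay(msg, limit=MAX_HOP_DELAY_SECONDS):
    """Criterion 1: an unusually long interval between two consecutive MTAs."""
    stamps = received_timestamps(msg)
    return any(b - a > limit for a, b in zip(stamps, stamps[1:]))


def has_binary_attachment(msg):
    """Criterion 2: at least one non-text part marked as an attachment."""
    return any(
        part.get_content_disposition() == "attachment"
        and part.get_content_maintype() != "text"
        for part in msg.walk()
    )


def has_inconsistent_ip(msg, known_ips):
    """Criterion 3: the first-hop IP is not among those seen for this sender."""
    sender = parseaddr(msg.get("From", ""))[1]
    headers = msg.get_all("Received", [])
    match = IPV4_PATTERN.search(headers[-1]) if headers else None
    seen = known_ips.get(sender)  # hypothetical per-user IP history
    return bool(match and seen) and match.group(0) not in seen


def screen(msg, known_ips):
    """Evaluate the three criteria and report each as a boolean flag."""
    return {
        "irregular_hop_delay": has_irregular_hop_delay(msg),
        "binary_attachment": has_binary_attachment(msg),
        "inconsistent_ip": has_inconsistent_ip(msg, known_ips),
    }


# Made-up sample message using documentation-only addresses.
RAW = """\
Received: from mx2.example.net ([198.51.100.7]) by mx3.example.org; Mon, 01 Jan 2024 10:20:00 +0000
Received: from sender.example.com ([203.0.113.9]) by mx2.example.net; Mon, 01 Jan 2024 10:00:00 +0000
From: Alice <alice@example.com>
To: bob@example.org
Subject: Invoice
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

See attached.
--XYZ
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="invoice.bin"
Content-Transfer-Encoding: base64

AAEC
--XYZ--
"""

result = screen(message_from_string(RAW), {"alice@example.com": {"192.0.2.10"}})
print(result)
```

In a pipeline like the one the abstract outlines, flags of this kind would feed the second step as manually engineered features for the classifier comparison; how the paper actually encodes them is not stated in the abstract.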
- Research Article
2
- 10.21271/zjpas.34.2.3
- Apr 12, 2022
- ZANCO JOURNAL OF PURE AND APPLIED SCIENCES
Comprehensive Study for Breast Cancer Using Deep Learning and Traditional Machine Learning
- Conference Article
4
- 10.1109/icite54466.2022.9759865
- Jan 22, 2022
The large number of email users has driven an increase in spam, which benefits some parties while harming others, including email users themselves. Spam emails usually contain advertisements or criminal content such as phishing, and they implicitly carry human emotion. Distinguishing spam from ham across a large volume of email is difficult and time-consuming. Deep learning can address this problem; one option is a neural network that classifies spam emails. This paper uses the Enron spam and ham email corpus and adds emotional features to the feature-extraction stage. The steps include text preprocessing, feature extraction using tf-idf and lexicon-based emotion features, and classification using an RNN to detect spam in emails. The proposed method is also compared with the Naïve Bayes and Support Vector Machine (SVM) algorithms on precision and accuracy. In addition, the study compares the effect of using affect intensities on the performance of the algorithms. The results show that the RNN outperforms the other methods, with the highest accuracy of 99% and a precision of 99.1%. Adding affect intensities to the model improves its recognition results.
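The feature-extraction stage mentioned above (tf-idf plus lexicon-based emotion features) can be illustrated with a small pure-Python sketch. The toy lexicon, the whitespace tokenizer, the unsmoothed idf formula, and the use of plain emotion counts in place of affect intensities are all assumptions made for illustration; the study's actual lexicon and weighting are not given in the abstract, and the RNN classifier is not shown.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (word -> emotion); a real study would use a published one.
EMOTION_LEXICON = {"win": "joy", "free": "joy", "urgent": "fear", "warning": "fear"}
EMOTIONS = ("joy", "fear")


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with tf = count / doc length, idf = ln(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(len(docs) / df[t]) for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


def emotion_counts(text):
    """Count lexicon hits per emotion, in the fixed order given by EMOTIONS."""
    hits = Counter(EMOTION_LEXICON[t] for t in tokenize(text) if t in EMOTION_LEXICON)
    return [hits[e] for e in EMOTIONS]


def build_features(docs):
    """Concatenate each tf-idf vector with the document's emotion counts."""
    vocab, vectors = tfidf_vectors(docs)
    return vocab, [vec + emotion_counts(doc) for vec, doc in zip(vectors, docs)]


docs = ["win free prize now", "meeting at noon", "urgent warning free offer"]
vocab, features = build_features(docs)
print(len(vocab), features[0][-2:], features[2][-2:])
```

Each row of `features` is the kind of combined vector that could then be passed to a classifier such as the RNN, Naïve Bayes, or SVM models the abstract compares.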
- Conference Article
2
- 10.1109/iciem54221.2022.9853177
- Apr 27, 2022
The objective of this work is to compare the Recurrent Neural Network (RNN) algorithm and Support Vector Machine (SVM) algorithm in the identification of endometrial cancer based on its accuracy and sensitivity measurements. Materials and Methods: The endometrial cancer dataset, obtained from the National Institute of Endometrial Cancer Diseases (NIECE), contains 768 patient health records that were used to train (80 %) and test (20 %) the predictive model in MATLAB and the statistical analysis is done using SPSS software. For this research work 768 images were used with the pixel size of 3048×2048 and these images are taken from the pap smear slide dataset. The RNN algorithm is used and compared with the SVM algorithm. The sample size is estimated for two groups (RNN & SVM) with G-power of 80 % and 0.05 Type I/II Error rate (Alpha). Results: The predictive model using RNN algorithm shows a higher accuracy of 93.90 ± 0.3160 and sensitivity of 91.0400 ± 1.07207 followed by the significance value of 0.002 than SVM algorithm with accuracy of 88.10 ± 0.9940 and sensitivity of 86.1700 ± 1.36793 with the significance value of 0.000 using 2-tailed test in SPSS. Conclusion: Based on the outcome of the proposed work RNN classifier shows significantly better performance than the SVM classifier in the innovative detection of endometrial cancer.
- Research Article
3
- 10.16984/saufenbilder.1264476
- Apr 30, 2024
- Sakarya University Journal of Science
Electronic messages, i.e. e-mails, are a communication tool frequently used by individuals and organizations. While e-mail is extremely practical to use, it is necessary to consider its vulnerabilities. Spam e-mails are unsolicited messages created to promote a product or service, often sent repeatedly. It is very important to classify incoming e-mails in order to protect against malware that can be transmitted via e-mail and to reduce possible unwanted consequences. Spam email classification is the process of identifying and distinguishing spam emails from legitimate emails. This classification can be done through various methods such as keyword filtering, machine learning algorithms and image recognition. The goal of spam email classification is to prevent unwanted and potentially harmful emails from reaching the user's inbox. In this study, Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM) and Artificial Neural Network (ANN) algorithms are used to classify spam emails and the results are compared. Algorithms with different approaches were used to determine the best solution for the problem. A total of 5558 spam and non-spam e-mails were analyzed, and the performance of the algorithms was reported in terms of accuracy, precision, sensitivity and F1-score. The most successful result was obtained with the RF algorithm, with an accuracy of 98.83%. The study thus achieved high success in classifying spam emails with machine learning algorithms. In addition, the experiments show better results than those of similar studies in the literature.
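The four reported metrics follow directly from the confusion-matrix counts. The sketch below uses the standard definitions, with spam assumed to be the positive class; the function name and the sample labels are made up for illustration and are not data from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, sensitivity (recall) and F1 for one class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0    # flagged mail that is spam
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # spam that was flagged
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Made-up labels: 1 = spam, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and sensitivity alongside accuracy matters for spam filtering because the two error types have different costs: a false positive hides a legitimate message, while a false negative lets spam through.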
- Research Article
- 10.1093/eurheartj/ehab724.3069
- Oct 12, 2021
- European Heart Journal
Background: The Thrombolysis in Myocardial Infarction (TIMI) score is used to predict the mortality rate of acute coronary syndrome (ACS) patients. TIMI was developed based on a Western cohort, with limited data on Asian cohorts. There are separate TIMI scores for STEMI and NSTEMI. Deep learning (DL) and machine learning (ML) algorithms such as the support vector machine (SVM), applied to population-specific datasets, have resulted in a higher area under the curve (AUC) than TIMI. A limitation of DL is that, unlike with ML algorithms, the features selected by the algorithm are unknown. Purpose: To construct a single in-hospital mortality risk scoring system that combines SVM feature importance and the DL algorithm in Asian patients with ACS and that is applicable to both STEMI and NSTEMI patients; and to investigate the performance of DL constructed using predictors selected by SVM feature extraction and of DL using the complete feature set, compared with the TIMI risk score for STEMI and NSTEMI patients. Methods: We constructed four algorithms: i) DL and SVM algorithms with features selected from SVM variable importance, ii) DL and SVM algorithms without feature selection. SVM feature importance with the backward elimination method was used to select and rank important variables. We used registry data from the National Cardiovascular Disease Database covering 13190 patients. Fifty-four parameters including demographics, cardiovascular risk, medications and clinical variables were considered. AUC was used as the performance evaluation metric. All algorithms were validated using a validation dataset and compared to the conventional TIMI for STEMI and NSTEMI. Results: Validation results in Figure 1 are by STEMI and NSTEMI patients. Both DL algorithms outperformed ML and the TIMI score on validation data. Similar performance is observed for the DL and SVM algorithms using all predictors (54 predictors) and the DL and SVM algorithms using selected predictors (14 predictors). 
Predictors selected by the SVM feature selection are: age, heart rate, Killip class, fasting blood glucose, ST-elevation, CABG, cardiac catheterization, angina episode, HDLC, LDC, other lipid-lowering agents, statin, anti-arrhythmic agent, oralhypogly. CABG and pharmacotherapy drugs as selected predictors improve mortality prediction compared to the TIMI score. In DL, 25.87% of STEMI patients and 19.71% of NSTEMI patients are estimated as high risk (risk probabilities of >50%). TIMI underestimated the risk of mortality of high-risk patients (≥5 risk scores), with 13.08% from STEMI patients and 4.65% from NSTEMI patients (Figure 2). Conclusions: In the Asian multi-ethnicity population, patients with ACS can be better classified using one single algorithm than with a conventional method like TIMI, which requires two different scores. Combining ML feature selection with DL allows the identification of distinct factors related to in-hospital mortality of ACS patients in a unique Asian population for better mortality prediction. Funding Acknowledgement: Type of funding sources: Public grant(s) – National budget only. Main funding source(s): Technology Development Fund 1. Figure 1: Performance results. Figure 2: Analysis on the validation set.
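Since AUC is the evaluation metric used above, a short sketch of how it can be computed may help. This uses the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the sample values are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count a win when the positive case outranks the negative one, half for a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = died in hospital, 0 = survived; scores are predicted risks.
labels = [0, 0, 1, 1]
risk_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, risk_scores)
print(auc)
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; for a registry of thousands of patients a sort-based rank computation gives the same value more efficiently.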
- Research Article
15
- 10.1097/corr.0000000000001679
- Feb 17, 2021
- Clinical orthopaedics and related research
CORR Synthesis: When Should the Orthopaedic Surgeon Use Artificial Intelligence, Machine Learning, and Deep Learning?
- Research Article
9
- 10.1111/ajo.13661
- Apr 1, 2023
- Australian and New Zealand Journal of Obstetrics and Gynaecology
Artificial intelligence: Friend or foe?
- Book Chapter
- 10.1108/s1548-643520230000020016
- Mar 13, 2023
Index
- Research Article
16
- 10.2144/fsoa-2022-0010
- Mar 8, 2022
- Future science OA
Artificial intelligence in interdisciplinary life science and drug discovery research.
- Research Article
- 10.11648/j.jccee.20251002.12
- Mar 11, 2025
- Journal of Civil, Construction and Environmental Engineering
The rapid advancement of aerospace technology, coupled with the exponential growth in available data, has catalyzed the integration of artificial intelligence (AI) across the aerospace sector. This comprehensive review examines the state-of-the-art applications of AI, machine learning (ML), deep learning (DL), and generative artificial intelligence (GenAI) in aerospace. Our analysis reveals that ML algorithms demonstrate remarkable capabilities: Random forest (RF) algorithm achieves precision within 10 meters for trajectory prediction, while support vector machines (SVMs) algorithms show 99.89% accuracy in aircraft fault detection. Decision trees (DTs) algorithms excel in aircraft system diagnostics with adaptive learning capabilities. In the realm of deep learning, convolutional neural networks (CNNs) algorithms achieve 79% accuracy in satellite component detection and structural inspection, while recurrent neural networks (RNNs) algorithms and Long Short-Term Memory (LSTM) networks demonstrate superior performance in 4D trajectory prediction and engine health monitoring. GenAI, particularly through Generative adversarial networks (GANs), has revolutionized airfoil design optimization, achieving less than 1% error in profile fitting and 10% error in aerodynamic stealth characteristics. However, these algorithms face scalability challenges when processing large-scale datasets in real-time applications, particularly in mission-critical scenarios. Our research also identifies four ethical considerations, including bias prevention in automated systems, transparency in decision-making processes, privacy protection in data handling, and the implementation of important safety protocols. This study provides a foundation for understanding the current landscape of aerospace-AI integration while highlighting the importance of addressing ethical implications in future developments. 
The successful implementation of these technologies will require continuous innovation in validation methodologies, the establishment of universal ethical standards, and enhanced community engagement through citizen science initiatives to involve stakeholders.
- Research Article
6
- 10.33166/aetic.2022.03.003
- Jul 1, 2022
- Annals of Emerging Technologies in Computing
Attacks against computer networks, “cyber-attacks”, are now commonplace, affecting almost every Internet-connected device on a daily basis. Organisations are now using machine learning and deep learning to thwart these types of attacks because they are effective without the need for human intervention. The biggest advantage of machine learning is its ability to detect, curtail, prevent, recover from and even deal with untrained types of attacks without being explicitly programmed. This research presents the many different types of algorithms that are employed to fight the different types of cyber-attacks, which are also explained. The classification algorithms, their implementation, accuracy and testing time are presented. The algorithms employed for this experiment were the Gaussian Naïve-Bayes algorithm, Logistic Regression Algorithm, SVM (Support Vector Machine) Algorithm, Stochastic Gradient Descent Algorithm, Decision Tree Algorithm, Random Forest Algorithm, Gradient Boosting Algorithm, K-Nearest Neighbour Algorithm, ANN (Artificial Neural Network) (here we also employed the Multilevel Perceptron Algorithm), Convolutional Neural Network (CNN) Algorithm and the Recurrent Neural Network (RNN) Algorithm. The study concluded that, amongst the various machine learning algorithms, the Logistic Regression and Decision Tree classifiers took a very short time to implement, giving an accuracy of over 90% for malware detection across various test datasets. The Gaussian Naïve-Bayes classifier, though fast to implement, only gave an accuracy between 51% and 88%. The Multilevel Perceptron, non-linear SVM and Gradient Boosting algorithms all took a very long time to implement. The algorithm that performed with the greatest accuracy was the Random Forest Classification algorithm.
- Research Article
7
- 10.1155/2023/6675523
- Dec 4, 2023
- Journal of Engineering
Sensible and judicious use of water for agriculture, in conjunction with prediction techniques, increases crop yield. The Ethiopian economy depends heavily on agricultural activities. Soil composition (nitrogen, phosphorus, and potassium), crop rotation, soil moisture, and climate conditions all play an important role in cultivation. The primary purpose of this study was to develop a machine learning approach that can be applied dynamically for efficient farming at a low cost. The support vector machine (SVM) was applied as a machine learning procedure, whereas long short-term memory (LSTM) and the recurrent neural network (RNN) were considered as deep learning procedures. The research comprised a model combined with machine learning procedures (ANN, random forest, and decision tree) to identify efficient and appropriate crop types. The planned model is improved by incorporating deep learning methods into the existing practice for different crop conditions. Pure data and related evidence are attained concerning the quantities of soil constituents desired through their expenditures distinctly. The model delivers better precision than the current model when examining the specified records, and it assists local agronomists in forecasting different types of crop and gaining benefits. For the RNN, LSTM, and SVM algorithms, the accuracy is determined as 96%, which compares favourably with other machine learning procedures under different feature and crop types. The techniques are evaluated in terms of percentage prediction accuracy. The results are important for agrarians, experts, researchers, and local farmers seeking to maximize crop productivity, and they help to improve agriculture and climate change-related decisions, especially in low-to-middle-income countries.
- Research Article
4
- 10.54097/hset.v39i.6640
- Apr 1, 2023
- Highlights in Science, Engineering and Technology
With the rising volume of spam email, the need for more effective antispam filters is surging. Phishing attacks can lead to extremely large losses for companies and individuals, exceeding 1 billion dollars in a single year. This paper investigates and combines Naïve Bayes classification and a clustering algorithm for identifying spam emails. Using sample emails to create a dynamic dictionary containing the most frequent words in spam and normal emails, this spam filter provides a stricter method of preventing spam emails than the methods used by mail providers such as Google, Yahoo, and Outlook.com. In addition, this paper compares several algorithms used today to classify spam and discusses future applications of deep learning and machine learning techniques to spam email classification. According to the analysis, Google’s algorithm has the most comprehensive functionality, but its rules are less strict than Yahoo’s. Outlook.com, as part of the Microsoft application suite, has a unique algorithm for encrypting and filtering spam. Overall, these results can guide further exploration of rules for classifying spam that are both comprehensive and strict.
- Research Article
134
- 10.1016/j.matt.2020.04.019
- May 20, 2020
- Matter
Using Deep Learning to Predict Fracture Patterns in Crystalline Solids
- Research Article
6
- 10.1111/gcb.16696
- Apr 2, 2023
- Global Change Biology
Unlocking the power of machine learning for Earth system modeling: A game-changing breakthrough.
- Research Article
- 10.46904/eea.25.73.2.1108011
- May 30, 2025
- Electrotehnica, Electronica, Automatica
- Research Article
- 10.46904/eea.25.73.2.1108004
- May 30, 2025
- Electrotehnica, Electronica, Automatica
- Research Article
- 10.46904/eea.25.73.2.1108001
- May 30, 2025
- Electrotehnica, Electronica, Automatica
- Research Article
- 10.46904/eea.25.73.2.1108005
- May 30, 2025
- Electrotehnica, Electronica, Automatica
- Research Article
- 10.46904/eea.25.73.2.1108010
- May 30, 2025
- Electrotehnica, Electronica, Automatica
- Research Article
- 10.46904/eea.25.73.2.1108002
- May 30, 2025
- Electrotehnica, Electronica, Automatica
- Research Article
- 10.46904/eea.25.73.2.1108009
- May 30, 2025
- Electrotehnica, Electronica, Automatica
- Research Article
- 10.46904/eea.25.73.2.1108006
- May 30, 2025
- Electrotehnica, Electronica, Automatica
- Research Article
- 10.46904/eea.25.73.2.1108007
- May 30, 2025
- Electrotehnica, Electronica, Automatica
- Research Article
- 10.46904/eea.25.73.2.1108008
- May 30, 2025
- Electrotehnica, Electronica, Automatica