Hybrid Machine Learning-Based Approach for Anomaly Detection using Apache Spark

Hanane Chliah, Amal Battou, Adil Laoufi, Maryem Ait El Hadj

doi:10.14569/ijacsa.2023.0140496

Abstract

Over the past few decades, the volume of data has increased significantly in both scientific institutions and universities, driven by large student enrollments and the data associated with them. Network traffic has also grown in the post-pandemic period with the wider use of online learning. Processing this network traffic data is therefore a complex and challenging task, and the growth in traffic increases the possibility of intrusions and anomalies. Traditional security systems cannot handle such high-speed, high-volume traffic. Real-time anomaly detection must process data as quickly as possible to detect abnormal and malicious records. This paper proposes a hybrid approach that combines supervised and unsupervised learning for anomaly detection, built on the big data engine Apache Spark. First, the k-means algorithm was implemented in Spark's MLlib to cluster network traffic; then, for each cluster, the k-nearest neighbors (KNN) algorithm was implemented for classification and anomaly detection. The proposed model was trained and validated on a real dataset from Ibn Zohr University. The results indicate that the proposed model outperformed other well-known algorithms in detecting anomalies on this dataset. The experimental results show that the proposed hybrid approach can reach up to 99.94% accuracy using k-fold cross-validation on the complete dataset with all 48 features.
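The two-stage idea in the abstract (cluster the traffic with k-means, then classify within each cluster with KNN) can be illustrated with a minimal sketch. This is plain Python rather than Spark MLlib, and it is not the authors' implementation: the two toy features, the labels, the parameter values, and the helper names (`kmeans`, `fit`, `predict`) are all assumptions made for illustration only.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))

def kmeans(points, k, iters=20, seed=0):
    """Basic Lloyd's k-means; returns the final centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group;
        # an empty group keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def fit(points, labels, k_clusters):
    """Stage 1: cluster all points. Then store labeled members per cluster."""
    centroids = kmeans(points, k_clusters)
    members = [[] for _ in centroids]
    for p, y in zip(points, labels):
        members[nearest(p, centroids)].append((p, y))
    return centroids, members

def predict(model, p, k_neighbors=3):
    """Stage 2: majority vote among the nearest neighbors in p's cluster."""
    centroids, members = model
    pool = members[nearest(p, centroids)]
    if not pool:  # fall back to all training points if the cluster is empty
        pool = [m for group in members for m in group]
    neighbors = sorted(pool, key=lambda m: dist(p, m[0]))[:k_neighbors]
    votes = [y for _, y in neighbors]
    # sorted() makes tie-breaking deterministic
    return max(sorted(set(votes)), key=votes.count)

# Hypothetical toy data: two traffic groups, each with normal and anomalous rows.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),          # group A, normal
    (3.0, 3.0), (3.2, 2.9), (2.9, 3.1),                      # group A, anomaly
    (10.0, 10.0), (10.2, 9.9), (9.9, 10.1), (10.1, 10.2),    # group B, normal
    (12.0, 12.0), (12.2, 11.9), (11.9, 12.1),                # group B, anomaly
]
labels = ["normal"] * 4 + ["anomaly"] * 3 + ["normal"] * 4 + ["anomaly"] * 3

model = fit(points, labels, k_clusters=2)
print(predict(model, (3.1, 3.0)))
```

Restricting the KNN search to one cluster means each query is compared only against the training points in its own cluster, which is one plausible reason to pair the two algorithms on large traffic datasets; the abstract itself does not state the motivation.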

Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2023
License type: cc-by

Similar Papers

The credibility assessment of Twitter/X users based organization objectives by heterogeneous resources in big data life cycle
Sogand Dehghan ... Shahriar Mohammadi
Computers in Human Behavior | VOL. 162
03 Sep 2024

Pharmacy: Harnessing The Power Of Big Data
Amy K Erickson
Pharmacy Today | VOL. 20
01 Nov 2014

An Overview of Big Data Security with Hadoop Framework
01 Jan 2017

Presto-RDF: SPARQL Querying over Big RDF Data
Mulugeta Mammo ... Srividya K Bansal
01 Jan 2015
