Protecting big data has become a vital cybersecurity requirement, given the significant impact this data has on institutions and their clients. Such data underpins decision-making and policy guidance, so attacks that gain illicit access to it can cause serious losses by compromising its integrity, reliability, confidentiality, and availability. A second problem is the need to shorten the attack detection period, which is critical when classifying request patterns as malicious or benign. Structured Query Language Injection Attack (SQLIA) is among the common attacks targeting data and is the focus of the proposed model. This research aims to develop an approach for detecting and distinguishing patterns in the payloads sent by the user. The proposed method trains a model with random forest, a machine learning (ML) technique, using the Spark ML library, which integrates effectively with big data frameworks. This is accompanied by a comprehensive analysis of the effectiveness of ML techniques in monitoring and detecting SQLIA. The study was conducted on the SQL dataset available on the Kaggle platform and showed promising results: the proposed method achieved an accuracy of 98.12% and took 0.046 seconds to determine the SQL type. These results indicate that using the Spark ML library with ML techniques contributes to higher accuracy and requires less time to identify the class of a submitted request, owing to its ability to distribute processing in memory.
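As a rough illustration of the kind of pipeline described above, the following is a minimal sketch of a random forest classifier built with Spark ML (PySpark). It is not the authors' implementation: the abstract does not specify how payloads are turned into features, so the toy payloads, the column names (`query`, `label`), the tokenization pattern, the hashed term-frequency features, and the hyperparameters are all assumptions made for illustration only.

```python
# Hypothetical sketch: the data, features, and hyperparameters below are
# assumptions, not details taken from the study.

# Toy payloads invented for illustration (label 1.0 = injection, 0.0 = benign).
SAMPLES = [
    ("SELECT name FROM users WHERE id = 5", 0.0),
    ("SELECT * FROM products WHERE price < 100", 0.0),
    ("UPDATE orders SET status = 'shipped' WHERE id = 7", 0.0),
    ("' OR '1'='1' --", 1.0),
    ("1; DROP TABLE users --", 1.0),
    ("admin' UNION SELECT username, password FROM users --", 1.0),
]


def build_pipeline(num_trees=50):
    """Tokenize -> hashed term frequencies -> random forest."""
    # Imported lazily so this module can be loaded without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import HashingTF, RegexTokenizer

    # gaps=False makes the pattern match tokens rather than separators:
    # words, plus single punctuation characters such as ' ; - =
    tokenizer = RegexTokenizer(
        inputCol="query", outputCol="tokens",
        pattern=r"\w+|[^\w\s]", gaps=False,
    )
    # Assumed feature representation: hashed token counts.
    hashing_tf = HashingTF(inputCol="tokens", outputCol="features",
                           numFeatures=1024)
    forest = RandomForestClassifier(featuresCol="features", labelCol="label",
                                    numTrees=num_trees, seed=42)
    return Pipeline(stages=[tokenizer, hashing_tf, forest])


def train_and_score():
    """Fit on the toy data and return accuracy on that same data."""
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.master("local[*]")
             .appName("sqlia-rf-sketch").getOrCreate())
    try:
        df = spark.createDataFrame(SAMPLES, ["query", "label"])
        model = build_pipeline().fit(df)
        predictions = model.transform(df)
        evaluator = MulticlassClassificationEvaluator(
            labelCol="label", predictionCol="prediction",
            metricName="accuracy",
        )
        return evaluator.evaluate(predictions)
    finally:
        spark.stop()
```

Calling `train_and_score()` on a machine with PySpark installed fits the pipeline on the six toy payloads and returns the accuracy on that same data. That number says nothing about the reported 98.12%; a real evaluation would load the Kaggle dataset and score a held-out split.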