Structured Query Language (SQL) databases are among the most widely used data stores, serving an array of Internet-of-Things (IoT)-enabled services including web transactions, grid networks, industrial activity logs and proactive decision systems, smart homes, financial transactions, and business communication. With the rapid increase in SQL-driven IoT applications, the threat of SQL-injection attacks (SQLIAs) at the middleware layer has increased significantly. To address this threat, machine learning-based SQLIA-prediction systems have been proposed; however, the majority of existing methods are limited in intrusion-detection accuracy because of their complete reliance on structural features and inferior learning models. Moreover, intruders now mimic normal queries and thereby confuse most classical learning-based methods. To alleviate such problems, this article emphasizes exploiting semantic features along with a state-of-the-art, highly robust computing environment. We propose a robust semantic query-featured ensemble learning model for SQLIA prediction. Unlike classical query template-matching or term-assessment-based methods, the proposed SQLIA-prediction model exploits latent semantic features from a large set of SQL queries to train an ensemble learning model that classifies each query as either a normal query or an SQLIA query. Functionally, it preprocesses a large set of SQL queries using a count vectorizer and stop-word removal. Subsequently, it applies the Word2Vec feature extraction method to each query using the continuous bag-of-words (CBOW) and N-skip-gram (SKG) algorithms, obtaining CBOW and SKG semantic features from each SQL query. The extracted features were then resampled to alleviate class imbalance and skewness.
To reduce redundant computation, two feature selection algorithms, the Mann-Whitney significant-predictor test and principal component analysis, were applied to the resampled features. Moreover, to mitigate over-fitting and convergence problems, Min-Max normalization was performed on the selected features, which were then used to train a state-of-the-art, robust heterogeneous ensemble learning model. Unlike standalone classifier-based SQLIA prediction, the proposed learning model employs nine base classifiers combined through maximum-voting ensemble prediction. The proposed ensemble-learning method classifies each SQL query as either a normal query or an SQLIA query. Simulation results affirmed the superiority of the proposed SQLIA-prediction model in terms of accuracy (98%), F-score (0.989), and AUC (0.999), signifying its efficacy for real-world SQL-driven IoT ecosystems.
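Two of the stages named above, Min-Max normalization and maximum-voting over nine base classifiers, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the helper names `min_max_normalize` and `max_vote` are hypothetical, the feature values are toy data, and the nine base classifiers are stood in for by a plain list of their predicted labels.

```python
from collections import Counter


def min_max_normalize(rows):
    """Scale every feature column to the range [0, 1].

    A constant column (max == min) is mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


def max_vote(predictions):
    """Return the label predicted by the largest number of base classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Toy feature matrix: three queries, two (assumed) semantic features each.
features = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
scaled = min_max_normalize(features)

# Assumed outputs of nine base classifiers for a single query.
votes = ["sqlia", "normal", "sqlia", "sqlia", "normal",
         "sqlia", "normal", "sqlia", "sqlia"]
label = max_vote(votes)
```

With an odd number of base classifiers and two classes, a tie cannot occur, which may be one reason to prefer nine voters over an even count.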