Abstract

Emerging computing relies heavily on secure back-end storage for the massive volume of big data originating from Internet of Things (IoT) smart devices to Cloud-hosted web applications. Structured Query Language (SQL) Injection Attack (SQLIA) remains an intruder's exploit of choice for pilfering confidential data from the back-end database, with damaging ramifications. Existing approaches all predate this emerging computing context of Internet big data mining and, as such, lack the ability to cope with new signatures concealed in a large volume of web requests over time. These approaches are also string-lookup techniques aimed at the on-premise application domain boundary; they are not applicable to roaming Cloud-hosted services that span from the edge Software-Defined Network (SDN) to application endpoints with large numbers of web request hits. A Machine Learning (ML) approach provides scalable big data mining for SQLIA detection and prevention. Unfortunately, the absence of a corpus to train a classifier is a well-known issue in SQLIA research that applies Artificial Intelligence (AI) techniques. This paper presents an application-context, pattern-driven corpus to train a supervised learning model. The model is trained with two ML algorithms, Two-Class Support Vector Machine (TC SVM) and Two-Class Logistic Regression (TC LR), implemented on Microsoft Azure Machine Learning (MAML) studio to mitigate SQLIA. The scheme presented here then forms the subject of an empirical evaluation using the Receiver Operating Characteristic (ROC) curve.

Highlights

  • Recent years have seen a continuous upward trend in big internet data, and the volume of Cloud-driven applications will only continue to grow as more individuals, governments and businesses adopt the Cloud to host files and applications

  • We present in this article a supervised learning model that uses a data set built from patterns of expected input data, including SQLIA types and Structured Query Language (SQL) keywords, to train various classifiers; the trained model with the better performance metrics is deployed as a Web Service (WS)

  • Receiver Operating Characteristic (ROC) curve and Area Under Curve (AUC) are widely used by data scientists to measure the performance metrics in Machine Learning (ML) analytics
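As an illustration of how a ROC curve and its AUC are obtained from classifier scores, here is a minimal sketch in plain Python. The function names `roc_points` and `auc` and the sample labels and scores are hypothetical and do not come from the paper; the sketch sweeps a decision threshold over the scores and integrates the resulting curve with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels: 1 for SQLIA positive, 0 for negative.
    scores: classifier scores, where higher means more likely positive.
    Assumes at least one positive and one negative label.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Visit items from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores for six web requests (three attacks, three safe).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(auc(roc_points(labels, scores)))
```

An AUC of 1.0 corresponds to a classifier that ranks every attack above every safe request, while 0.5 corresponds to ranking no better than chance.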


Summary

INTRODUCTION

Recent years have seen a continuous upward trend in big internet data, and the volume of Cloud-driven applications will only continue to grow as more individuals, governments and businesses adopt the Cloud to host files and applications. The SQLIA problem is a plausible candidate for predictive analytics: a supervised learning model trained with historical attack signatures, including SQL tokens, and with patterns of safe web requests can predict SQLIA at SQL query injection points. Web application domain contexts are too diverse for a standardised, pre-existing pattern-driven data set to be available for training a supervised learning model. As such, the non-availability of a data set that covers every application domain context is a well-known issue in SQLIA research that applies AI techniques. We opine that patterns exist in every input data item in both legacy and new web applications, and that these patterns can be leveraged to generate as many derivations of member strings as needed. In our labelling of the data set, a known attack signature present at an injection point will contain patterns of SQL tokens and symbols, and such items are deemed SQLIA positive.
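The idea of deriving member strings from an input pattern and labelling them by the presence of SQL tokens and symbols can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' corpus generator: the product-code pattern, the attack fragments, the signature list and all function names are hypothetical.

```python
import itertools
import re

# Hypothetical expected-input pattern: product codes such as "AB-12",
# that is, two letters from LETTERS, a dash, two digits from DIGITS.
LETTERS = "AB"
DIGITS = "12"

# Hypothetical attack fragments appended at the injection point.
ATTACK_SUFFIXES = [
    "' OR '1'='1",
    "'; DROP TABLE users; --",
    " UNION SELECT null --",
]

# Hypothetical, deliberately small list of SQL tokens and symbols.
SQL_SIGNATURE = re.compile(r"\b(union|select|drop|or)\b|--|;|'", re.IGNORECASE)


def expected_inputs():
    """Enumerate every member string of the hypothetical pattern."""
    for a, b in itertools.product(LETTERS, repeat=2):
        for c, d in itertools.product(DIGITS, repeat=2):
            yield f"{a}{b}-{c}{d}"


def label(text):
    """Return 1 (SQLIA positive) if a SQL token or symbol is present, else 0."""
    return 1 if SQL_SIGNATURE.search(text) else 0


def build_corpus():
    """Return (text, label) rows: safe member strings plus injected variants."""
    rows = []
    for value in expected_inputs():
        rows.append((value, label(value)))
        for suffix in ATTACK_SUFFIXES:
            injected = value + suffix
            rows.append((injected, label(injected)))
    return rows


corpus = build_corpus()
positives = sum(lab for _, lab in corpus)
print(len(corpus), positives)
```

A token list this naive would also flag legitimate inputs that happen to contain an apostrophe or the word "or", which is one reason the labelling has to be tied to the patterns expected in a given application context.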

RELATED WORK
BACKGROUND
SQL Language structure and injection point
A high-level overview of the experimental steps
Publishing and consuming the prediction web service
EVALUATION AND PERFORMANCE METRICS
CONCLUSION AND FUTURE WORK