Detecting Anomalies and Intrusions in Unstructured Cybersecurity Data Using Natural Language Processing

Tamilselvan Arjunan

doi:10.22214/ijraset.2024.58497

Abstract

Given the growing volume and variety of data generated by cybersecurity systems, using unstructured text to detect anomalies has become crucial. Natural language processing (NLP) is a powerful tool for analyzing unstructured information and identifying threats. This paper presents a comprehensive review of NLP applications for cybersecurity. We first present the motivations for, and challenges of, using NLP to improve cybersecurity. We then provide background on the unstructured data relevant to cybersecurity and discuss NLP techniques such as named entity recognition (NER), sentiment analysis, topic modelling, and document classification. The paper focuses on how these techniques are used to detect anomalies and intrusions. We present a taxonomy of NLP-driven approaches and conduct a literature review organized according to this taxonomy. We critically examine the strengths and weaknesses of current techniques, highlight research gaps based on this analysis, and propose a research agenda to advance NLP research in cybersecurity applications. This paper summarizes previous research and lays the foundation for using NLP to tackle cybersecurity challenges that involve unstructured data.

Full Text