Abstract
Information about cyberattacks planned against critical infrastructure facilities is partly distributed through malicious information channels, chats, and sites. Investigating and analyzing these materials can provide an understanding of the stages of attack planning and help prevent the attacks. Part of this problem is providing information search and analysis tools that detect linguistic patterns and similarities in text data, which can help deanonymize cybercriminals and establish relationships between published data. This work proposes a new model and a corresponding system prototype based on the vector space model and the TF-IDF algorithm. The system is designed to analyze publicly available text data (from both the internet and the darknet) and is distinguished by a probabilistic approach to analyzing the identifiers of the information publisher. The proposed system also focuses on identifying latent connections between anonymous accounts by analyzing unique stylistic and linguistic traits, which it uses to trace patterns in communication and uncover hidden associations among cybercriminal entities. Experiments on real chats, including chats of cybercriminals, demonstrate the potential of the system for detecting identifiers and determining stylistic features. If a sufficiently complete data set and a list of target words are available, it is possible to analyze the stages of attack preparation and the malicious individuals or groups involved. The results underline the significance of integrating advanced linguistic analysis techniques with probabilistic models to enhance investigative capabilities against evolving cyber threats.
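As a rough illustration of the vector space model and TF-IDF weighting named in the abstract, the following minimal sketch compares short messages by cosine similarity. It is not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF variant, and the sample messages are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would
    # need language-aware tokenization and normalization.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Relative term frequency times a smoothed inverse document frequency
        # (one common variant; the choice here is an assumption).
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical messages from three anonymous accounts (invented for illustration).
messages = {
    "account_a": "gonna drop the payload tmrw, ping me when the vpn is up",
    "account_b": "ping me when the vpn is up, gonna push the payload later",
    "account_c": "The quarterly report is attached for your review.",
}

vectors = tfidf_vectors(list(messages.values()))
sim_ab = cosine(vectors[0], vectors[1])  # accounts with overlapping phrasing
sim_ac = cosine(vectors[0], vectors[2])  # accounts with little in common
print(f"a~b: {sim_ab:.3f}  a~c: {sim_ac:.3f}")
```

In this toy setting, accounts that reuse the same phrasing score higher than unrelated ones. A similarity score of this kind is only one possible signal; the probabilistic analysis of publisher identifiers described in the abstract is not modeled here.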