Mathematical Model of Intrusion Detection Based on Sequential Execution of Commands Applying Pagerank

Cesar Guevara, Lorena Zapata-Saavedra, Hugo Arias, Jairo Hidalgo, Dioselina Pimbosa Ortiz, Ivan Ramirez-Morales, Fernando Aguilar-Galvez, Lorena Chalco-Torres, Marco Yandún

doi:10.1007/978-3-030-20488-4_12

Abstract

Cybersecurity in networks and computer systems is an important research area for companies and institutions around the world. Safeguarding information is a fundamental objective, because data is the most valuable asset of a person or company. Each user who interacts with multiple systems generates a unique behavioral pattern (called a digital fingerprint). This behavior is compiled from the interactions between the user and applications, websites, and communication equipment (PCs, mobile phones, tablets, etc.). This paper details the analysis of eight users of computers running a UNIX operating system, who performed their tasks over a period of 2 years. The data is the usage history of their Shell sessions, sorted by date and token. From this information, a mathematical model of intrusion detection based on time series behaviors is generated. Building the model requires a data pre-processing step that generates user sessions \( S_{m}^{u} \), where u identifies the user and m is the number of sessions that user u has made. Each session \( S_{m}^{u} \) contains a sequence of executed commands \( C_{n} \), that is \( S_{m}^{u} = \{ C_{1} ,C_{2} ,C_{3} , \ldots ,C_{n} \} \), where n is the position in which command C was executed. Only 17 commands have been selected, namely those most used by each user u. To create the mathematical model we apply the PageRank algorithm [1], which, within a command execution session \( S_{m}^{u} \), determines which command \( C_{n} \) calls another command \( C_{n + 1} \) and which command is executed most often. For this study we build the model with subsequences sb of two commands, \( sb = \{ C_{n} ,C_{n + 1} \} \), to which the algorithm is applied to obtain an execution probability per command, denoted \( P(C_{n} ) \).
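The transition-based ranking described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each two-command subsequence \( \{ C_{n} ,C_{n + 1} \} \) becomes a weighted edge from \( C_{n} \) to \( C_{n + 1} \), and that standard PageRank with a damping factor of 0.85 is run over that graph. The function name `command_pagerank`, its parameters, and the sample sessions are hypothetical.

```python
from collections import defaultdict


def command_pagerank(sessions, damping=0.85, iterations=100, tol=1e-10):
    """Rank commands from consecutive pairs (C_n -> C_{n+1}) in each session.

    Assumption: the damping factor and the uniform handling of commands
    without successors are standard PageRank choices, not taken from the paper.
    """
    # Count how often each command is immediately followed by another one.
    counts = defaultdict(lambda: defaultdict(int))
    commands = set()
    for session in sessions:
        commands.update(session)
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1

    commands = sorted(commands)
    n = len(commands)
    if n == 0:
        return {}
    rank = {c: 1.0 / n for c in commands}

    for _ in range(iterations):
        # Commands with no outgoing transition spread their rank uniformly.
        dangling = sum(rank[c] for c in commands if not counts[c])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {c: base for c in commands}
        for src in commands:
            total = sum(counts[src].values())
            for dst, weight in counts[src].items():
                new_rank[dst] += damping * rank[src] * weight / total
        delta = sum(abs(new_rank[c] - rank[c]) for c in commands)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical sessions for one user; real input would come from Shell history.
example_sessions = [
    ["ls", "cd", "ls", "vi"],
    ["cd", "ls", "vi", "ls"],
]
example_rank = command_pagerank(example_sessions)
```

The returned scores sum to 1, so under these assumptions they can be read as a per-command probability in the spirit of \( P(C_{n} ) \).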
Finally, a profile is generated for each user as a time-series signal, from which the maximum and minimum bounds of normal behavior are obtained. Any behavior outside that range is classified as intrusive and assigned a detection probability value. Otherwise, the behavior is considered normal and the user can continue executing commands. In the testing phase the model achieved an accuracy rate greater than 90% and a false positive rate of less than 4%, which indicates that it is effective and adapts to the dynamic behavior of the user. The variability in the execution of user commands was found to be quite high over short periods of time, but the proposed algorithm tends to adapt to it well.
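The range check described above can be sketched as follows. This is only an illustration under stated assumptions: the profile is taken to be a per-command (minimum, maximum) pair over the execution probabilities observed in normal sessions, and a command never seen in the profile is treated as out of range. The abstract does not say how the detection probability value is computed, so the sketch omits it. The names `build_profile` and `out_of_range` and the sample numbers are hypothetical.

```python
def build_profile(history):
    """Build per-command (min, max) bounds from normal sessions.

    history: list of dicts mapping command -> execution probability,
    one dict per session considered normal (assumed input format).
    """
    profile = {}
    for probabilities in history:
        for command, p in probabilities.items():
            low, high = profile.get(command, (p, p))
            profile[command] = (min(low, p), max(high, p))
    return profile


def out_of_range(profile, observed):
    """Return the commands whose observed probability leaves the profile bounds."""
    flagged = []
    for command, p in observed.items():
        if command not in profile:
            # Assumption: a command absent from the profile counts as anomalous.
            flagged.append(command)
            continue
        low, high = profile[command]
        if p < low or p > high:
            flagged.append(command)
    return flagged


# Hypothetical per-session probabilities for one user.
normal_history = [{"ls": 0.4, "cd": 0.2}, {"ls": 0.5, "cd": 0.1}]
user_profile = build_profile(normal_history)
suspicious = out_of_range(user_profile, {"ls": 0.45, "cd": 0.3})
```

An empty result from `out_of_range` corresponds to the normal case in which the user continues executing commands; a non-empty result marks the session as a candidate intrusion.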

Full Text