New computational and technological paradigms that currently guide developments in the information society, i.e., Internet of things, pervasive technology, or Ubicomp, favor the appearance of new intrusion vectors that can directly affect people’s daily lives. This, together with advances in techniques and methods used for developing new cyber-attacks, exponentially increases the number of cyber threats which affect the information society. Because of this, the development and improvement of technology that assists cybersecurity experts to prevent and detect attacks arose as a fundamental pillar in the field of cybersecurity. Specifically, intrusion detection systems are now a fundamental tool in the provision of services through the internet. However, these systems have certain limitations, i.e., false positives, real-time analytics, etc., which require their operation to be supervised. Therefore, it is necessary to offer architectures and systems that favor an efficient analysis of the data handled by these tools. In this sense, this paper presents a new model of data preprocessing based on a novel distributed computing architecture focused on large-scale datasets such as UGR’16. In addition, the paper analyzes the use of machine learning techniques in order to improve the response and efficiency of the proposed preprocessing model. Thus, the solution developed achieves good results in terms of computer performance. Finally, the proposal shows the adequateness of decision tree algorithms for training a machine learning model by using a large dataset when compared with a multilayer perceptron neural network.
Read full abstract