Abstract

Insider threats are malicious acts carried out by authorized employees within an organization. They represent a major cybersecurity challenge for private and public organizations, because an insider attack can cause far greater damage to an organization's assets than an external attack. Most existing approaches in the field of insider threat focus on detecting general insider attack scenarios. However, insider attacks can be carried out in different ways, and the most dangerous is a data leakage attack executed by a malicious insider shortly before leaving the organization. This paper proposes a machine learning-based model for detecting such serious insider threat incidents. The proposed model addresses the bias in detection results that an inappropriate encoding process can introduce by employing feature scaling and one-hot encoding. The class imbalance of the dataset is addressed with the synthetic minority oversampling technique (SMOTE). Well-known machine learning algorithms are compared to identify the most accurate classifier for detecting data leakage events executed by malicious insiders during the sensitive period before they leave an organization. We provide a proof of concept for our model by applying it to the CMU-CERT Insider Threat Dataset and comparing its output with the ground truth. The experimental results show that our model detects insider data leakage events with an AUC-ROC value of 0.99, outperforming existing approaches validated on the same dataset. The proposed model thus provides effective methods for addressing possible bias and class imbalance when building an insider data leakage detection system.
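The oversampling idea behind SMOTE can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function name `smote_sketch`, the neighbour count `k`, and the sample feature vectors are assumptions made for illustration only. Each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    sampled minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical, already-scaled feature vectors for the rare "leak" class.
minority = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.95], [0.3, 0.7]]
new_points = smote_sketch(minority, n_new=6)
balanced_minority = minority + new_points
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region the minority class already occupies. In practice, oversampling of this kind is normally applied to the training split only, so that evaluation data contains no synthetic records.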

Highlights

  • With the widespread use of technology to perform many of an organization's sensitive activities, security and privacy threats have increased considerably. Among these threats, insider attacks are the most dangerous and costliest. Insider attacks are malicious acts performed by users who have authorized access to an organization’s information system.

  • When label encoding and one-hot encoding are compared, detection performance improves with one-hot encoding relative to label encoding for all four applied classifiers, logistic regression (LR), decision tree (DT), random forest (RF), and K-nearest neighbors (KNN), with area under the receiver operating characteristic curve (AUC-ROC) values of 0.58, 0.88, 0.66, and 0.53, respectively.

  • The LR, DT, RF, Naive Bayes (NB), KNN, and kernel SVM (KSVM) machine learning algorithms are trained on the benchmark Computer Emergency Response Team (CERT) dataset to detect insider data leakage events in unseen data.
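The difference between the two encodings mentioned above can be sketched in a few lines of plain Python. The activity names below are hypothetical and are not taken from the CERT dataset schema, and the helper functions are illustrative rather than the authors' code. Label encoding assigns integer codes, which imposes an arbitrary order and unequal distances between categories; one-hot encoding keeps all categories equally far apart.

```python
def label_encode(values):
    """Map each category to an integer code (alphabetical order)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]

def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector, one column per category."""
    cats = sorted(set(values))
    return [[1 if v == c else 0 for c in cats] for v in values]

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical activity types for four logged events.
activity = ["email", "usb", "web", "usb"]
label_codes = label_encode(activity)
one_hot = one_hot_encode(activity)

# Under label encoding, "web" looks twice as far from "email" as "usb" does.
label_email_usb = (label_codes[0] - label_codes[1]) ** 2
label_email_web = (label_codes[0] - label_codes[2]) ** 2

# Under one-hot encoding, every pair of distinct categories is equidistant.
onehot_email_usb = sq_dist(one_hot[0], one_hot[1])
onehot_email_web = sq_dist(one_hot[0], one_hot[2])
```

Distance-based and linear classifiers such as KNN and LR are sensitive to this artificial ordering, which is one plausible reason an encoding choice can bias detection results.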


Summary

Introduction

With the widespread use of technology to perform many of an organization's sensitive activities, security and privacy threats have increased considerably. Among these threats, insider attacks are the most dangerous and costliest. Insider attacks are malicious acts performed by users who have authorized access to an organization’s information system. Because insiders hold legitimate authorization, the threats they pose are very difficult to detect. Overlooking such threats can cost an organization its reputation and undermine its business goals.

