Forecasting cyberattacks with incomplete, imbalanced, and insignificant data

Ahmet Okutan, Shanchieh Jay Yang, Gordon Werner, Katie McConky

doi:10.1186/s42400-018-0016-5

Journal: Cybersecurity
Publication Date: Dec 1, 2018
Citations: 17
License type: open-access

Affiliation: Rochester Institute of Technology

Abstract

Having the ability to forecast cyberattacks before they happen will unquestionably change the landscape of cyber warfare and cyber crime. This work predicts specific types of attacks on a potential victim network before the actual malicious actions take place. The challenge in forecasting cyberattacks is to extract relevant and reliable signals that can account for the sporadic and seemingly random acts of adversaries. This paper builds on multi-faceted machine learning solutions and develops an integrated system that transforms large volumes of public data into aggregate signals, with imputation, that are relevant to and predictive of cyber incidents. A comprehensive analysis of the individual parts and the integrated whole demonstrates the effectiveness and trade-offs of the proposed approach. Using 16 months of cyber incidents reported by an anonymized victim organization, the integrated approach achieves up to 87%, 90%, and 96% AUC for forecasting endpoint-malware, malicious-destination, and malicious-email attacks, respectively. When assessed month by month, the proposed approach performs consistently well, achieving an F-Measure between 0.6 and 1.0. The framework also enables an examination of which unconventional signals are meaningful for cyberattack forecasting.
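For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how AUC and F-Measure can be computed for a binary attack/no-attack forecast. The scores, labels, and the 0.5 decision threshold are made-up illustrations, not data or settings from the paper, and the function names `auc` and `f_measure` are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as half a win).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f_measure(predicted, labels):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative values only: forecast scores for six days and whether
# an attack was actually reported on each day (1 = attack).
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
labels = [1, 0, 1, 1, 0, 0]

# Assumed decision threshold of 0.5 for turning scores into forecasts.
predicted = [1 if s >= 0.5 else 0 for s in scores]

auc_value = auc(scores, labels)
f_value = f_measure(predicted, labels)
```

AUC is computed from the raw scores and so does not depend on a threshold, whereas F-Measure depends on where the threshold is set.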

Highlights

The scale and diversity of cyberattacks have changed significantly in recent years, becoming a critical means for monetary gain, intellectual theft, and political agenda worldwide.
We observe that the Weighted significant average based aggregation (WSAA)-t approach improves the classification performance for the Endpoint Malware (EM) and Malicious Email (ME) attack types.
This work uses unconventional signals derived from the Twitter, GDELT, and Open Threat Exchange (OTX) open platforms to predict cyberattacks towards a target organization anonymized as K9, for the endpoint-malware, malicious-destination, and malicious-email attack types.

Summary

Introduction

The scale and diversity of cyberattacks have changed significantly in recent years, becoming a critical means for monetary gain, intellectual theft, and political agenda worldwide. Recent reports show that the number of cyberattacks continues to rise globally (PwC 2016), and the cost to society due to these attacks is expanding at a tremendous rate (Accenture Security 2017). Forecasting cyberattacks before they take place can offer great value, but it is challenging because only limited relevance can be found in the large volume of ever-changing and diverse ‘unconventional’ signals in social media, news, and other public forums. This paper tackles this challenge by developing an integrated system that treats the problems of incomplete signals, signals with varying significant lags, and imbalanced ground truth labels. The overall system is tested using the cyber incident data provided by an anonymized company nicknamed K9.
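The three data problems named above can be illustrated with a small sketch. This page does not describe the paper's actual techniques, so the code below swaps in simple stand-ins: forward-fill imputation for incomplete signals, fixed lags for lagged signals, and inverse-frequency class weights for imbalanced labels. All function names, lag values, and data are hypothetical.

```python
def forward_fill(signal, default=0.0):
    """Impute missing (None) values with the last observed value.

    Stand-in for the paper's imputation step, which is not detailed here.
    """
    filled = []
    last = default
    for value in signal:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def lagged_features(signal, lags):
    """Build one feature row per day t from signal[t - lag] for each lag.

    Days without enough history for the largest lag are skipped.
    """
    max_lag = max(lags)
    return [
        [signal[t - lag] for lag in lags]
        for t in range(max_lag, len(signal))
    ]


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}


# Made-up daily signal counts (None = no data collected that day)
# and made-up ground truth (1 = attack reported that day).
raw_signal = [3.0, None, None, 5.0, 1.0, None, 4.0]
labels = [0, 0, 0, 1, 0, 0, 0]

filled = forward_fill(raw_signal)
features = lagged_features(filled, lags=[1, 2])  # assumed lags of 1 and 2 days
targets = labels[2:]  # align labels with rows that have full lag history
weights = balanced_class_weights(targets)
```

The resulting `features`, `targets`, and `weights` could then be passed to any classifier that accepts per-class weights; which classifier the authors use is not stated on this page.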
