Abstract
The ability to forecast cyberattacks before they happen would change the landscape of cyber warfare and cyber crime. This work predicts specific types of attacks on a potential victim network before the actual malicious actions take place. The challenge in forecasting cyberattacks is extracting relevant and reliable signals to account for the sporadic and seemingly random acts of adversaries. This paper builds on multi-faceted machine learning solutions and develops an integrated system that transforms large volumes of public data into aggregate signals, with imputation, that are relevant to and predictive of cyber incidents. A comprehensive analysis of the individual parts and the integrated whole demonstrates the effectiveness and trade-offs of the proposed approach. Using 16 months of cyber incidents reported by an anonymized victim organization, the integrated approach achieves up to 87%, 90%, and 96% AUC for forecasting endpoint-malware, malicious-destination, and malicious-email attacks, respectively. When assessed month by month, the proposed approach performs consistently well, achieving an F-Measure between 0.6 and 1.0. The framework also enables an examination of which unconventional signals are meaningful for cyberattack forecasting.
Highlights
The scale and diversity of cyberattacks have changed significantly in recent years, becoming a critical means for monetary gain, intellectual theft, and political agenda worldwide
We observe that the Weighted significant average based aggregation (WSAA)-t approach improves the classification performance for the Endpoint Malware (EM) and Malicious Email (ME) attack types
This work uses unconventional signals derived from the Twitter, GDELT, and Open Threat Exchange (OTX) open platforms to predict cyberattacks against a target organization anonymized as K9, for the endpoint-malware, malicious-destination, and malicious-email attack types
Summary
The scale and diversity of cyberattacks have changed significantly in recent years, becoming a critical means for monetary gain, intellectual theft, and political agenda worldwide. Recent reports show that the number of cyberattacks continues to rise globally (PwC 2016), and the cost to society due to these attacks is expanding at a tremendous rate (Accenture Security 2017). Forecasting cyberattacks before they take place can offer great value, but it is challenging because only a small fraction of the large, ever-changing, and diverse volume of ‘unconventional’ signals in social media, news, and other public forums is relevant. This paper tackles this challenge by developing an integrated system that treats the problems of incomplete signals, signals with varying significant lags, and imbalanced ground truth labels. The overall system is tested using the cyber incident data provided by an anonymized company nicknamed K9.
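The three data problems named above (incomplete signals, lagged signals, and imbalanced labels) can be illustrated with a minimal Python sketch. This is not the paper's method, which is not described in this summary: forward-fill imputation, a fixed one-day lag, and inverse-frequency class weights are simple stand-ins chosen for illustration, and the function names and sample data are hypothetical.

```python
from typing import Dict, List, Optional


def forward_fill(values: List[Optional[float]], default: float = 0.0) -> List[float]:
    """Impute missing (None) entries with the last observed value.

    Assumption: forward-fill is a stand-in for the paper's imputation step.
    """
    filled = []
    last = default
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def lag(values: List[float], k: int, pad: float = 0.0) -> List[float]:
    """Shift a series k steps later, so position t holds the value from t - k."""
    if k <= 0:
        return list(values)
    n = len(values)
    return [pad] * min(k, n) + list(values[: max(n - k, 0)])


def balanced_class_weights(labels: List[int]) -> Dict[int, float]:
    """Weight each class inversely to its frequency: n / (n_classes * count)."""
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n = len(labels)
    return {y: n / (len(counts) * c) for y, c in counts.items()}


# Hypothetical daily signal (e.g. a count of mentions) with gaps,
# and hypothetical ground truth marking days with a reported incident.
daily_signal = [3.0, None, None, 7.0, 2.0, None]
attack_days = [0, 0, 0, 0, 1, 0]

filled = forward_fill(daily_signal)         # gaps replaced by last seen value
lagged = lag(filled, k=1)                   # yesterday's signal as today's feature
weights = balanced_class_weights(attack_days)  # rare attack days get larger weight
```

The weights could then be passed to any classifier that accepts per-class weights, so that the few attack days are not drowned out by the many quiet days.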