Abstract

Technology is now deeply rooted in everyday life and more accessible than ever across a wide range of devices and platforms, from company servers and commodity PCs to mobile phones and wearables, interconnecting stakeholders such as households, organizations and critical infrastructures. The sheer volume and variety of operating systems, the particularities of each device, the various usage domains and the accessibility-ready nature of the platforms create a vast and complex threat landscape that is difficult to contain. Staying on top of these evolving cyber-threats has become an increasingly difficult task that presently relies heavily on collecting and utilising cyber-threat intelligence before an attack (or at least shortly after, to minimize the damage), and entails the collection, analysis, leveraging and sharing of huge volumes of data. In this work, we put forward inTIME, a machine learning-based integrated framework that provides a holistic view of the cyber-threat intelligence process and allows security analysts to easily identify, collect, analyse, extract, integrate, and share cyber-threat intelligence from a wide variety of online sources including clear/deep/dark web sites, forums and marketplaces, popular social networks, trusted structured sources (e.g., known security databases), or other datastore types (e.g., pastebins).
inTIME is a zero-administration, open-source, integrated framework that enables security analysts and security stakeholders to (i) easily deploy a wide variety of data acquisition services (such as focused web crawlers, site scrapers, domain downloaders, social media monitors), (ii) automatically rank the collected content according to its potential to contain useful intelligence, (iii) identify and extract cyber-threat intelligence and security artifacts via automated natural language understanding processes, (iv) turn the identified intelligence into actionable items through semi-automatic entity disambiguation, linkage and correlation, and (v) manage, share or collaborate on the stored intelligence via open standards and intuitive tools. To the best of our knowledge, this is the first solution in the literature to provide an end-to-end cyber-threat intelligence management platform that supports the complete threat lifecycle via an integrated, simple-to-use, yet extensible framework.

Highlights

  • To gather data from social media streams, inTIME uses the APIs provided by the social platforms: the user specifies a set of social media accounts and/or a set of keywords of interest, and the content collection mechanism retrieves all content posted from those accounts or matching those keywords

  • Phrases such as “database injection vulnerability”, “brute-force attack” and “privilege escalation exploit” are all noun phrases (NPs) that can be classified as Cyber-Threat Intelligence, and we would not be able to identify them with our pre-existing infrastructure

  • Automated content ranking according to its potential usefulness with respect to cyber-threat intelligence (CTI); this approach aims to identify the most promising crawled content and is the first in the literature to view the crawling task as a two-stage process, where a crude classification is initially used to prune the crawl frontier, while a more refined approach based on the collected content is used to decide on its relevance to the task
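
The two-stage view of crawling described in the last highlight can be illustrated with a minimal sketch. It is not the inTIME implementation: the keyword sets, the scoring rule, the threshold and the in-memory toy web below are all assumptions made for illustration, standing in for the learned classifiers the paper refers to. Stage 1 looks only at a link's URL and anchor text to decide whether it enters the crawl frontier; stage 2 scores the fetched page content to decide whether it is relevant.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical keyword sets; a real system would use trained models instead.
LINK_HINTS = {"exploit", "vulnerability", "cve", "malware", "attack"}
CONTENT_TERMS = {"exploit", "vulnerability", "privilege escalation",
                 "brute-force", "injection"}

Link = Tuple[str, str]                      # (url, anchor text)
Fetcher = Callable[[str], Tuple[str, List[Link]]]


def link_looks_promising(url: str, anchor_text: str) -> bool:
    """Stage 1: crude, cheap check on the URL and anchor text only."""
    text = f"{url} {anchor_text}".lower()
    return any(hint in text for hint in LINK_HINTS)


def content_relevance(page_text: str) -> float:
    """Stage 2: fraction of CONTENT_TERMS that occur in the fetched page."""
    text = page_text.lower()
    hits = sum(1 for term in CONTENT_TERMS if term in text)
    return hits / len(CONTENT_TERMS)


def crawl(seeds: Iterable[str], fetch: Fetcher,
          threshold: float = 0.4, max_pages: int = 100) -> List[Tuple[str, float]]:
    """Breadth-first crawl that prunes links (stage 1) and ranks pages (stage 2)."""
    frontier = deque(seeds)
    seen = set(frontier)
    relevant: List[Tuple[str, float]] = []
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        text, links = fetch(url)
        fetched += 1
        score = content_relevance(text)
        if score >= threshold:
            relevant.append((url, score))
        for link_url, anchor in links:
            # Only links passing the crude check enter the frontier.
            if link_url not in seen and link_looks_promising(link_url, anchor):
                seen.add(link_url)
                frontier.append(link_url)
    return relevant


# Toy in-memory "web" so the sketch runs without network access.
TOY_WEB: Dict[str, Tuple[str, List[Link]]] = {
    "http://example.test/": ("Security forum index", [
        ("http://example.test/exploit-thread", "New exploit discussion"),
        ("http://example.test/recipes", "Cooking recipes"),
        ("http://example.test/cve-news", "CVE news"),
    ]),
    "http://example.test/exploit-thread": (
        "A privilege escalation exploit for a database injection vulnerability", []),
    "http://example.test/cve-news": ("Site maintenance notice", []),
    "http://example.test/recipes": ("Pasta and sauce", []),
}


def toy_fetch(url: str) -> Tuple[str, List[Link]]:
    return TOY_WEB[url]


if __name__ == "__main__":
    print(crawl(["http://example.test/"], toy_fetch))
```

In this toy run the recipes link is pruned at stage 1 and never fetched, while the CVE news page passes stage 1 but is rejected at stage 2 because its content carries no relevant terms. This mirrors the division of labour in the highlight: a cheap filter keeps the frontier small, and the costlier content-based decision is made only on pages that were actually collected.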

Summary

Introduction

CTI from the clear, social, deep and dark web where threat actors collaborate, communicate and plan cyber-attacks. Such an approach allows us to provide visibility into several (structured and unstructured) sources that are preferred by threat actors or security analysts, and to identify timely CTI including zero-day vulnerabilities and exploits. We envisioned and designed inTIME, an integrated framework for Threat Intelligence Mining and Extraction that encompasses key technologies for pre-reconnaissance CTI gathering, analysis, management and sharing through the use of state-of-the-art tools and technologies. In this context, newly discovered data from various sources are inspected for their relevance to the task (gathering), the corresponding CTI in the form of vulnerabilities, exploits, threat actors, or cyber-crime tools is explored (analysis), discovered CTI is identified and, if possible, consolidated with existing CTI (management), and leveraged information is stored and shared via a vulnerability database (sharing).
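
The four phases named above (gathering, analysis, management, sharing) can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions and does not reproduce the inTIME modules: the keyword filter, the use of CVE-style identifiers as the extracted artifact, the dictionary store and the JSON export are all hypothetical stand-ins, and the identifiers in the sample data are placeholders rather than real vulnerabilities.

```python
import json
import re
from typing import Dict, Iterable, List, Set, Tuple

# Assumption: CVE-style identifiers serve as the extracted CTI artifact.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b")
# Assumption: a keyword filter stands in for the relevance check.
KEYWORDS = ("exploit", "vulnerability", "zero-day")

Document = Tuple[str, str]  # (source name, text)


def gather(docs: Iterable[Document]) -> List[Document]:
    """Gathering: keep only documents that look relevant to the task."""
    return [(src, text) for src, text in docs
            if any(k in text.lower() for k in KEYWORDS)]


def analyse(text: str) -> Set[str]:
    """Analysis: extract candidate CTI artifacts from a document."""
    return set(CVE_PATTERN.findall(text))


def manage(store: Dict[str, Set[str]], source: str, ids: Set[str]) -> None:
    """Management: consolidate new findings with what is already stored."""
    for identifier in ids:
        store.setdefault(identifier, set()).add(source)


def share(store: Dict[str, Set[str]]) -> str:
    """Sharing: export the consolidated store in a portable form (JSON here)."""
    return json.dumps({k: sorted(v) for k, v in sorted(store.items())}, indent=2)


def run_pipeline(docs: Iterable[Document]) -> Dict[str, Set[str]]:
    store: Dict[str, Set[str]] = {}
    for source, text in gather(docs):
        manage(store, source, analyse(text))
    return store


# Placeholder data; the identifiers are deliberately not real CVE entries.
SAMPLE: List[Document] = [
    ("forum-post-1", "New exploit for CVE-0000-0001 released"),
    ("tweet-7", "Patch now: vulnerability CVE-0000-0001 and CVE-0000-0002"),
    ("blog-3", "Our holiday photos"),
]

if __name__ == "__main__":
    print(share(run_pipeline(SAMPLE)))
```

Here the same identifier reported by two sources ends up as one consolidated record listing both sources, which is the kind of merging the management phase is meant to achieve before the result is shared.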

Related Work
Crawler Architectures
Policy-Based Typology
Usage Typology
Information Extraction for CTI
CTI Sharing
CTI Sharing Tools and Platforms
Threat Intelligence Services
Threat Intelligence Platforms
System Architecture
Data Acquisition Module
The Crawling Submodule
The Social Media Monitoring Submodule
Feed Monitoring and Target Web Scraping Submodules
Data Analysis Module
Data Management and Sharing Module
Implementation Aspects
Cyber-Trust Case Study
Data Acquisition
Crawling
Social Media Monitoring
Feed Monitoring
Targeted Web Scraping
Data Analysis
The Content Ranking Submodule
The CTI Extraction Submodule
Data Management and Sharing
Experimental Evaluation
Evaluation of the Topical Crawler’s Classification Model
Twitter Classifier Comparison
Data Management and Sharing Insights
Conclusions and Outlook