InTIME: A Machine Learning-Based Framework for Gathering and Leveraging Web Data to Cyber-Threat Intelligence

Paris Koloveas, Sofia Alevizopoulou, Spiros Skiadopoulos, Thanasis Chantzios, Christos Tryfonopoulos

doi:10.3390/electronics10070818


Open Access


Journal: Electronics
Publication Date: Mar 30, 2021
Citations: 44
License type: CC BY 4.0

Affiliation: University of Peloponnese


Abstract


In today’s world, technology has become deep-rooted and more accessible than ever across a plethora of devices and platforms, ranging from company servers and commodity PCs to mobile phones and wearables, interconnecting a wide range of stakeholders such as households, organizations and critical infrastructures. The sheer volume and variety of operating systems, the particularities of each device, the various usage domains and the accessibility-ready nature of the platforms create a vast and complex threat landscape that is difficult to contain. Staying on top of these evolving cyber-threats has become an increasingly difficult task that presently relies heavily on collecting and utilising cyber-threat intelligence before an attack (or at least shortly after, to minimize the damage), which entails the collection, analysis, leveraging and sharing of huge volumes of data. In this work, we put forward inTIME, a machine learning-based integrated framework that provides a holistic view of the cyber-threat intelligence process and allows security analysts to easily identify, collect, analyse, extract, integrate, and share cyber-threat intelligence from a wide variety of online sources, including clear/deep/dark web sites, forums and marketplaces, popular social networks, trusted structured sources (e.g., known security databases), or other datastore types (e.g., pastebins).
inTIME is a zero-administration, open-source, integrated framework that enables security analysts and other security stakeholders to (i) easily deploy a wide variety of data acquisition services (such as focused web crawlers, site scrapers, domain downloaders and social media monitors), (ii) automatically rank the collected content according to its potential to contain useful intelligence, (iii) identify and extract cyber-threat intelligence and security artifacts via automated natural language understanding processes, (iv) turn the identified intelligence into actionable items through semi-automatic entity disambiguation, linkage and correlation, and (v) manage, share or collaborate on the stored intelligence via open standards and intuitive tools. To the best of our knowledge, this is the first solution in the literature to provide an end-to-end cyber-threat intelligence management platform that supports the complete threat lifecycle via an integrated, simple-to-use, yet extensible framework.
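Step (iii) can be illustrated with a deliberately simple stand-in. inTIME performs extraction with natural language understanding models; the sketch below instead uses regular expressions (a swapped-in technique, not the authors' method) to pull three common security artifacts (CVE identifiers, IPv4 addresses and MD5/SHA-1/SHA-256 digests) out of raw text, normalising and de-duplicating them as a crude form of the consolidation described in step (iv). The pattern set and the `extract_artifacts` name are hypothetical.

```python
import re
from typing import Dict, List

# Illustrative patterns only; inTIME itself relies on NLU models, not these regexes.
PATTERNS: Dict[str, re.Pattern] = {
    # CVE identifiers, e.g. CVE-2021-44228 (4-digit year, then 4 or more digits).
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    # Dotted-quad IPv4 addresses with every octet in the range 0-255.
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
        r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b"
    ),
    # Hex digests with the length of SHA-256 (64), SHA-1 (40) or MD5 (32).
    "hash": re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b"),
}


def extract_artifacts(text: str) -> Dict[str, List[str]]:
    """Return normalised, de-duplicated artifacts per type, in first-seen order."""
    found: Dict[str, List[str]] = {}
    for kind, pattern in PATTERNS.items():
        seen: List[str] = []
        for match in pattern.findall(text):
            # Normalise case so "cve-2021-44228" and "CVE-2021-44228" count once.
            value = match.upper() if kind == "cve" else match.lower()
            if value not in seen:
                seen.append(value)
        found[kind] = seen
    return found
```

Patterns like these are brittle (any 32-character hex string is treated as an MD5 digest, for instance), which is one reason a learned extractor is preferable for free-text sources such as forums.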

Highlights

To gather data from social media streams, inTIME uses the APIs provided by the social platforms: the user specifies a set of social media accounts and/or a set of keywords of interest, and the content collection mechanism retrieves all content posted from those accounts or matching the provided keywords.
Phrases such as “database injection vulnerability”, “brute-force attack” and “privilege escalation exploit” are all noun phrases (NPs) that can be classified as Cyber-Threat Intelligence, yet our pre-existing infrastructure would not be able to identify them.
Automated content ranking according to its potential usefulness with respect to cyber-threat intelligence (CTI); this approach aims to identify the most promising crawled content and is the first in the literature to view the crawling task as a two-stage process, where a crude classification is first used to prune the crawl frontier and a more refined classification of the collected content is then used to decide on its relevance to the task.
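The first highlight describes keyword- and account-based collection from social media. Assuming posts have already been retrieved through a platform API (not shown, since each platform's API differs), the selection rule it describes reduces to the filter below. The `Post` type, the `filter_stream` function and the case-insensitive, naive substring matching are illustrative assumptions rather than inTIME's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Post:
    """Hypothetical minimal post; real platform APIs return richer objects."""
    author: str
    text: str


def _norm_account(name: str) -> str:
    # "@VendorSec" and "vendorsec" refer to the same account.
    return name.lower().lstrip("@")


def matches_interest(post: Post, accounts: Set[str], keywords: Set[str]) -> bool:
    """True if the post comes from a monitored account or mentions a keyword."""
    if _norm_account(post.author) in accounts:
        return True
    text = post.text.lower()
    # Naive substring match; a production monitor would tokenise the text.
    return any(kw in text for kw in keywords)


def filter_stream(posts: Iterable[Post], accounts: Iterable[str],
                  keywords: Iterable[str]) -> List[Post]:
    """Keep posts from the given accounts OR matching any of the keywords."""
    acc = {_norm_account(a) for a in accounts}
    kws = {k.lower() for k in keywords}
    return [p for p in posts if matches_interest(p, acc, kws)]
```

The account and keyword conditions are combined with OR, mirroring the highlight's "posted from those accounts or matching the provided keywords".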
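The second highlight motivates noun-phrase extraction. A minimal sketch, assuming the text has already been part-of-speech tagged with Penn Treebank tags (by hand here; in practice by a tagger, e.g. spaCy, whose `Doc.noun_chunks` offers comparable functionality), groups maximal runs of adjectives and nouns that end in a noun. The tag sets and the `noun_phrases` function are hypothetical; the paper's actual extraction pipeline is not reproduced here.

```python
from typing import List, Tuple

# Penn Treebank tags assumed to form simple noun phrases (an assumption).
HEAD_TAGS = {"NN", "NNS", "NNP", "NNPS"}           # nouns can head the phrase
MODIFIER_TAGS = HEAD_TAGS | {"JJ", "JJR", "JJS"}   # adjectives may precede the head


def noun_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return maximal adjective/noun runs that end in a noun, in order."""
    phrases: List[str] = []
    run: List[Tuple[str, str]] = []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last run
        if tag in MODIFIER_TAGS:
            run.append((word, tag))
            continue
        # Drop trailing adjectives so every phrase ends in a noun head.
        while run and run[-1][1] not in HEAD_TAGS:
            run.pop()
        if run:
            phrases.append(" ".join(w for w, _ in run))
        run = []
    return phrases
```

This rule also captures generic phrases; deciding which noun phrases actually constitute CTI is the separate classification step the highlight refers to.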
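The two-stage view of crawling in the third highlight can be sketched as follows, under stated assumptions: a cheap link-level scorer (seeing only the URL and anchor text) prunes the frontier before anything is fetched, and a more expensive content-level scorer decides whether each downloaded page is relevant. The scorer signatures, the thresholds `prune_at` and `accept_at`, and the breadth-first order are hypothetical choices, not values from the paper.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

Link = Tuple[str, str]                              # (href, anchor_text)
Fetcher = Callable[[str], Tuple[str, List[Link]]]   # url -> (page text, outgoing links)
LinkScorer = Callable[[str, str], float]            # stage 1: URL + anchor text only
PageScorer = Callable[[str], float]                 # stage 2: downloaded content


def two_stage_crawl(seeds: List[str], fetch: Fetcher,
                    link_score: LinkScorer, page_score: PageScorer,
                    prune_at: float = 0.3, accept_at: float = 0.7,
                    max_pages: int = 100) -> Dict[str, float]:
    """Breadth-first focused crawl; returns {url: page score} for relevant pages."""
    frontier = deque(seeds)
    visited: Set[str] = set()
    relevant: Dict[str, float] = {}
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        text, links = fetch(url)
        # Stage 2: refined relevance decision on the collected content.
        score = page_score(text)
        if score >= accept_at:
            relevant[url] = score
        # Stage 1: crude link classification prunes the frontier before fetching.
        for href, anchor in links:
            if href not in visited and link_score(href, anchor) >= prune_at:
                frontier.append(href)
    return relevant
```

Keeping the stages separate lets the cheap filter save bandwidth on obviously off-topic links, while the refined classifier only runs on content that was actually collected.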

Summary

Introduction

… CTI from the clear, social, deep and dark web, where threat actors collaborate, communicate and plan cyber-attacks. Such an approach allows us to provide visibility into several (structured and unstructured) sources that are preferred by threat actors or security analysts, and to identify timely CTI, including zero-day vulnerabilities and exploits. We envisioned and designed inTIME, an integrated framework for Threat Intelligence Mining and Extraction that encompasses key technologies for pre-reconnaissance CTI gathering, analysis, management and sharing through the use of state-of-the-art tools and technologies. In this context, newly discovered data from various sources are inspected for their relevance to the task (gathering), the corresponding CTI in the form of vulnerabilities, exploits, threat actors or cyber-crime tools is explored (analysis), discovered CTI is identified and, if possible, consolidated with existing CTI (management), and the leveraged information is stored and shared via a vulnerability database (sharing).
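The four stages named above (gathering, analysis, management, sharing) can be wired together as a toy pipeline. In this sketch, which is an assumption rather than inTIME's code, relevance and extraction are passed in as plain functions, management consolidates findings into an in-memory store keyed by identifier (standing in for the vulnerability database), and sharing exports sorted, serialisable records. All names (`CTIRecord`, `gather`, `analyse`, `manage`, `share`, `run_pipeline`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

Doc = Dict[str, str]  # hypothetical document: {"source": ..., "text": ...}


@dataclass
class CTIRecord:
    """Consolidated entry for one CTI identifier (e.g. a CVE ID)."""
    key: str
    sources: Set[str] = field(default_factory=set)
    mentions: int = 0


def gather(docs: Iterable[Doc], is_relevant: Callable[[str], bool]) -> List[Doc]:
    """Gathering: keep only documents judged relevant to the task."""
    return [d for d in docs if is_relevant(d["text"])]


def analyse(doc: Doc, extract: Callable[[str], List[str]]) -> List[str]:
    """Analysis: pull candidate CTI identifiers out of one document."""
    return extract(doc["text"])


def manage(store: Dict[str, CTIRecord], keys: List[str], source: str) -> None:
    """Management: merge new findings into records that already exist."""
    for key in keys:
        record = store.setdefault(key, CTIRecord(key))
        record.sources.add(source)
        record.mentions += 1


def share(store: Dict[str, CTIRecord]) -> List[Dict[str, object]]:
    """Sharing: export records as plain, serialisable dictionaries."""
    return [
        {"id": r.key, "sources": sorted(r.sources), "mentions": r.mentions}
        for r in sorted(store.values(), key=lambda r: r.key)
    ]


def run_pipeline(docs: Iterable[Doc], is_relevant: Callable[[str], bool],
                 extract: Callable[[str], List[str]]) -> List[Dict[str, object]]:
    """Run gathering -> analysis -> management -> sharing over a batch of documents."""
    store: Dict[str, CTIRecord] = {}
    for doc in gather(docs, is_relevant):
        manage(store, analyse(doc, extract), doc["source"])
    return share(store)
```

Consolidating by identifier means the same vulnerability reported on a forum and on social media ends up as one record with two sources, which is the kind of linkage the management stage aims for.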

Related Work

Crawler Architectures

Policy-Based Typology

Usage Typology

Information Extraction for CTI

CTI Sharing

CTI Sharing Tools and Platforms

Threat Intelligence Services

Threat Intelligence Platforms

System Architecture

Data Acquisition Module

The Crawling Submodule

The Social Media Monitoring Submodule

Feed Monitoring and Target Web Scraping Submodules

Data Analysis Module

Data Management and Sharing Module

Implementation Aspects

CYBER-TRUST Case Study

Data Acquisition

Crawling

Social Media Monitoring

Feed Monitoring

Targeted Web Scraping

Data Analysis

The Content Ranking Submodule

The CTI Extraction Submodule

Data Management and Sharing

Experimental Evaluation

Evaluation of the Topical Crawler’s Classification Model

Twitter Classifier Comparison

Data Management and Sharing Insights

Conclusions and Outlook


Similar Papers

Cyber Threat Intelligence for Improving Cybersecurity and Risk Management in Critical Infrastructure
... et al. | Journal of Universal Computer Science | Vol. 25 | 28 Nov 2019

Automated Cyber Threat Intelligence Reports Classification for Early Warning of Cyber Attacks in Next Generation SOC
Wenzhuo Yang ... Kwok-Yan Lam | 01 Jan 2020

Towards an Automated Dissemination Process of Cyber Threat Intelligence Data using STIX
Obrina Candra Briliyant ... Nusranto Pratama Tirsa | 23 Oct 2021

TIMFuser: A multi-granular fusion framework for cyber threat intelligence
Chunyan Ma ... Huamin Feng | Computers & Security | Vol. 148 | 04 Oct 2024
