A framework for the clustering and categorization of CISA reports

Ireneusz Czarnowski

doi:10.1016/j.procs.2022.09.500

Open Access

Journal: Procedia Computer Science
Publication Date: Jan 1, 2022
Citations: 1
License type: cc-by-nc-nd

Affiliation: Gdynia Maritime University

Abstract

This paper presents a framework for text clustering and categorisation. The proposed clustering approach is a modification of an existing similarity-based clustering algorithm that was originally developed for well-structured data. In this study, the algorithm maps text documents into clusters in order to discover groups of topically related documents. The resulting clusters are then used to categorise new documents uploaded to the system. The algorithms are discussed using the example of Industrial Control Systems (ICS) Advisory Reports and Common Vulnerabilities and Exposures (CVE) recommendations, both provided by the Cybersecurity and Infrastructure Security Agency (CISA). Experiments are reported, with the main focus on the clustering algorithm. The experimental results suggest that the proposed similarity-based clustering algorithm can be considered an alternative approach to text clustering.

Full Text