Abstract

This paper presents a framework for text clustering and categorisation. The proposed clustering approach modifies an existing similarity-based clustering algorithm that was originally developed for well-structured data. In this study, the algorithm maps text documents into clusters in order to discover groups of topically related documents. The resulting clusters are then used to categorise new documents as they are uploaded to the system. The algorithms are illustrated through the analysis of text documents comprising Industrial Control Systems (ICS) Advisory Reports and Common Vulnerabilities and Exposures (CVE) recommendations, both provided by the Cybersecurity and Infrastructure Security Agency (CISA). Experiments are carried out, with the main focus on the clustering algorithm. The experimental results suggest that the proposed similarity-based clustering algorithm can be considered an alternative approach for text clustering.
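
The two-stage idea described above (cluster documents by similarity, then place new documents into the resulting clusters) can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, which the abstract does not specify. It assumes bag-of-words vectors, cosine similarity, and a greedy threshold rule. The function names, the `threshold` value, and the sample strings are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def vectorise(text):
    """Turn a document into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.3):
    """Greedy clustering (an assumed stand-in, not the paper's method):
    each document joins the most similar cluster if the similarity
    reaches the threshold, otherwise it starts a new cluster."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for i, doc in enumerate(docs):
        vec = vectorise(doc)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            # Summed term counts act as an unnormalised centroid.
            best["centroid"].update(vec)
            best["members"].append(i)
    return clusters


def categorise(doc, clusters):
    """Return the index of the cluster most similar to a new document."""
    vec = vectorise(doc)
    sims = [cosine(vec, c["centroid"]) for c in clusters]
    return max(range(len(clusters)), key=sims.__getitem__)


# Made-up sample strings, not taken from CISA advisories.
docs = [
    "buffer overflow in plc firmware",
    "stack buffer overflow in hmi firmware",
    "sql injection in web login form",
    "sql injection in web admin form",
]
clusters = cluster(docs)
new_doc_cluster = categorise("heap overflow in scada firmware", clusters)
```

In this sketch the threshold controls how readily documents merge into an existing cluster. A realistic pipeline would likely add stop-word removal and term weighting before computing similarities.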
