This study aims to develop a universal software module for semantic analysis of open-source intelligence (OSINT) data using machine learning methods, automating the collection, processing, and clustering of large datasets. Special emphasis is placed on improving analysis accuracy and simplifying the interpretation of results for users in fields such as cybersecurity, business analytics, and journalism. The study employs modern natural language processing methods: Term Frequency-Inverse Document Frequency (TF-IDF) and Bidirectional Encoder Representations from Transformers (BERT) for semantic text analysis, and the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm for grouping data. Data were collected through social media APIs, news portals, and automated web scraping tools. The module's performance was evaluated using precision, recall, and F1-score. For the first time, this study proposes integrating a deep learning model, namely BERT, with clustering algorithms to address OSINT tasks. The system can be adapted to different data sources and supports multilingual processing. A key contribution is semantic analysis that takes query context into account, which improves the relevance of results. The developed module can be applied in cybersecurity for threat detection, in business for competitor analysis and market monitoring, and in journalism for fact-checking, research, fake news detection, and analysis of information campaigns. Integrating multiple data sources enables a comprehensive approach to information analysis. In testing, the module achieved a clustering accuracy of over 90% and processed large datasets (up to 10,000 documents in 3 minutes). The system identifies key entities, clusters data automatically, and presents visualizations in a user-friendly format.
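The vectorize-then-cluster step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, replaces BERT embeddings with plain TF-IDF vectors, tokenizes on whitespace, and uses hypothetical function names (`tfidf_vectors`, `cosine_distance`, `dbscan`) and illustrative values for `eps` and `min_samples` that do not come from the study.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF dict per document (smoothed IDF)."""
    tokenised = [doc.lower().split() for doc in docs]  # naive whitespace tokens
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vec = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two unit-length sparse vectors."""
    return 1.0 - sum(w * b.get(term, 0.0) for term, w in a.items())


def dbscan(vectors, eps, min_samples):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    n = len(vectors)
    neighbours = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:  # core point: keep expanding
                queue.extend(neighbours[j])
    return labels


# Toy corpus (invented for illustration, not data from the study).
docs = [
    "phishing email steals bank credentials",
    "phishing email targets bank customers",
    "bank phishing email campaign detected",
    "quarterly market share report competitor",
    "competitor market share grows quarterly",
    "recipe for sourdough bread",
]
labels = dbscan(tfidf_vectors(docs), eps=0.7, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

The three phishing reports and the two market reports form separate clusters, and the unrelated document is labeled as noise. In a setting closer to the one the abstract describes, the TF-IDF dictionaries would presumably be replaced or complemented by dense BERT sentence embeddings, with `eps` tuned to the embedding space.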