Abstract

The subject of this research is the key methods for creating the architecture of information aggregators, methods for increasing scalability and effectiveness of such systems, methods for reducing the delay between the publication of new content by the source and emergence of its copy in the information aggregator. In this research, the content aggregator implies the distributed high-load information system that automatically collects information from various sources, process and displays it on a special website or mobile application. Particular attention is given to the basic principles of content aggregation: key stages of aggregation and criteria for data sampling, automation of aggregation processes, content copy strategies, and content aggregation approaches. The author's contribution consists in providing detailed description of web crawling and fuzzy duplicate detection systems. The main research result lies in the development of high-level architecture of the content aggregation system. Recommendations are given on the selection of the architecture of styles and special software regime that allows creating the systems for managing distributed databases and message brokers. The presented architecture aims to provide high availability, scalability for high query volumes, and big data performance. To increase the performance of the proposed system, various caching methods, load balancers, and message queues should be actively used. For storage of the content aggregation system, replication and partitioning must be used to improve availability, latency, and scalability. In terms of architectural styles, microservice architecture, event-driven architecture, and service-based architecture are the most preferred architectural approaches for such system.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.