This article analyzes the challenges of monitoring and managing the scalability of a cloud application. As part of the study, a module for monitoring and managing the scalability of a cloud application was developed. Development included introducing automatic scaling, together with monitoring based on Prometheus and Grafana, which supports high availability and efficient resource use. The study proceeded through requirements analysis, system design, development, testing, and evaluation. As a result, the system's performance, stability, and capacity to scale in response to fluctuating workloads improved. The module adapts readily to changes in system requirements and load, which is a crucial attribute for the dynamic development of business applications. The solution helps optimize resource allocation and reduce infrastructure costs. The project was found to fully meet its goals and objectives, as well as the requirements for effective resource management on the Amazon Web Services cloud platform using Terraform, Prometheus, and Grafana. The practical value of the module is evidenced by a significant improvement in resource efficiency, service stability, and cost optimization. The module was rigorously tested and successfully deployed in a test environment, demonstrating the stability and efficiency of the solution. The experience gained in implementing and operating this solution may be useful for expanding and optimizing cloud solutions in other projects and in companies that provide cloud services. The findings of this study were validated in a test environment at an IT company specializing in cloud technologies.
The objective of this validation was to assess the functionality and efficiency of the developed module under real-world cloud infrastructure operating conditions. Testing involved configuring the module on existing cloud infrastructure, integrating it with Prometheus and Grafana for monitoring, and running a series of stress tests to assess the module's scalability. The testing identified several critical points that required further optimization. Together with the results of the study, these issues point to several areas for further improvement and development of the system. First and foremost, the automatic scaling algorithms should be optimized so that they use historical monitoring data to anticipate shifts in system load. Another key area is the accuracy of monitoring. Integrating additional tools and extending the functionality of the existing monitoring systems will give a more complete picture of the system's condition, which in turn will make it possible to identify and resolve potential issues more quickly.
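The idea of scaling on historical monitoring data can be illustrated with a minimal sketch. It is not the module's actual algorithm: the function names, the linear-extrapolation forecast, the per-instance capacity figure, and the instance limits are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def forecast_next_load(history: Sequence[float], window: int = 3) -> float:
    """Extrapolate the next load sample linearly from the last `window` samples.

    Hypothetical helper: `history` stands in for a load metric (for example,
    requests per second) taken from monitoring data, oldest sample first.
    """
    if not history:
        raise ValueError("history must contain at least one sample")
    recent = list(history[-window:])
    if len(recent) < 2:
        # Not enough points to estimate a trend; reuse the last sample.
        return recent[-1]
    # Average change per step across the window.
    trend = (recent[-1] - recent[0]) / (len(recent) - 1)
    # Load cannot be negative, so clamp the extrapolated value at zero.
    return max(0.0, recent[-1] + trend)


def desired_instances(
    forecast: float,
    capacity_per_instance: float,
    min_instances: int = 1,
    max_instances: int = 10,
) -> int:
    """Convert a forecast load into an instance count within fixed bounds.

    `capacity_per_instance` is an assumed figure for the load one instance
    can serve; in practice it would come from stress-test results.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(forecast / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    samples = [100.0, 120.0, 140.0]  # illustrative load samples, not measured data
    predicted = forecast_next_load(samples)
    print(predicted, desired_instances(predicted, capacity_per_instance=50.0))
```

A production policy would more likely rely on a longer history and a seasonal or smoothed model, but the structure stays the same: forecast first, then translate the forecast into a bounded capacity target.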