Abstract
Conventional testing approaches, such as static load testing and synthetic monitoring, evaluate system performance under controlled conditions but do not fully capture the unpredictable scenarios encountered in real-world operation. For instance, static load testing applies a predetermined load to the system to measure metrics such as response time and throughput, which may not reflect the variability and chaos of actual usage. Similarly, synthetic monitoring uses scripted transactions to check system availability and performance, but these scripts often lack the complexity and variability of real-world interactions. This research aims to overcome these limitations by applying chaos engineering techniques to simulate a range of faults, including network latency, service crashes, resource exhaustion, message loss, and security attacks. The proposed tool integrates components for data generation, fault injection, storage, monitoring, and visualization, enabling a thorough evaluation of system robustness. The tool's effectiveness is assessed through controlled experiments in an AWS-based, cloud-native IoT environment. These experiments show that the tool effectively identifies weaknesses in system resilience and helps improve overall robustness. By replicating real-world disruptions and analyzing system responses, the tool provides critical insights into how IoT devices behave under stress. The study concludes that this chaos engineering tool significantly enhances the ability to detect and address vulnerabilities, supporting the creation of more resilient IoT systems. Future work will expand the range of simulated faults, validate the tool on other cloud platforms, and add further real-time analysis features.
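To illustrate how the components of such a tool can fit together, the following is a minimal Python sketch of the pipeline: data generation, fault injection, storage, and monitoring. It is not the authors' implementation. All names (`Reading`, `FaultConfig`, `Monitor`, `deliver`, `run_experiment`) and parameter values are hypothetical. AWS services are replaced by in-memory stand-ins, only three of the listed fault types (message loss, added network latency, and a service crash) are modeled, and visualization is reduced to printing a metrics report.

```python
import random
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Reading:
    """One synthetic IoT sensor message (hypothetical schema)."""
    device_id: str
    seq: int
    value: float


@dataclass
class FaultConfig:
    """Hypothetical fault knobs; not the paper's actual parameters."""
    loss_prob: float = 0.0         # probability a message is silently dropped
    extra_latency_ms: float = 0.0  # latency added to every delivered message
    crashed: bool = False          # if True, the ingestion service accepts nothing


@dataclass
class Monitor:
    """Monitoring component: counts messages and records delivery latency."""
    sent: int = 0
    delivered: int = 0
    latencies_ms: list = field(default_factory=list)

    def report(self) -> dict:
        return {
            "sent": self.sent,
            "delivered": self.delivered,
            "delivery_ratio": self.delivered / self.sent if self.sent else 0.0,
            "mean_latency_ms": mean(self.latencies_ms) if self.latencies_ms else None,
        }


def generate_readings(n_devices: int, n_per_device: int, rng: random.Random):
    """Data generation: yield synthetic temperature-like readings."""
    for d in range(n_devices):
        for s in range(n_per_device):
            yield Reading(f"dev-{d}", s, rng.gauss(20.0, 2.0))


def deliver(reading, faults, store, monitor, rng, base_latency_ms=5.0):
    """Fault injection sits between a device and the storage service."""
    monitor.sent += 1
    if faults.crashed:
        return  # simulated service crash: message is rejected
    if rng.random() < faults.loss_prob:
        return  # simulated message loss
    store.append(reading)  # storage: in-memory stand-in for a cloud database
    monitor.delivered += 1
    monitor.latencies_ms.append(base_latency_ms + faults.extra_latency_ms)


def run_experiment(faults: FaultConfig, seed: int = 0) -> dict:
    """Run one controlled experiment; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    store, monitor = [], Monitor()
    for reading in generate_readings(n_devices=10, n_per_device=100, rng=rng):
        deliver(reading, faults, store, monitor, rng)
    return monitor.report()


if __name__ == "__main__":
    scenarios = {
        "baseline": FaultConfig(),
        "degraded": FaultConfig(loss_prob=0.2, extra_latency_ms=150.0),
        "outage": FaultConfig(crashed=True),
    }
    for name, cfg in scenarios.items():
        print(name, run_experiment(cfg))  # text report in place of visualization
```

Comparing each fault scenario against the fault-free baseline under the same seed is what makes the experiment controlled: any change in delivery ratio or latency can be attributed to the injected fault rather than to randomness in the generated data.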