This study introduces a novel methodology designed to assess the accuracy of data processing in the Lambda Architecture (LA), an advanced big-data framework qualified for processing streaming (data in motion) and batch (data at rest) data. Distinct from prior studies that have focused on hardware performance and scalability evaluations, our research uniquely targets the intricate aspects of data-processing accuracy within the various layers of LA. The salient contribution of this study lies in its empirical approach. For the first time, we provide empirical evidence that validates previously theoretical assertions about LA, which have remained largely unexamined due to LA's intricate design. Our methodology encompasses the evaluation of prospective technologies across all levels of LA, the examination of layer-specific design limitations, and the implementation of a uniform software development framework across multiple layers. Specifically, our methodology employs a unique set of metrics, including data latency and processing accuracy under various conditions, which serve as critical indicators of LA's accurate data-processing performance. Our findings compellingly illustrate LA's "eventual consistency". Despite potential transient inconsistencies during real-time processing in the Speed Layer (SL), the system ultimately converges to deliver precise and reliable results, as informed by the comprehensive computations of the Batch Layer (BL). This empirical validation not only confirms but also quantifies the claims posited by previous theoretical discourse, with our results indicating a 100% accuracy rate under various severe data-ingestion scenarios. We applied this methodology in a practical case study involving air/ground surveillance, a domain where data accuracy is paramount. This application demonstrates the effectiveness of the methodology using real-world data-intake scenarios, therefore distinguishing this study from hardware-centric evaluations. This study not only contributes to the existing body of knowledge on LA but also addresses a significant literature gap. By offering a novel, empirically supported methodology for testing LA, a methodology with potential applicability to other big-data architectures, this study sets a precedent for future research in this area, advancing beyond previous work that lacked empirical validation.
Read full abstract