Cloud-based management of Distributed Acoustic Sensing (DAS) sensor data faces two primary challenges. The first is developing efficient storage mechanisms capable of handling the enormous volume of data these sensors generate. To address it, we design and implement a pipeline system that efficiently transfers the data to Amazon DynamoDB, exploiting the low latency of this storage service, for a benchmark DAS scheme performing continuous monitoring over a 100 km range at meter-scale spatial resolution. DynamoDB, part of Amazon Web Services (AWS), offers highly expandable storage capacity with access latencies of a few tens of milliseconds. The different stages of DAS data handling are executed in a pipeline, and the scheme is optimized for high overall throughput with the reduced latency required for concurrent, real-time event extraction, as well as minimal storage of raw and intermediate data. In addition, the scalability of the DynamoDB-based storage scheme is evaluated for linear and nonlinear variations in the number of access batches and for a wide range of data sample sizes corresponding to sensing ranges of 1-110 km. The results show latencies of 40 ms per access batch with low standard deviations of a few milliseconds, and the per-sample latency decreases as the sample size increases, paving the way toward scalable, cloud-based data storage services that integrate additional post-processing for more precise feature extraction. The technique greatly simplifies DAS data handling in key application areas requiring continuous, large-scale measurement schemes. The second challenge is that processing raw traces in a long-distance DAS for real-time monitoring requires careful provisioning of computational resources to guarantee the requisite dynamic performance.
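The batching stage of such a pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `make_batches` is hypothetical, but the 25-item cap is DynamoDB's documented `BatchWriteItem` limit, which any batched write path must respect.

```python
# Sketch of the stage that groups DAS samples into DynamoDB write batches.
# DynamoDB's BatchWriteItem API accepts at most 25 items per request,
# so samples are chunked into batches of that size before being written.

MAX_BATCH_ITEMS = 25  # DynamoDB BatchWriteItem per-request limit

def make_batches(samples, batch_size=MAX_BATCH_ITEMS):
    """Split a sequence of DAS samples into DynamoDB-sized write batches."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

# Example: 1000 spatial samples yield 40 full batches of 25 items each.
batches = make_batches(list(range(1000)))
```

In a real pipeline each batch would then be handed to the DynamoDB `BatchWriteItem` operation (e.g. via boto3), with any unprocessed items retried; the per-batch access latency reported in the abstract corresponds to one such request.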
We then focus on the design of a system for evaluating the performance of cloud computing platforms for diverse computations on DAS data. This system is aimed at unveiling insights into the performance metrics and operational efficiency of in-cloud computations, providing a deeper understanding of system behavior, identifying potential bottlenecks, and suggesting areas for improvement. To achieve this, we employ the CloudSim framework. The analysis reveals that processing time decreases significantly on more capable virtual machines (VMs), as governed by their Processing Elements (PEs) and Million Instructions Per Second (MIPS) ratings. The results also show that, although the number of computations grows with fiber length, with a corresponding increase in processing time, the overall computation speed remains suitable for continuous real-time monitoring. We also observe that VMs with lower processing speed and fewer CPUs exhibit more inconsistent processing times than higher-performance VMs, which in turn do not incur significantly higher prices. Additionally, the impact of VM parameters on computation time is explored, highlighting the importance of resource optimization in DAS system design for efficient performance. The study also observes a notable trend: processing time per unit of data decreases significantly for every additional 50,000 columns processed as the fiber length increases. This finding underscores the efficiency gains achieved with larger computational loads, indicating improved system performance and capacity utilization as the DAS system processes more extensive datasets.
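The dependence of processing time on VM capability can be captured by a first-order model of the kind CloudSim uses to schedule cloudlets: execution time is the workload length in million instructions divided by the VM's aggregate capacity (MIPS per PE times the number of PEs). The sketch below is this simplified model only, not the CloudSim implementation, and the parameter values are illustrative.

```python
def processing_time(total_mi, mips_per_pe, num_pes):
    """First-order estimate of cloudlet execution time in seconds:
    workload length in million instructions (MI) divided by the
    VM's aggregate capacity (MIPS per PE times number of PEs)."""
    return total_mi / (mips_per_pe * num_pes)

# A more capable VM (higher MIPS rating, more PEs) finishes the
# same DAS workload sooner, matching the trend in the analysis.
t_small = processing_time(total_mi=100_000, mips_per_pe=1000, num_pes=2)
t_large = processing_time(total_mi=100_000, mips_per_pe=2500, num_pes=4)
```

Under this model, doubling either the MIPS rating or the PE count halves the processing time, which is why resource optimization across these two VM parameters matters for meeting real-time monitoring deadlines.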