Many real-time applications in consumer electronics rely on stream band join as a fundamental operation. Given two streams, a band join returns the pairs of tuples, one from each stream, whose values differ by no more than a user-specified range. Range partitioning keeps tuples with close values in the same partition, so a band join over range-partitioned data incurs lower join cost than it would under other partitioning strategies. However, real-world data from consumer electronics is skewed over the value range, which causes severe load imbalance among the instances of a distributed system that uses existing static range partitioning. Load migration can alleviate this imbalance. For range partitioning, migration can pursue two kinds of objectives. Migrating load to the instance that holds the adjacent partition keeps the number of partitions under control, but it incurs unacceptably high cost when the migration must span multiple instances. Directly migrating a split partition to the most lightly loaded instance is cheap, but it leads to an uncontrollable number of partitions. Systems for consumer electronic applications can tolerate neither high migration cost nor an uncontrollable number of partitions, both of which result in high latency and low throughput. In this work, we propose an adaptive range partitioning strategy that keeps the number of partitions controllable and balances load at low cost. We implement Nereus, a distributed stream band join system. Nereus employs a migration benefit model based on queuing theory that integrates the benefits of partition changes and of load balancing. This design selects the most beneficial migration, achieving low migration cost and an appropriate number of partitions. We conduct comprehensive experiments using large-scale datasets from real-world applications to evaluate this design.
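To make concrete why range partitioning lowers band join cost, the following is a minimal single-process sketch, not Nereus's implementation. It assumes the band predicate is |r - s| <= eps and that the value domain is split by sorted boundary points. The names `partition_of`, `route_band`, `band_join`, `boundaries`, and `eps` are hypothetical. Each tuple of stream S is stored in exactly one partition, and each tuple of stream R probes only the partitions whose ranges overlap its band, rather than every partition.

```python
import bisect
from collections import defaultdict


def partition_of(value, boundaries):
    """Hypothetical helper: index of the range partition holding `value`.

    `boundaries` are sorted split points; partition i covers
    [boundaries[i-1], boundaries[i]), and the two end partitions are unbounded.
    """
    return bisect.bisect_right(boundaries, value)


def route_band(value, eps, boundaries):
    """Partitions whose ranges overlap the band [value - eps, value + eps].

    Because partition_of is monotone, these form one contiguous run.
    """
    lo = partition_of(value - eps, boundaries)
    hi = partition_of(value + eps, boundaries)
    return range(lo, hi + 1)


def band_join(r_values, s_values, eps, boundaries):
    """Band join of two finite value lists under (assumed) static range partitioning."""
    # Store each S value in the single partition that owns it.
    store = defaultdict(list)
    for s in s_values:
        store[partition_of(s, boundaries)].append(s)

    results = []
    for r in r_values:
        # Probe only partitions that can contain matches; each matching
        # S value lives in exactly one probed partition, so no duplicates.
        for p in route_band(r, eps, boundaries):
            for s in store[p]:
                if abs(r - s) <= eps:
                    results.append((r, s))
    return results


if __name__ == "__main__":
    print(band_join([9, 15], [8, 11, 19, 25], eps=2, boundaries=[10, 20]))
```

Under hash partitioning, close values scatter across partitions, so every R tuple would have to probe all of them. Here, an R tuple touches more than one partition only when its band straddles a boundary. The sketch deliberately omits the skew and migration logic that the abstract attributes to Nereus.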
The results show that Nereus improves the throughput by 51% and reduces the processing latency by 99%, compared to existing designs.