Tail Latency Research Articles

We propose ecoTLB, a software-based eventual translation lookaside buffer (TLB) coherence scheme that eliminates the overhead of the synchronous TLB shootdown mechanism in operating systems that use address space identifiers (ASIDs). With eventual TLB coherence, ecoTLB improves the performance of free and page swap operations by removing the inter-processor interrupt (IPI) overheads incurred to invalidate TLB entries. We show that TLB shootdown affects page swapping, particularly in emerging disaggregated data centers, and demonstrate that ecoTLB's asynchronous mechanism can improve both performance and the specific swapping policy decisions. We demonstrate that ecoTLB improves the performance of real-world applications, such as Memcached and Make, by up to 17.2% when they perform page swapping using Infiniswap, a solution for next-generation data centers that use disaggregated memory. Moreover, ecoTLB improves the 99th percentile tail latency of Memcached by up to 70.8% due to its asynchronous scheme and improved policy decisions. Furthermore, we show that recent security features in the Linux kernel, such as kernel page table isolation (KPTI), can incur significant performance overheads on architectures that lack instructions to clear single entries in tagged TLBs and therefore fall back to full TLB flushes. In this scenario, ecoTLB recovers the performance lost to KPTI thanks to its asynchronous shootdown scheme and its support for tagged TLBs. Finally, we demonstrate that ecoTLB improves the performance of free operations by up to 59.1% on a 120-core machine and improves the performance of Apache on a 16-core machine by up to 13.7% compared to baseline Linux, and by up to 48.2% compared to ABIS, a recent state-of-the-art research prototype that reduces the number of IPIs.


As both the availability of internet access and the prominence of smart devices continue to increase, data is being generated at a rate faster than ever before. This massive increase in data production comes with many challenges, including efficiency concerns for the storage and retrieval of such large-scale data. However, users have grown to expect the sub-second response times that are common in most modern search engines, which creates a problem: how can such large amounts of data continue to be served efficiently enough to satisfy end users? This dissertation investigates several issues regarding tail latency in large-scale information retrieval systems. Tail latency is the high-percentile latency observed from a system; in the case of search, it typically corresponds to how long a query takes to be processed. Keeping tail latency as low as possible translates to a good experience for all users, because tail latency is directly related to the worst-case latency and hence to the worst possible user experience. The key idea in targeting tail latency is to move from questions such as "what is the median latency of our search engine?" to questions that more accurately capture user experience, such as "how many queries take more than 200 ms to return answers?" or "what is the worst-case latency that a user may be subject to, and how often might it occur?" While various strategies exist for efficiently processing queries over large textual corpora, prior research has focused almost entirely on improvements to the average processing time or cost of search systems. As a first contribution, we examine some state-of-the-art retrieval algorithms for two popular index organizations and discuss the trade-offs between them, paying special attention to the notion of tail latency. This research uncovers a number of observations that are subsequently leveraged for improved search efficiency and effectiveness.
We then propose and solve a new problem, which involves processing a number of related query variations together, known as multi-queries, to yield higher quality search results. We experiment with a number of algorithmic approaches to efficiently process these multi-queries, and report on the cost, efficiency, and effectiveness trade-offs present with each. Finally, we examine how predictive models can be used to improve the tail latency and end-to-end cost of a commonly used multi-stage retrieval architecture without impacting result effectiveness. By combining ideas from numerous areas of information retrieval, we propose a prediction framework which can be used for training and evaluating several efficiency/effectiveness trade-off parameters, resulting in improved trade-offs between cost, result quality, and tail latency.
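
The shift from median latency to tail-oriented questions described in this abstract can be illustrated with a small sketch. It is not taken from the dissertation: the helper names `percentile` and `fraction_over`, the nearest-rank percentile definition, and the sample latencies are all assumptions chosen for illustration. Only the 200 ms threshold comes from the example question quoted in the abstract.

```python
import math


def percentile(latencies_ms, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Rank is 1-based; clamp to 1 so that very small p still returns a sample.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def fraction_over(latencies_ms, threshold_ms):
    """Fraction of queries whose latency exceeds threshold_ms."""
    slow = sum(1 for x in latencies_ms if x > threshold_ms)
    return slow / len(latencies_ms)


# Hypothetical workload: 98 fast queries and 2 slow ones (values are made up).
sample = [20.0] * 98 + [450.0, 900.0]

median = percentile(sample, 50)      # typical experience
p99 = percentile(sample, 99)         # tail experience
slow = fraction_over(sample, 200.0)  # "how many queries take more than 200 ms?"

print(f"median={median} ms, p99={p99} ms, over 200 ms: {slow:.0%}")
```

In this made-up sample the median hides the two slow queries entirely, while the 99th percentile and the over-threshold fraction expose them, which is the motivation the abstract gives for measuring the tail rather than the average.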



Articles published on Tail Latency

Chopping off the Tail: Bounded Non-Determinism for Real-Time Accelerators

Probability distribution based resource management for multitenant cloud clusters

HTPC: heterogeneous traffic-aware partition coding for random packet spraying in data center networks

Delay Analysis in IoT Sensor Networks.

Design of LSM-tree-based Key-value SSDs with Bounded Tails

HCMonitor: An accurate measurement system for high concurrent network services

BabelFish: Fusing Address Translations for Containers

Analysis of the K2 Scheduler for a Real-Time System with an SSD

Tail Latency Optimization for LDPC-Based High-Density and Low-Cost Flash Memory Devices

SPARE: Partial Replication for Multi-Tier Applications in the Cloud

Alleviating I/O Interference in Virtualized Systems With VM-Aware Persistency Control

Optimizing in-memory database engine for AI-powered on-line decision augmentation using persistent memory

InK: In-Kernel Key-Value Storage with Persistent Memory

ECO TLB

Less Provisioning: A Hybrid Resource Scaling Engine for Long-Running Services With Tail Latency Guarantees

UTree

Performance-Aware Speculative Resource Oversubscription for Large-Scale Clusters

HMB-I/O: Fast Track for Handling Urgent I/Os in Nonvolatile Memory Express Solid-State Drives

TTLCache: Taming Latency in Erasure-Coded Storage Through TTL Caching

Managing tail latency in large scale information retrieval systems
