Mobile-centric AI applications place stringent demands on the resource efficiency of model inference. Input filtering is a promising approach to eliminating redundancy and thereby reducing inference cost. Prior efforts have tailored effective solutions for many applications, but have left two essential questions unanswered: (1) the theoretical filterability of an inference workload, which would guide the application of input-filtering techniques and avoid trial-and-error costs for resource-constrained mobile applications; and (2) robust discriminability of feature embeddings, which would let input filtering remain effective across diverse inference tasks and input content. To answer them, we first formulate the input filtering problem and theoretically compare the hypothesis complexity of inference models and input filters to understand the optimization potential. We then propose the first end-to-end learnable input-filtering framework, which covers most state-of-the-art methods and surpasses them by learning feature embeddings with robust discriminability. We design and implement InFi, which supports multiple input modalities and mobile-centric deployments. Comprehensive evaluations confirm our theoretical results and show that InFi outperforms strong baselines in applicability, accuracy, and efficiency. For a video analytics application on mobile platforms, InFi achieves 8.5× throughput and saves 95% of bandwidth while keeping over 90% accuracy.
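To make the filter-before-inference pattern concrete, below is a minimal sketch in PyTorch of a lightweight input filter placed in front of an expensive inference model. All names here (InputFilter, filtered_inference, CONF_THRESHOLD, the stand-in heavy model) are hypothetical illustrations of the general idea, not InFi's actual architecture or API.

```python
# Minimal sketch of input filtering: a cheap learnable filter decides
# whether each input is worth sending to the expensive inference model.
import torch
import torch.nn as nn


class InputFilter(nn.Module):
    """Lightweight filter: embeds an input and scores whether the heavy
    inference model is likely to produce a useful (non-redundant) result."""

    def __init__(self, in_dim: int, embed_dim: int = 32):
        super().__init__()
        # Learnable feature embedding (the component whose discriminability
        # the abstract emphasizes).
        self.embedding = nn.Sequential(
            nn.Linear(in_dim, embed_dim),
            nn.ReLU(),
        )
        # Binary "worth inferring?" head.
        self.classifier = nn.Linear(embed_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.classifier(self.embedding(x)))


CONF_THRESHOLD = 0.5  # hypothetical knob: trades accuracy for saved compute


def filtered_inference(x, filter_net, heavy_model):
    # Run the cheap filter first; invoke the heavy model only when the
    # filter predicts the input is not redundant.
    with torch.no_grad():
        keep = filter_net(x) >= CONF_THRESHOLD
    return heavy_model(x) if keep.item() else None  # None: filtered out


if __name__ == "__main__":
    filt = InputFilter(in_dim=128)
    heavy = nn.Linear(128, 10)  # stand-in for the expensive inference model
    x = torch.randn(1, 128)
    print(filtered_inference(x, filt, heavy))
```

In a real deployment the filter would be trained end-to-end on labels derived from the heavy model's outputs, so that the fraction of inputs filtered out maps directly to saved throughput and bandwidth.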