Streaming mobile augmented reality applications require both real-time recognition and tracking of objects of interest in a video sequence. Typically, local features are calculated from the gradients of a canonical patch around a keypoint in individual video frames. In this paper, we propose a temporally coherent keypoint detector and design efficient interframe predictive coding techniques for canonical patches, feature descriptors, and keypoint locations. In the proposed system, we strive to transmit each patch or its equivalent feature descriptor with as few bits as possible by modifying a previously transmitted patch or descriptor. Our solution enables server-based mobile augmented reality in which a continuous stream of salient information, sufficient for image-based retrieval and object localization, is sent at a bit-rate that is practical for today's wireless links and less than one-tenth of the bit-rate needed to stream the compressed video to the server.