Recently, as fitness has become a popular part of daily life, there is growing demand to record workouts and to assess whether fitness movements are performed to standard. However, existing approaches have limitations: wearable devices can hinder users' exercise, and computer vision–based solutions risk privacy breaches. Fortunately, we observe that acoustic sensing on smart speakers is a promising approach to activity monitoring. In this article, we propose Afitness, an acoustic sensing system that enables non-intrusive, passive, and high-precision fitness detection. Afitness makes three contributions. (i) We use pulse compression on commercial devices to generate high-precision motion-distance images that can be recognized visually. (ii) We propose a data augmentation algorithm that, combined with transfer learning, greatly reduces the burden of data collection. (iii) We exploit incremental learning so that Afitness is more portable across settings and can recognize new actions. Overall, Afitness makes acoustic signals interpretable and achieves reliable detection across environments.
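The abstract only names pulse compression as the basis of the motion-distance images. As a rough illustration of that general technique, and not of Afitness's actual pipeline, the sketch below transmits a linear frequency-modulated chirp and matched-filters (cross-correlates) the recording against it, so that each echo collapses into a sharp peak. It then converts the peak's round-trip delay into a one-way distance and stacks one range profile per frame into a time × distance image. The 48 kHz sample rate, the 18–22 kHz sweep band, the 10 ms chirp length, and all helper names (`make_chirp`, `pulse_compress`, `distance_image`, `simulate_echo`) are assumptions chosen for illustration.

```python
import numpy as np

FS = 48_000                   # assumed sample rate of a commodity speaker/mic (Hz)
F0, F1 = 18_000.0, 22_000.0   # assumed near-inaudible sweep band (Hz)
DURATION = 0.01               # assumed chirp length (s)
SPEED_OF_SOUND = 343.0        # m/s in air at about 20 degrees C


def make_chirp(fs=FS, f0=F0, f1=F1, duration=DURATION):
    """Linear frequency-modulated (LFM) chirp used as the transmitted probe."""
    t = np.arange(int(round(fs * duration))) / fs
    k = (f1 - f0) / duration  # sweep rate (Hz/s)
    return np.cos(2 * np.pi * (f0 * t + 0.5 * k * t**2))


def pulse_compress(received, chirp):
    """Matched filter: correlate the recording with the transmitted chirp.

    Energy spread over the long chirp collapses into a narrow peak at each
    echo's delay; np.abs serves here as a crude envelope.
    """
    return np.abs(np.correlate(received, chirp, mode="valid"))


def delay_to_distance(delay_samples, fs=FS):
    """Convert a round-trip delay in samples to a one-way distance in metres."""
    return delay_samples / fs * SPEED_OF_SOUND / 2


def distance_image(frames, chirp):
    """Stack one compressed range profile per frame (rows: time, cols: range bin)."""
    return np.stack([pulse_compress(frame, chirp) for frame in frames])


def simulate_echo(distance_m, n_samples=2048, gain=0.5, noise_std=0.01, seed=0):
    """Hypothetical synthetic recording: one attenuated chirp echo plus white noise."""
    chirp = make_chirp()
    delay = int(round(2 * distance_m / SPEED_OF_SOUND * FS))  # round trip, in samples
    rx = np.random.default_rng(seed).normal(0.0, noise_std, n_samples)
    rx[delay:delay + len(chirp)] += gain * chirp
    return rx


if __name__ == "__main__":
    chirp = make_chirp()
    profile = pulse_compress(simulate_echo(1.0), chirp)
    peak = int(np.argmax(profile))
    print(peak, round(delay_to_distance(peak), 3))
```

A real system would more likely use a proper envelope (for example, the magnitude of the analytic signal) instead of `np.abs`, and would subtract static reflections before stacking frames so that only moving body parts remain in the image; both steps are omitted here to keep the sketch short.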