Past tongue-jaw movement interaction systems typically require dedicated hardware and are uncomfortable to use, limiting their scalability and generalizability. This paper introduces <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">CanalScan</italic>, the first system that recognizes tongue-jaw movements using commodity speakers and microphones mounted on ubiquitous off-the-shelf devices (e.g., smartphones). Our key insight is that tongue-jaw movements always cause ear canal deformations, and we find that the dynamic features of these deformations present unique patterns in the acoustic reflections within the ear canal for different tongue-jaw movements. Specifically, <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">CanalScan</italic> first transmits an acoustic signal into the ear canal and then parses the reflected signals for tongue-jaw movement recognition. To eliminate the impact of body movements, we develop a body-movement noise filtering method and a dynamic segmentation method that identify and separate the ear canal deformations associated with tongue-jaw movements from those caused by other types of body movements. We further propose a sensor position detection method and a data transformation mechanism to reduce the impact of diversity in ear canal shapes and in the relative positions between the sensors and the ear canal. <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">CanalScan</italic> extracts twelve unique and consistent features and applies a random forest classifier to distinguish tongue-jaw movements. Extensive experiments with twenty participants validate the generalizability, effectiveness, robustness, and high accuracy of <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">CanalScan</italic>.