Ultrasound (US) probe motion estimation is a fundamental problem in automated standard plane localization during obstetric US diagnosis. Most recent works employ deep neural networks (DNNs) to regress the probe motion. However, these deep regression-based methods tend to overfit to the specific training data and therefore generalize poorly to clinical application. In this paper, we return to generalized US feature learning rather than deep parameter regression. We propose a self-supervised learned local detector and descriptor, named USPoint, for US probe motion estimation during the fine-adjustment phase of fetal plane acquisition. Specifically, a hybrid neural architecture is designed to simultaneously extract local features and estimate the probe motion. By embedding a differentiable USPoint-based motion estimation inside the proposed network architecture, USPoint learns the keypoint detector, scores, and descriptors from motion error alone, which does not require expensive human annotation of local features. The two tasks, local feature learning and motion estimation, are jointly learned in a unified framework so that each benefits the other. To the best of our knowledge, this is the first learned local detector and descriptor tailored to US images. Experimental evaluation on real clinical data demonstrates improved feature matching and motion estimation, indicating potential clinical value. A video demo can be found online: https://youtu.be/JGzHuTQVlBs.
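
To illustrate how a motion error alone could supervise keypoint scores, the following is a minimal sketch of a score-weighted rigid alignment (a weighted Kabsch/Procrustes solve) between matched 2D keypoints. It is not the USPoint implementation: the 2D rigid motion model, the function name `weighted_rigid_2d`, and the use of per-match scores as least-squares weights are all assumptions made for illustration. The solve consists only of weighted means, a matrix product, and an SVD, so an automatic-differentiation framework could in principle propagate a pose error back to the scores and keypoint locations. NumPy is used here only to show the computation.

```python
import numpy as np

def weighted_rigid_2d(src, dst, w):
    """Hypothetical helper: weighted least-squares rigid fit.

    Finds rotation R (2x2) and translation t (2,) such that
    dst[i] ~= R @ src[i] + t, with each match weighted by w[i].
    """
    w = w / w.sum()                                # normalise match scores
    mu_s = (w[:, None] * src).sum(axis=0)          # weighted centroid of src
    mu_d = (w[:, None] * dst).sum(axis=0)          # weighted centroid of dst
    # Weighted 2x2 cross-covariance between the centred point sets
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Usage: synthetic matches under a known motion, with one bad match
rng = np.random.default_rng(0)
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([2.0, -1.0])
src = rng.normal(size=(20, 2))
dst = src @ R_true.T + t_true
dst[0] += 10.0                  # corrupt one match
scores = np.ones(20)
scores[0] = 0.0                 # a zero score removes the bad match from the fit
R_est, t_est = weighted_rigid_2d(src, dst, scores)
```

In this sketch a match with a low score contributes little to the estimated motion, which is one plausible way a detector could be encouraged to assign high scores only to reliably matched keypoints.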