Ultrasound is a promising medical imaging modality benefiting from low-cost and real-time acquisition. Accurate tracking of an anatomical landmark has been of high interest for various clinical workflows such as minimally invasive surgery and ultrasound-guided radiation therapy. However, tracking an anatomical landmark accurately in ultrasound video is very challenging, due to landmark deformation, visual ambiguity and partial observation. In this paper, we propose a long-short diffeomorphism memory network (LSDM), which is a multi-task framework with an auxiliary learnable deformation prior to supporting accurate landmark tracking. Specifically, we design a novel diffeomorphic representation, which contains both long and short temporal information stored in separate memory banks for delineating motion margins and reducing cumulative errors. We further propose an expectation maximization memory alignment (EMMA) algorithm to iteratively optimize both the long and short deformation memory, updating the memory queue for mitigating local anatomical ambiguity. The proposed multi-task system can be trained in a weakly-supervised manner, which only requires few landmark annotations for tracking and zero annotation for deformation learning. We conduct extensive experiments on both public and private ultrasound landmark tracking datasets. Experimental results show that LSDM can achieve better or competitive landmark tracking performance with a strong generalization capability across different scanner types and different ultrasound modalities, compared with other state-of-the-art methods.