Compared with high-precision, dense representations such as point clouds or occupancy grid maps, a landmark map is compact and memory-efficient, advantages that are particularly valuable in large-scale environments for robot localization, navigation, and environment measurement. However, training a robot to identify and select landmarks that are useful for localization is challenging, so most landmarks are still identified with handcrafted features. In this article, we propose a multitask neural network called LandmarkNet to build a flexible, optimal, and interpretable landmark map. The network, equipped with an attention mechanism, is trained on sequences of monocular images for both robot pose regression and semantic segmentation. The segmentation provides auxiliary semantic information for landmark selection based on common-sense constraints, for example, that the sky and clouds cannot serve as landmarks for robot localization. The learned attention feature map is then used to select landmarks for robot localization based on the optimal projection between the image sequence, with the corresponding robot poses, and the semantic segmentation of the images. The learned landmark maps are further applied to mobile robot localization using traditional Monte Carlo localization (MCL). To verify the validity of our methods, we conducted experiments in simulation, on an open dataset (KITTI monoVO), and on a dataset that we collected on the campus of Tongji University. The results show that the learned landmark map, at only 6% of the size of the original map, can be used to perform visual localization tasks accurately. The source code is available at https://github.com/dotstudio01/landmark_learning.git .
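The MCL step mentioned above is a standard particle filter run over the selected landmarks. The following is a minimal sketch of that idea, not the paper's implementation: the landmark coordinates, the unicycle motion model, the range-only measurement model, and all noise parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D landmark map: each row is the (x, y) of one selected landmark.
LANDMARKS = np.array([[2.0, 3.0], [8.0, 1.0], [5.0, 7.0]])

def motion_update(particles, v, w, dt=1.0, noise=(0.05, 0.02)):
    """Propagate (x, y, theta) particles with a noisy unicycle model."""
    n = len(particles)
    v_n = v + rng.normal(0.0, noise[0], n)
    w_n = w + rng.normal(0.0, noise[1], n)
    particles[:, 0] += v_n * np.cos(particles[:, 2]) * dt
    particles[:, 1] += v_n * np.sin(particles[:, 2]) * dt
    particles[:, 2] += w_n * dt
    return particles

def measurement_weights(particles, ranges, sigma=0.3):
    """Weight particles by how well predicted landmark ranges match observations."""
    d = np.linalg.norm(particles[:, None, :2] - LANDMARKS[None], axis=2)  # (n, m)
    log_w = -0.5 * np.sum(((d - ranges) / sigma) ** 2, axis=1)
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    return w / w.sum()

def resample(particles, weights):
    """Low-variance (systematic) resampling."""
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    idx = np.minimum(idx, n - 1)  # guard against floating-point round-off
    return particles[idx].copy()

# Usage: localize a stationary robot at (4, 4) from noisy range readings.
particles = np.column_stack([rng.uniform(0, 10, 500),
                             rng.uniform(0, 10, 500),
                             rng.uniform(-np.pi, np.pi, 500)])
true_pose = np.array([4.0, 4.0])
for _ in range(30):
    particles = motion_update(particles, v=0.0, w=0.0)
    obs = np.linalg.norm(LANDMARKS - true_pose, axis=1) + rng.normal(0.0, 0.1, 3)
    weights = measurement_weights(particles, obs)
    particles = resample(particles, weights)
estimate = particles[:, :2].mean(axis=0)
```

With three non-collinear landmarks, range-only observations already pin down the position, which is why a compact landmark map (here, 6% of the original map in the reported experiments) can suffice for accurate localization.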