Augmented reality (AR) technologies enhance users' knowledge of their immediate surroundings by presenting contextualized and spatially relevant information. This augmentation is enabled by the automatic estimation of the device's pose with respect to the environment. Many simple AR applications for smartphones are well served by the pose estimate provided by GPS and inertial sensors, while others demand a higher accuracy that can be obtained with vision-based methods. The application presented in this paper, which aims to augment pictures of mountainous landscapes with geo-referenced data, belongs to the latter category. Our application is based on a novel approach to image-to-world registration that jointly relies on inertial and visual sensors. In a nutshell, GPS and inertial sensors are first used to compute a rough estimate of the device's position and pose; visual data are then employed to refine it. Specifically, a learning-based edge detection algorithm extracts mountain profiles from the picture of interest. A novel registration algorithm, based on a robust optimization framework, then aligns the extracted profiles to synthetic ones obtained from Digital Elevation Models. Our experiments, conducted on a novel dataset of manually aligned pictures that is made publicly available, demonstrate that the proposed registration method achieves higher accuracy than competing methods and is computationally efficient when implemented on a smartphone.
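The coarse-to-fine idea behind this pipeline can be illustrated with a minimal sketch: a skyline extracted from the photo is represented as elevation angles over azimuth bins, and a compass-derived yaw is refined by searching for the offset that best aligns it to a synthetic profile rendered from a DEM under a robust (Huber-style) penalty. The profile representation, the one-dimensional yaw search, and all function names here are illustrative simplifications, not the paper's actual registration algorithm.

```python
import numpy as np

def robust_cost(residuals, delta=0.05):
    # Huber-style penalty: quadratic for small residuals, linear for large
    # ones, so spurious edges do not dominate the alignment score.
    a = np.abs(residuals)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta)).sum()

def refine_yaw(photo_profile, dem_profile, coarse_yaw_deg, search_deg=10):
    """Refine a coarse compass yaw by aligning the photo's skyline profile
    (elevation angle per 1-degree azimuth bin) to a 360-bin synthetic
    profile rendered from a Digital Elevation Model (hypothetical helper)."""
    n = len(photo_profile)
    best_yaw, best_cost = None, np.inf
    for d in range(-search_deg, search_deg + 1):
        yaw = int(round(coarse_yaw_deg)) + d
        idx = (np.arange(n) + yaw) % 360          # wrap around the horizon
        cost = robust_cost(photo_profile - dem_profile[idx])
        if cost < best_cost:
            best_yaw, best_cost = yaw % 360, cost
    return best_yaw

# Toy example: a jagged synthetic skyline; the "photo" is a noisy
# 90-degree crop starting at azimuth 42, with a compass reading of 45.
rng = np.random.default_rng(0)
az = np.arange(360)
dem = 0.2 + 0.1 * np.sin(az * 4 * np.pi / 360) + 0.05 * np.sin(az * 20 * np.pi / 360)
true_yaw = 42
photo = dem[(np.arange(90) + true_yaw) % 360] + rng.normal(0, 0.001, 90)
print(refine_yaw(photo, dem, coarse_yaw_deg=45))  # prints 42
```

In the paper's setting the search is over the full camera pose rather than a single yaw angle, and the robust optimization handles outliers in the extracted edges; the sketch only conveys why a rough inertial estimate is enough to seed a local visual refinement.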