With the increasing demand for location‐based services in venues such as railway stations, airports, and shopping malls, indoor positioning technology has become one of the most attractive research areas. Due to the effects of multipath propagation, wireless indoor localization methods based on WiFi, Bluetooth, and pseudolites have difficulty achieving high‐precision positioning. In this work, we present an image‐based localization approach that obtains the position from a single picture of the surrounding environment. The approach classifies scenes with deep belief networks and then solves for the camera position with the perspective‐n‐point algorithm, using several spatial reference points extracted from depth images. To evaluate its performance, experiments are conducted on public data and in real scenes; the results demonstrate that our approach can achieve submeter positioning accuracy. Compared with other methods, image‐based indoor localization does not require infrastructure and has a wide range of applications, including self‐driving, robot navigation, and augmented reality.
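
For readers unfamiliar with the perspective‐n‐point step, the problem it solves can be written compactly as below. This is a generic textbook formulation, not necessarily the exact variant used in this work; the symbols ($K$ for the camera intrinsic matrix, $R$ and $t$ for the rotation and translation from the scene frame to the camera frame, $X_i$ for the 3D reference points, $u_i$ for their pixel coordinates, $s_i$ for the projective depth, $\pi$ for the perspective division, and $C$ for the camera position) are notational assumptions introduced here for illustration.

```latex
% Pinhole projection of each 3D reference point X_i onto its pixel u_i
s_i \begin{bmatrix} u_i \\ 1 \end{bmatrix} = K \left( R\, X_i + t \right),
\qquad i = 1, \dots, n

% PnP estimates (R, t), for example by minimizing the reprojection error,
% where \pi([x, y, z]^\top) = [x/z,\; y/z]^\top
(\hat{R}, \hat{t}) = \arg\min_{R,\, t} \sum_{i=1}^{n}
  \left\lVert u_i - \pi\!\left( K \left( R\, X_i + t \right) \right) \right\rVert^2

% Camera position expressed in the scene frame
C = -\hat{R}^{\top} \hat{t}
```

At least three point correspondences are required, and a fourth is generally needed to resolve the ambiguity among the candidate solutions, which is consistent with the use of several reference points per image.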