Internet-of-Things (IoT) applications often rely on sensor-based indoor tracking and positioning, whose performance improves significantly when the type of the surrounding indoor environment is identified. This letter develops a highly accurate and computationally efficient method for indoor localization based on a novel two-stage deep learning approach. In the first stage, a convolutional neural network (CNN) identifies the indoor environment by extracting inherent features from a real dataset of radio-frequency (RF) measurements. In the second stage, another CNN performs indoor localization using the environment type obtained from the first stage. Numerical investigations demonstrate that the proposed two-stage CNN approach outperforms benchmark methods in both localization accuracy and computational efficiency. These findings make the proposed approach a key element for facilitating real-time deployment of efficient low-power IoT sensor networks.
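The two-stage control flow can be illustrated with a minimal sketch. It is not the letter's implementation: both CNNs are replaced by simple stand-ins (a nearest-centroid classifier for stage one, a nearest-fingerprint lookup for stage two) so that the example runs with the Python standard library alone. Selecting a separate localizer per environment is one possible reading of "taking into account knowledge of the environment type" and is an assumption here. The environment names, RF feature vectors, positions, and all function names are hypothetical toy values.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]      # e.g. a short vector of RF measurements (toy values)
Position = Tuple[float, float]  # (x, y) coordinates in metres (toy values)


def squared_distance(a: Features, b: Features) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_environment_classifier(
    centroids: Dict[str, Features],
) -> Callable[[Features], str]:
    """Stage 1 stand-in: nearest-centroid classifier (the letter uses a CNN)."""
    def classify(rf: Features) -> str:
        return min(centroids, key=lambda env: squared_distance(rf, centroids[env]))
    return classify


def make_localizer(
    fingerprints: List[Tuple[Features, Position]],
) -> Callable[[Features], Position]:
    """Stage 2 stand-in: nearest-fingerprint lookup (the letter uses a second CNN)."""
    def localize(rf: Features) -> Position:
        _, position = min(fingerprints, key=lambda fp: squared_distance(rf, fp[0]))
        return position
    return localize


def two_stage_localize(
    rf: Features,
    classify: Callable[[Features], str],
    localizers: Dict[str, Callable[[Features], Position]],
) -> Tuple[str, Position]:
    """Identify the environment first, then localize with that environment's model."""
    env = classify(rf)
    return env, localizers[env](rf)


# Hypothetical toy data: two environments, two reference fingerprints each.
CENTROIDS = {"corridor": [-50.0, -60.0], "open_room": [-70.0, -80.0]}
LOCALIZERS = {
    "corridor": make_localizer(
        [([-48.0, -62.0], (1.0, 0.0)), ([-55.0, -58.0], (4.0, 0.0))]
    ),
    "open_room": make_localizer(
        [([-68.0, -82.0], (2.0, 3.0)), ([-75.0, -78.0], (5.0, 6.0))]
    ),
}
classify = make_environment_classifier(CENTROIDS)

env, pos = two_stage_localize([-49.0, -61.0], classify, LOCALIZERS)
print(env, pos)  # corridor (1.0, 0.0)
```

In this sketch the second stage only ever searches fingerprints from the identified environment, which shows how environment knowledge can narrow the localization problem. An alternative design, equally compatible with the description above, would feed the environment label to a single second-stage model as an extra input.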