Accurate localization of mobile devices from camera-acquired visual media usually requires a search over a very large GPS-referenced image database collected from social sharing websites such as Flickr or from services such as Google Street View. This paper proposes a new method for reliably estimating the actual query camera location by using structure from motion (SFM) for three-dimensional (3-D) camera position reconstruction and by introducing a new approach for applying a linear transformation between two different 3-D Cartesian coordinate systems. Since the success of SFM hinges on effectively selecting among the multiple retrieved images, we propose an optimization framework that selects the images with the highest intraclass similarity among those returned by the retrieval pipeline, in order to increase the SFM convergence rate. The selected images, along with the query, are then used to reconstruct a 3-D scene and to find the relative camera positions by employing SFM. In the last processing step, a camera coordinate transformation algorithm is introduced to estimate the query's geo-tag. The influence of the number of images involved in SFM on the final position error is investigated by examining the use of three and four dataset images, with a different solution in each case for calculating the query world coordinates. We have evaluated the proposed method on query images with known, accurate ground truth. Experimental results demonstrate that our method outperforms other reported methods in terms of average error.
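As an illustration of the image selection criterion, the following minimal sketch picks the subset of retrieved images with the highest mean pairwise similarity. It is an assumption-laden toy, not the optimization framework of the paper: the function name `select_most_coherent`, the use of an exhaustive search, and the similarity matrix values are all hypothetical, and the paper does not specify how similarity is scored.

```python
from itertools import combinations


def select_most_coherent(similarity, k):
    """Return the k image indices with the highest mean pairwise similarity.

    similarity: symmetric square matrix (list of lists), where
    similarity[i][j] is a hypothetical similarity score between
    retrieved images i and j (higher means more alike).
    """
    n = len(similarity)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of images")

    best_subset, best_score = None, float("-inf")
    # Exhaustive search is feasible because only a handful of images
    # (for example three or four) are passed on to SFM.
    for subset in combinations(range(n), k):
        pairs = list(combinations(subset, 2))
        score = sum(similarity[i][j] for i, j in pairs) / len(pairs)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy data (made up): images 0, 1 and 3 show the same scene; image 2 is an outlier.
sim = [
    [1.0, 0.9, 0.1, 0.8],
    [0.9, 1.0, 0.2, 0.7],
    [0.1, 0.2, 1.0, 0.3],
    [0.8, 0.7, 0.3, 1.0],
]
subset, score = select_most_coherent(sim, 3)
print(subset, round(score, 3))
```

On this toy matrix the search keeps images 0, 1 and 3 and discards the outlier. For a larger retrieval list, an exhaustive search would become costly and a greedy or relaxed formulation might be preferable.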