Due to advances in satellite and sensor technology, the number and size of Remote Sensing (RS) images continue to grow rapidly. This continuous stream of sensor data makes it increasingly difficult to retrieve relevant information from satellite archives. The Bag-of-Words (BoW) framework is a leading image search approach that has been applied successfully to a broad range of computer vision problems, and it has therefore received much attention from the RS community. However, the recognition performance of a typical BoW framework degrades sharply in scenarios where images are very similar in appearance and texture. In this paper, we propose a simple method that improves the recognition performance of a typical BoW framework by representing images with local features extracted from base images. In addition, we propose a similarity measure for RS images that counts the number of identical visual words assigned to a pair of images. We compare both proposed methods with a typical BoW framework. Our experiments show that the proposed method achieves better recognition performance than the typical BoW framework and requires less storage space for local invariant features.
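To make the word-counting similarity concrete, the following is a minimal Python sketch rather than the implementation evaluated in this paper. It assumes each image has already been quantized into a list of integer visual-word IDs, and it interprets "same words" as distinct word IDs shared by two images; the function names and the toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def shared_word_count(words_a: Iterable[int], words_b: Iterable[int]) -> int:
    """Count the distinct visual-word IDs assigned to both images.

    Assumption: repeated occurrences of a word within one image count once.
    """
    return len(set(words_a) & set(words_b))


def rank_by_shared_words(
    query: Iterable[int], database: Dict[str, List[int]]
) -> List[Tuple[str, int]]:
    """Rank database images by the number of words shared with the query."""
    query_words = list(query)
    scores = [
        (name, shared_word_count(query_words, words))
        for name, words in database.items()
    ]
    # Highest score first; ties are broken by image name for determinism.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: visual-word IDs per image.
query = [3, 7, 7, 12, 40]
database = {
    "tile_a": [7, 12, 40, 41],
    "tile_b": [1, 2, 3],
    "tile_c": [5, 6],
}
ranking = rank_by_shared_words(query, database)
```

Under these assumptions, the image sharing the most visual words with the query is ranked first. If word frequencies should contribute to the score, the sets could be replaced by multisets, which is a different design choice from the one sketched here.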