In the field of cross-view image geolocation, traditional convolutional neural network (CNN)-based learning models yield unsatisfactory fusion performance because they cannot model global correlations. Transformer-based fusion methods compensate for this limitation; however, the Transformer has quadratic computational complexity and high GPU memory consumption. The recent Mamba model, built on a selective state space, models long sequences well while requiring lower GPU memory occupancy and fewer GFLOPs. Applying Mamba to the cross-view image geolocation task is therefore attractive and worth studying. In addition, during image matching (i.e., fusion of satellite/aerial and street-view data), we found that similarity retrieval based on floating-point features incurs high storage occupancy. Efficiently converting floating-point features into hash codes is a possible solution. In this study, we propose a cross-view image geolocation method (S6HG) based purely on Vision Mamba and hashing. S6HG exploits Vision Mamba's strengths in global information modeling and explicit position encoding, together with the low storage occupancy of hash codes. Our method consists of two stages. In the first stage, we use a Siamese network based purely on Vision Mamba to embed features for street-view images and satellite images, respectively; we call this first-stage model S6G. In the second stage, we construct a cross-view autoencoder to further refine and compress the embedded features, and then map the refined features to hash codes. Comprehensive experiments show that S6G achieves superior results on the CVACT dataset and results comparable to state-of-the-art methods on the CVUSA dataset. Notably, when storing the 90,618-image retrieval gallery, floating-point feature-based methods (4096-dimensional) require 170.59 times more storage than S6HG (768-bit hash codes). Furthermore, the inference efficiency of S6G is higher than that of ViT-based methods.
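As a rough illustration of the reported storage gap, the sketch below binarizes a refined feature into a 768-bit hash code and compares its footprint with a 4096-dimensional float32 descriptor. The abstract does not specify the exact mapping from refined features to hash codes, so the sign-based binarization here is only an assumed stand-in, not the authors' method; the feature dimensions are taken from the text.

```python
import numpy as np

def to_hash_code(refined_feat: np.ndarray) -> np.ndarray:
    """Hypothetical mapping: one bit per dimension via the sign of each component,
    packed into bytes for storage (768 bits -> 96 bytes)."""
    bits = (refined_feat > 0).astype(np.uint8)
    return np.packbits(bits)

rng = np.random.default_rng(0)
float_feat = rng.standard_normal(4096).astype(np.float32)   # baseline floating-point descriptor
refined_feat = rng.standard_normal(768).astype(np.float32)  # assumed 768-d refined feature

code = to_hash_code(refined_feat)
float_bytes = float_feat.nbytes   # 4096 * 4 = 16384 bytes per gallery image
hash_bytes = code.nbytes          # 768 / 8  = 96 bytes per gallery image
print(float_bytes / hash_bytes)   # ~170.7x, close to the reported 170.59x
```

The raw-payload ratio (16384 / 96 ≈ 170.7) matches the paper's 170.59x figure up to rounding and any per-sample storage overhead, which supports reading the reported number as a per-image storage ratio over the 90,618-image gallery.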