In this paper, we presented a novel building recognition method based on a sparse representation of spatial texture and color features. Currently, the most popular methods rely on gist features, which reflect the spatial information of building images only roughly. The proposed method, in contrast, uses multi-scale neighborhood sensitive histograms of oriented gradients (MNSHOGs) and the color auto-correlogram (CA) to extract texture and color features of building images. Both the MNSHOG and the CA take the spatial information of building images into account when computing these features. The color and texture features are then combined into joint features, and the dimensionality of their sparse representation is reduced by an autoencoder. Finally, an extreme learning machine classifies the dimensionality-reduced joint features into building classes. Experiments were conducted on the benchmark Sheffield building dataset. The mean recognition rate of our method is much higher than that of existing methods, which demonstrates its effectiveness.
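To make the final classification stage concrete, the following is a minimal sketch of a generic single-hidden-layer extreme learning machine: the hidden-layer weights are drawn at random and kept fixed, and only the output weights are solved in closed form by regularized least squares. It is not the implementation used in our experiments. The class name `ExtremeLearningMachine`, the sigmoid activation, and the parameters `n_hidden`, `reg`, and `seed` are assumptions chosen for illustration, and the input `X` stands in for the joint features after dimensionality reduction.

```python
import numpy as np


class ExtremeLearningMachine:
    """Illustrative ELM classifier (hypothetical names and defaults)."""

    def __init__(self, n_hidden=64, reg=1e-3, seed=0):
        self.n_hidden = n_hidden  # number of random hidden units (assumed value)
        self.reg = reg            # ridge regularization strength (assumed value)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the fixed random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot encode the class labels as regression targets.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Hidden weights and biases are random and never trained.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Output weights: solve (H^T H + reg * I) beta = H^T T.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        H = self._hidden(np.asarray(X, dtype=float))
        # Pick the class whose output score is largest.
        return self.classes_[np.argmax(H @ self.beta, axis=1)]
```

In use, one would call `ExtremeLearningMachine().fit(X_train, y_train)` on the reduced feature vectors and then `predict(X_test)`. Because training reduces to a single linear solve, it is fast compared with iterative back-propagation, which is the usual motivation for choosing this classifier.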