Points of interest (POIs) have been shown in numerous studies to be indicative of urban land use. However, recent progress relies mainly on spatial co-occurrence patterns among POI categories, which fails to exploit the rich semantic information embodied in hierarchical POI categories and to capture the spatial distribution of POIs within individual zones. In this context, we present SARL, a spatial and adversarial representation learning approach for predicting the land use of urban zones from POIs. SARL mines POI information from both spatial and categorical perspectives. Specifically, we first use a convolutional neural network to capture the spatial distribution patterns of POIs in each urban zone. We then leverage an autoencoder and an adversarial learning strategy to mine POI categorical information at all hierarchical levels, emphasizing the prominent, definitive POIs while preserving the overall POI hierarchical structure of each zone. Finally, we fuse the information from the two perspectives via a Wide & Deep network and perform land use prediction with the fused embeddings. We conduct comprehensive experiments on real-world data from four European cities to validate the effectiveness of SARL. The results show that SARL substantially outperforms several competitive baselines.
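To make the two perspectives concrete, the following is a minimal, hypothetical sketch of the kinds of inputs and fusion step the abstract describes. It is not SARL's implementation. The two-level `TAXONOMY`, the grid rasterization in `rasterize`, the per-level frequency vector in `hierarchical_counts`, and the single-logit `wide_and_deep_logit` are all illustrative assumptions. The abstract does not specify how POIs are gridded, how the hierarchy is encoded, or which features feed the wide and deep parts.

```python
from collections import Counter

# ASSUMPTION: a hypothetical two-level POI taxonomy (fine -> coarse category).
# SARL's real hierarchy and data source are not specified in the abstract.
TAXONOMY = {
    "cafe": "food",
    "restaurant": "food",
    "school": "education",
    "university": "education",
    "supermarket": "retail",
}
COARSE = sorted(set(TAXONOMY.values()))
FINE = sorted(TAXONOMY)


def rasterize(pois, bbox, size):
    """Count POIs per coarse category in a size x size grid over one zone.

    pois: list of (x, y, fine_category); bbox: (min_x, min_y, max_x, max_y),
    assumed non-degenerate and containing every point. Returns nested lists
    shaped [channel][row][col], an image-like input a CNN could consume.
    """
    min_x, min_y, max_x, max_y = bbox
    grid = [[[0] * size for _ in range(size)] for _ in COARSE]
    for x, y, fine in pois:
        # Map coordinates to cell indices; clamp points lying on the max edge.
        col = min(int((x - min_x) / (max_x - min_x) * size), size - 1)
        row = min(int((y - min_y) / (max_y - min_y) * size), size - 1)
        grid[COARSE.index(TAXONOMY[fine])][row][col] += 1
    return grid


def hierarchical_counts(pois):
    """Relative category frequencies at every hierarchy level, concatenated
    (coarse level first, then fine level); a possible autoencoder input."""
    fine_counts = Counter(fine for _, _, fine in pois)
    coarse_counts = Counter(TAXONOMY[fine] for _, _, fine in pois)
    total = len(pois) or 1  # avoid division by zero for an empty zone
    return ([coarse_counts[c] / total for c in COARSE]
            + [fine_counts[f] / total for f in FINE])


def wide_and_deep_logit(wide_x, deep_emb, w_wide, w_deep, bias):
    """Wide & Deep-style combination: a linear term on raw (wide) features
    plus a linear term on a learned (deep) embedding, summed into one logit.
    A multi-class land use classifier would compute one such logit per class."""
    return (sum(w * x for w, x in zip(w_wide, wide_x))
            + sum(w * a for w, a in zip(w_deep, deep_emb))
            + bias)


# Usage on a toy zone in the unit square.
zone = [(0.1, 0.2, "cafe"), (0.8, 0.9, "school"), (0.15, 0.25, "restaurant")]
grid = rasterize(zone, (0.0, 0.0, 1.0, 1.0), size=2)
print(grid[COARSE.index("food")])  # [[2, 0], [0, 0]]
print(hierarchical_counts(zone))
```

Keeping the spatial grid and the hierarchical frequency vector as separate inputs mirrors the abstract's split between a spatial encoder and a categorical encoder; in a trained model, the deep embedding would come from those encoders rather than from hand-built features.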