Multimedia researchers have exploited large collections of community-contributed, geo-referenced images both to better understand a particular image, such as its subject matter or where it was taken, and to better understand a geographic location, such as the most visited tourist spots in a city or what the local cuisine is like. The goal of this paper is the latter: to better understand location. In particular, we use geo-referenced image collections to understand what occurs in different parts of a city at fine spatial scales and across a large number of activity classes. This problem is known as land use mapping in the geographical sciences. We propose a novel framework for fine-grained land use mapping at the city scale using ground-level images. Mapping land use is considerably more difficult than mapping land cover and is generally not possible with overhead imagery, since it requires close-up views and often views of building interiors. We postulate that the growing collections of geo-referenced, ground-level images offer an alternate approach to this geographic knowledge discovery problem. We develop a general framework that uses Flickr images to map 45 different land use classes for the city of San Francisco, CA, USA. Individual images are classified using a novel convolutional neural network with two streams, one for recognizing objects and another for recognizing scenes; the network is trained end-to-end directly on the labeled training images. We propose several novel strategies to overcome the noisiness of our user-generated data, including search-based training set augmentation and online adaptive training. To evaluate our method, we derive a ground truth land use map of San Francisco and demonstrate the effectiveness of our approach through geovisualization and quantitative analysis. Our framework achieves over 29% recall at the individual land parcel level, which represents a strong baseline for this challenging 45-way land use classification problem, especially given the noisiness of the image data.
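
To make the two-stream design concrete, the following is a minimal PyTorch sketch of an object/scene network of the kind described above. The ResNet-18 backbones, concatenation-based fusion, and the `TwoStreamLandUseNet` name are illustrative assumptions for this sketch, not the paper's exact architecture; in practice the scene stream would be initialized with scene-centric weights (e.g., Places) rather than left random.

```python
# Hypothetical sketch of a two-stream object/scene classifier in PyTorch.
# Backbones, fusion method, and all names are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 45  # land use classes, as in the San Francisco study


class TwoStreamLandUseNet(nn.Module):
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        # Object stream: backbone pretrained on object-centric images
        # (ImageNet weights via torchvision).
        self.object_stream = models.resnet18(
            weights=models.ResNet18_Weights.IMAGENET1K_V1
        )
        self.object_stream.fc = nn.Identity()  # expose 512-d features
        # Scene stream: same architecture; scene-centric pretrained
        # weights would be loaded here in a real system.
        self.scene_stream = models.resnet18(weights=None)
        self.scene_stream.fc = nn.Identity()
        # Fuse the two feature vectors and classify into land use classes;
        # training the whole module end-to-end on labeled images matches
        # the end-to-end training described in the abstract.
        self.classifier = nn.Linear(512 + 512, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        obj_feat = self.object_stream(x)   # (B, 512) object features
        scene_feat = self.scene_stream(x)  # (B, 512) scene features
        fused = torch.cat([obj_feat, scene_feat], dim=1)
        return self.classifier(fused)      # (B, 45) class logits


if __name__ == "__main__":
    model = TwoStreamLandUseNet()
    images = torch.randn(4, 3, 224, 224)  # dummy batch of RGB images
    print(model(images).shape)  # torch.Size([4, 45])
```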