As parallel computing has become more common, the need for scalable and efficient ways of storing and locating data has become increasingly acute. For years, both grid and cloud computing have distributed data across machines, and even across clusters at different geographic locations (sites). However, not all sites need all of the data in a particular data set, and not all sites have the (perhaps specialized) processing capabilities required. These facts challenge the conventional wisdom that we should always move the computation to the data rather than the data to the computation. Sometimes the data actually required is small, so moving it costs little. In other cases, a site with specialized processing capabilities (such as a GPU-equipped cluster) cannot handle the demands placed on it unless that cluster can select only the data it actually needs, even if that data is not stored locally.