As the trend to open up data and provide them freely on the Internet intensifies, so do the opportunities to create added value by combining and cross-indexing heterogeneous data at a large scale. To seize these opportunities we need infrastructure that is not only efficient, real-time responsive, and scalable, but also flexible and robust enough to welcome data in any schema and form and to transparently relegate and translate queries from a unifying end-point to the multitude of data services that make up the open data cloud. Transparent relegation and translation rely on detailed and accurate data summaries and other data source annotations; as data volumes and heterogeneity increase, managing these annotations becomes a challenging data problem in itself. In this position paper we discuss (a) how scalable and robust semantic storage can be developed, using indexing algorithms that exploit resource naming conventions and other natural groupings of URIs to compress data source annotations about extremely large datasets; and (b) how query decomposition, source selection, and distributed querying methods can be designed to take advantage of such algorithms in order to implement a scalable and robust infrastructure for data service federation.
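
To make point (a) concrete, the following is a minimal sketch, not the indexing algorithm proposed in this paper, of how URI naming conventions might be used to compress source annotations and drive source selection. It assumes that the grouping is simply the URI up to its last "/" or "#"; the function names, source names, and example URIs are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def uri_prefix(uri: str) -> str:
    """Return the URI up to and including its last '/' or '#'.

    Assumption: this cut point stands in for a 'natural grouping' of URIs.
    """
    cut = max(uri.rfind("/"), uri.rfind("#"))
    return uri[: cut + 1] if cut >= 0 else uri


def summarize(uris):
    """Compress a collection of URIs into a {prefix: count} summary."""
    summary = defaultdict(int)
    for uri in uris:
        summary[uri_prefix(uri)] += 1
    return dict(summary)


def select_sources(summaries, uri):
    """Return the sources whose summary contains the prefix of `uri`."""
    prefix = uri_prefix(uri)
    return sorted(name for name, s in summaries.items() if prefix in s)


# Hypothetical data sources and the resource URIs they mention.
summaries = {
    "source_a": summarize([
        "http://example.org/crops/wheat",
        "http://example.org/crops/maize",
        "http://example.org/soil/clay",
    ]),
    "source_b": summarize([
        "http://example.org/soil/sand",
        "http://example.net/geo#athens",
    ]),
}

# Only sources whose summary covers the URI's prefix are candidates.
print(select_sources(summaries, "http://example.org/crops/rice"))
print(select_sources(summaries, "http://example.org/soil/loam"))
```

In this sketch each source is described by one entry per prefix rather than one entry per resource, so the annotation grows with the number of naming groups instead of the number of URIs. The price is that a prefix match only says a source may hold relevant data, which is why such a summary can prune sources but cannot replace querying them.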