We propose a new architecture for on-demand media streaming centered around the peer-to-peer (P2P) paradigm. The key idea of the architecture is that peers share some of their resources with the system. As peers contribute resources to the system, the overall system capacity increases and more clients can be served. The proposed architecture employs several novel techniques to: (1) use the often-underutilized peers’ resources, which makes the proposed architecture both deployable and cost-effective, (2) aggregate contributions from multiple peers to serve a requesting peer so that supplying peers are not overloaded, (3) make good use of peer heterogeneity by assigning relatively more work to the more powerful peers, and (4) organize peers in a network-aware fashion, such that nearby peers are grouped into a logical entity called a cluster. The network-aware peer organization is validated by statistics collected and analyzed from real Internet data. The main benefit of the network-aware peer organization is that it enables efficient searching (to locate nearby suppliers) and dispersion (to disseminate new files into the system) algorithms. We present network-aware searching and dispersion algorithms that result in: (i) fast dissemination of new media files, (ii) reduction of the load on the underlying network, and (iii) better streaming service. We demonstrate the potential of the proposed architecture for a large-scale on-demand media streaming service through an extensive simulation study on large, Internet-like topologies. Starting with a limited streaming capacity (hence, low cost), the simulation shows that the capacity rapidly increases and many clients can be served. This occurs for all studied arrival patterns, including constant-rate arrivals, flash-crowd arrivals, and Poisson arrivals.
Furthermore, the simulation shows that a reasonable client-side initial buffering of 10–20 s is sufficient to ensure full quality playback even in the presence of peer failures.
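As a rough illustration of techniques (2) and (3), the sketch below splits the segments of one media file among several supplying peers in proportion to the capacity each one offers. It is not the assignment procedure of the proposed architecture: the function name `assign_segments`, the capacity units, and the largest-remainder rounding rule are all assumptions made for this example.

```python
from typing import Dict, List


def assign_segments(num_segments: int,
                    peer_capacity: Dict[str, float]) -> Dict[str, List[int]]:
    """Split segment indices among supplying peers in proportion to capacity.

    Hypothetical helper (not from the paper): uses largest-remainder
    rounding so that every segment is assigned to exactly one peer.
    """
    if num_segments < 0:
        raise ValueError("num_segments must be non-negative")
    if not peer_capacity or any(c <= 0 for c in peer_capacity.values()):
        raise ValueError("need at least one peer, all with positive capacity")

    total = sum(peer_capacity.values())
    # Ideal (fractional) number of segments per peer.
    ideal = {p: num_segments * c / total for p, c in peer_capacity.items()}
    # Start from the floor of each share...
    counts = {p: int(share) for p, share in ideal.items()}
    # ...then give the leftover segments to the largest fractional remainders.
    leftover = num_segments - sum(counts.values())
    by_remainder = sorted(ideal, key=lambda p: ideal[p] - counts[p], reverse=True)
    for p in by_remainder[:leftover]:
        counts[p] += 1

    # Hand out contiguous runs of segment indices, one run per peer.
    assignment: Dict[str, List[int]] = {}
    next_seg = 0
    for p, n in counts.items():
        assignment[p] = list(range(next_seg, next_seg + n))
        next_seg += n
    return assignment


if __name__ == "__main__":
    # Three suppliers; "a" offers twice the capacity of "b" or "c".
    print(assign_segments(8, {"a": 4.0, "b": 2.0, "c": 2.0}))
```

Under these assumptions, a peer that offers twice the capacity of another is asked for roughly twice as many segments, and no single supplier has to serve the whole file.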