We consider a wireless device-to-device (D2D) caching network in which $n$ nodes are placed on a regular grid of area $A(n)$. Each node caches $L_{C}F$ (coded) bits from a library of size $LF$ bits, where $L$ is the number of files and $F$ is the size of each file. Each node independently requests a file from the library according to a popularity distribution. Under the commonly used “physical model” and a Zipf popularity distribution, we characterize the optimal per-node capacity scaling law for extended networks (i.e., $A(n)=n$). Moreover, we propose a cache-induced hierarchical cooperation scheme, together with an associated cache content placement optimization algorithm, that achieves the optimal per-node capacity scaling law. When the path loss exponent $\alpha$ is sufficiently small, the optimal per-node capacity scaling law achieved by cache-induced hierarchical cooperation can be significantly better than that achieved by existing state-of-the-art schemes. To the best of our knowledge, this is the first work to completely characterize the per-node capacity scaling law of wireless caching networks under the physical model and a Zipf distribution with an arbitrary skewness parameter $\tau$. While scaling law analysis yields clean results, it may not accurately reflect the throughput of a large network with a finite number of nodes. Therefore, we also analyze the throughput of the proposed cache-induced hierarchical cooperation for networks of practical size. The analysis and simulations verify that cache-induced hierarchical cooperation can also achieve a large throughput gain over the cache-assisted multihop scheme for networks of practical size.
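To make the role of the skewness parameter $\tau$ concrete, the following is a minimal sketch assuming the standard Zipf form $p_l = l^{-\tau} / \sum_{j=1}^{L} j^{-\tau}$ for files $l = 1, \dots, L$; the abstract does not spell out this definition, and the helper names `zipf_popularity` and `top_k_mass` are hypothetical, introduced only for illustration. It shows how a larger $\tau$ concentrates requests on a few popular files, which is what makes caching those files attractive.

```python
def zipf_popularity(num_files: int, tau: float) -> list[float]:
    """Popularity of files 1..num_files under an (assumed) standard Zipf law.

    p_l is proportional to l ** (-tau); tau = 0 gives a uniform distribution,
    and larger tau makes low-index (popular) files dominate the requests.
    """
    if num_files < 1:
        raise ValueError("num_files must be positive")
    if tau < 0:
        raise ValueError("tau must be non-negative")
    weights = [l ** (-tau) for l in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_k_mass(popularity: list[float], k: int) -> float:
    """Fraction of requests that target the k most popular files (hypothetical helper)."""
    return sum(popularity[:k])


if __name__ == "__main__":
    # Compare how much demand the 10 most popular of 100 files attract
    # for a mildly skewed and a strongly skewed library.
    for tau in (0.5, 1.5):
        p = zipf_popularity(100, tau)
        print(f"tau={tau}: top-10 files receive {top_k_mass(p, 10):.3f} of requests")
```

Running the script prints the top-10 request share for both skewness values; the share for $\tau = 1.5$ is noticeably larger, illustrating why the achievable scaling law depends on $\tau$.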